howthebodyworks + scalability   105

GraphLab: A New Parallel Framework for Machine Learning
Library claiming an orders-of-magnitude speed improvement over Hadoop, and more flexibility than MapReduce. Amazon EC2 images are already available, ready to fire up. C++ is the native language; POJOs are claimed to be first-class citizens too, and thus other JVM languages. Hm.
networks  ec2  amazon  c++  mapreduce  scalability  Java  jython  Python 
august 2011 by howthebodyworks
Spark Cluster Computing Framework
A Hadoop-alike. Major attractions: it uses Scala, is optimised for RAM-happy (in-memory) operations, has fast native iteration, and supports a Google Pregel clone called Bagel.
networks  Java  Jvm  scala  mapreduce  performance  scalability 
june 2011 by howthebodyworks
Research Directions for Machine Learning and Algorithms « Machine Learning (Theory)
it is not enough to have a language for specifying your prior structural beliefs—instead we must have a language for such which results in computationally tractable solutions.
learning  ai  scalability 
june 2011 by howthebodyworks
Distributed Systems - Google Code University - Google Code
handy educational tutes on various concurrency models; a bit of a hadoop emphasis
concurrency  mapreduce  google  scalability  howto  hadoop 
march 2011 by howthebodyworks
pystream - Project Hosting on Google Code
Amazon's recommended GPU thingy for Python; sounds like a GPU-accelerated SciPy.
grammarthing  gpu  python  perforamnce  scalability  cuda 
december 2010 by howthebodyworks
Amazon Web Services Blog: New EC2 Instance Type - The Cluster GPU Instance
Not only is the arrival of Tesla GPUs in the cloud very welcome, but the intro blog post is also a great introduction to resources for using them.
gpu  cuda  opencl  performance  scalability 
december 2010 by howthebodyworks
Titus Brown: The sky is falling! The sky is falling!
"As Narayan has eloquently argued many times, it no longer makes sense for most institutions to run their own HPC, if you take into account the true costs of power, AC, and hardware. The only reason it looks like HPCs work well is because of the way institutions play games with funny money (a.k.a. "overhead charges"), channeling it to HPC behind the scenes - often with much politicking involved. If, as a scientist, your compute is "free" or even heavily subsidized, you tend not to think much about it. But now that we have to scale those clusters 10s or 100s or 1000s of X, to deal with data 100s or 1e6s of times as big, institutions will no longer be able to afford to build their own clusters with funny money. And they'll have to charge scientists for the true computational cost of their work -- or scientists will have to use the cloud."
performance  scalability  genetic  academic  methodology  cloud 
october 2010 by howthebodyworks
Smallest possible transparent PNG
Nerdiest blog post ever. 1557 words about how to save 1 byte in your transparent PNGs.
png  compression  performance  scalability 
august 2010 by howthebodyworks
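For a sense of what is being squeezed, here is a minimal sketch (Python, using only `struct` and `zlib`) that assembles a valid 1x1 fully transparent PNG by hand: the 8-byte signature, then `IHDR`, `IDAT` and `IEND` chunks, each carrying a CRC-32. It is not the byte-shaving result from the post; the function names are hypothetical and it uses plain 8-bit RGBA for clarity rather than size.

```python
import struct
import zlib

def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """One PNG chunk: 4-byte big-endian length, 4-byte type, data, CRC-32 of type+data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def transparent_png_1x1() -> bytes:
    signature = b"\x89PNG\r\n\x1a\n"
    # IHDR: width=1, height=1, bit depth 8, colour type 6 (RGBA),
    # then compression, filter and interlace methods all 0.
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 6, 0, 0, 0)
    # Image data: one scanline = filter byte 0 + one RGBA pixel with alpha 0.
    raw = b"\x00" + b"\x00\x00\x00\x00"
    idat = zlib.compress(raw, 9)
    return (signature
            + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", idat)
            + png_chunk(b"IEND", b""))

if __name__ == "__main__":
    print(len(transparent_png_1x1()), "bytes")
```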
PyCUDA | Andreas Klöckner's web page
"PyCUDA lets you access Nvidia‘s CUDA parallel computation API from Python. Several wrappers of the CUDA API already exist–so what's so special about PyCUDA?

Object cleanup tied to lifetime of objects. This idiom, often called RAII in C++, makes it much easier to write correct, leak- and crash-free code. PyCUDA knows about dependencies, too, so (for example) it won’t detach from a context before all memory allocated in it is also freed.
Convenience. Abstractions like pycuda.driver.SourceModule and pycuda.gpuarray.GPUArray make CUDA programming even more convenient than with Nvidia’s C-based runtime.
Completeness. PyCUDA puts the full power of CUDA’s driver API at your disposal, if you wish.
Automatic Error Checking. All CUDA errors are automatically translated into Python exceptions.
Speed. PyCUDA’s base layer is written in C++, so all the niceties above are virtually free.
Helpful Documentation."
oop  performance  scalability 
june 2010 by howthebodyworks
Guest post: Why you should track page views with MongoDB - The Future of Event Management
Google Analytics considered inferior to MongoDB for pageview tracking. (Query-API-wise, I agree; client-side, not so sure.)
scalability  analytics  browser  mongodb 
june 2010 by howthebodyworks
Pypes - Flow Based Programming
Max/MSP-style flow-based programming for Python, with concurrency and weirdness. Seems to be an open-source reimplementation of Yahoo Pipes.
ui  browser  performance  python  scalability  opensource  concurrency 
june 2010 by howthebodyworks
High Scalability - 7 Lessons Learned While Building Reddit to 270 Million Page Views a Month
How a massive site like Reddit scales is pretty instructive. Spoiler: cache everything, and memoise functions.
reddit  python  memcache  scalability  via:datakid 
may 2010 by howthebodyworks
Green Unicorn - Welcome
Green Unicorn (gunicorn) is an HTTP/WSGI Server for UNIX designed to serve fast clients or sleepy applications.

This is a port of Unicorn in Python.
nginx  python  scalability  performance  django  grammarthing  greenlet  event 
april 2010 by howthebodyworks
uWSGI
uWSGI is a fast (pure C), self-healing, developer-friendly WSGI server, aimed at professional Python webapp deployment and development. Over time it has evolved into a complete stack for networked/clustered Python applications, implementing message/object passing and process management. It uses the uwsgi (all lowercase) protocol for all networking/interprocess communication. From the 0.9.5 release it includes a plugin loading technology that can be used to add support for other languages or platforms. A Lua WSAPI adaptor, a PSGI handler and an Erlang message exchanger are already available.
nginx  python  scalability  performance  django 
april 2010 by howthebodyworks
swarm-dpl - Project Hosting on Google Code
An intriguing approach to distributed high-performance computing, where continuations are passed around as needed to scale the computation.
java  scala  scalability  agents  simulation 
march 2010 by howthebodyworks
PiCloud | Cloud Computing. Simplified.
"import cloud; cloud.call(my_function, arguments)" - and now your function has been serialised and executed on a cloud cluster.
python  academic  science  via:simonw  scalability  cloud 
february 2010 by howthebodyworks
Pressflow in Launchpad
A performance-oriented, API-compatible Drupal fork, managed in a DVCS.
bzr  drupal  performance  scalability  cpod 
december 2009 by howthebodyworks
The C10K problem
Evented IO and system forks make you potent.
server  scalability  unix  http 
december 2009 by howthebodyworks
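The evented-IO half of that can be sketched with Python's standard `selectors` module: one thread asks the kernel which sockets are readable and services only those, instead of dedicating a thread or process to each connection. `serve_ready` is a hypothetical name, the upper-case echo protocol is just for the demo, and the loop stops once nothing becomes readable within `timeout`, so it terminates (a real server would loop forever).

```python
import selectors
import socket

def serve_ready(server_socks, timeout=0.5):
    """Single-threaded readiness loop that echoes each message back upper-cased.

    Returns the number of messages served.
    """
    sel = selectors.DefaultSelector()
    for sock in server_socks:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)
    served = 0
    try:
        while True:
            events = sel.select(timeout=timeout)
            if not events:
                break  # nothing ready within the timeout: end the demo loop
            for key, _mask in events:
                sock = key.fileobj
                data = sock.recv(4096)
                if data:
                    # Fine for small demo messages; a real server would buffer
                    # writes and wait for EVENT_WRITE instead of sendall().
                    sock.sendall(data.upper())
                    served += 1
                else:
                    sel.unregister(sock)  # peer closed its end
    finally:
        sel.close()
    return served

if __name__ == "__main__":
    pairs = [socket.socketpair() for _ in range(3)]
    for i, (_, client) in enumerate(pairs):
        client.sendall(f"hello {i}".encode())
    print(serve_ready([server for server, _ in pairs]))
    for _, client in pairs:
        print(client.recv(4096))
```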
Social Innovation Conversations | Stanford Discussions | Premal Shah (Free Podcast)
A fascinating exposé on how Kiva crowdsources volunteer management to be the sleekest NGO ever.
Crowdsourcing  NGO  scalability 
november 2009 by howthebodyworks
Introducing Resque - GitHub
A Redis-backed queue for reasonably heavyweight jobs, with a simple Ruby interface.
ruby  scalability  queue  redis 
november 2009 by howthebodyworks
bwhitman @ variogr.am
Awesome music geekery by the guy behind The Echo Nest.
music  scalability  portable  dsp 
october 2009 by howthebodyworks
GraphPackage - Hama Wiki
A Pregel-like graph computation framework for Hadoop. Apparently working.
scalability  networks  hadoop  academic  agents  phd 
october 2009 by howthebodyworks
I like Unicorn because it's Unix
"There’s another problem with Unix programming in Ruby that I’ll just touch on briefly: Java people and Windows people. They’re going to tell you that fork(2) is bad because they don’t have it on their platform, or it sucks on their platform, or whatever, but it’s cool, you know, because they have native threads, and threads are like, way better anyways.

Fuck that.

Don’t ever let anyone tell you that fork(2) is bad. Thirty years from now, there will still be a fork(2) and a pipe(2) and a exec(2) and smart people will still be using them to solve hard problems reliably and predictably, just like they were thirty years ago."
ruby  unix  coding  compsci  scalability  http  geek  server 
october 2009 by howthebodyworks
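In the spirit of the quote, here is a minimal Unix-only sketch of the classic pipe(2) + fork(2) + exec(2) + waitpid(2) sequence, using Python's thin `os` wrappers around those system calls. `capture_output` is a hypothetical helper (the standard `subprocess` module does this job properly), and running `echo` assumes a POSIX system with it on the `PATH`.

```python
import os

def capture_output(argv):
    """Run argv in a child process; return (exit_code, stdout_bytes). Unix only."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: point stdout at the pipe's write end, then replace this process image.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: close our copy of the write end so read() sees EOF when the child exits.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    # -1 is this sketch's convention for "terminated by a signal".
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return exit_code, b"".join(chunks)

if __name__ == "__main__":
    print(capture_output(["echo", "hello"]))
```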
Diesel: How Python Does Comet
A generator-based Python keep-the-connection-alive thingy. Rumoured to have poor test coverage, though.
python  http  scalability  concurrency  coroutines  evented 
october 2009 by howthebodyworks
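The generator idea underneath can be shown with a toy round-robin scheduler: each task is a plain Python generator, and every `yield` hands control back to the loop. This is a hypothetical sketch, not Diesel's API; a real framework would resume a task only when its socket is ready rather than blindly taking turns.

```python
from collections import deque

def run(tasks):
    """Round-robin scheduler: resume each generator in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)  # run the task up to its next `yield`
        except StopIteration:
            continue  # task finished; drop it
        ready.append(task)  # still alive: requeue at the back

def worker(name, steps, log):
    """Hypothetical task that records its progress and yields between steps."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield

if __name__ == "__main__":
    log = []
    run([worker("a", 2, log), worker("b", 3, log)])
    print(log)
```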
Tornado Web Server
A non-blocking, massively concurrent Python web server, as seen on FriendFeed.
friendfeed  facebook  opensource  python  scalability  performance  realtime 
september 2009 by howthebodyworks
TileCache, from MetaCarta Labs
A thingy that caches map tiles in a tile-aware way, making it, well, fast. IIRC this is used by OpenStreetMap.
scalability  performance  mapping  python  opensource 
september 2009 by howthebodyworks
How FriendFeed uses MySQL to store schema-less data - Bret Taylor's blog
A fascinating hybrid of schemaless and schema'd high-performance datastore at FriendFeed: free(ish)-form serialised Python objects are stashed in a single field, and the properties you query on are indexed in separate tables.
nosql  mysql  db  python  scalability 
september 2009 by howthebodyworks
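The pattern can be sketched with Python's built-in `sqlite3` standing in for MySQL: each entity is an opaque serialised blob keyed by ID, and each property you want to query on gets its own small index table. The post describes serialised Python dictionaries in MySQL; this sketch uses JSON instead, and all table and function names are hypothetical.

```python
import json
import sqlite3
import uuid

def create_schema(conn):
    # Entities are opaque blobs: the database never needs to know their fields.
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # One small table per queryable property; adding a property never alters `entities`.
    conn.execute(
        "CREATE TABLE index_user_id ("
        " user_id TEXT NOT NULL, entity_id TEXT NOT NULL,"
        " PRIMARY KEY (user_id, entity_id))"
    )

def put(conn, entity):
    """Store a dict as JSON and update the user_id index in one transaction."""
    entity_id = uuid.uuid4().hex
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO entities (id, body) VALUES (?, ?)",
                     (entity_id, json.dumps(entity)))
        if "user_id" in entity:
            conn.execute("INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                         (entity["user_id"], entity_id))
    return entity_id

def get(conn, entity_id):
    row = conn.execute("SELECT body FROM entities WHERE id = ?", (entity_id,)).fetchone()
    return json.loads(row[0]) if row else None

def find_by_user(conn, user_id):
    """Query through the index table, then deserialise the matching bodies."""
    rows = conn.execute(
        "SELECT e.body FROM index_user_id AS i JOIN entities AS e ON e.id = i.entity_id"
        " WHERE i.user_id = ?",
        (user_id,),
    )
    return [json.loads(body) for (body,) in rows]
```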
Supervisor
An alternative process manager that doesn't require you to write daemonising startup scripts; it manages your unforked (non-daemonised) processes itself.
python  cli  server  opensource  scalability  phm  unix 
august 2009 by howthebodyworks
Defying Classification: ETags And Modification Times In Django
How to make them ETags behave nice-like, and save CPU time to boot.
scalability  django  phm 
august 2009 by howthebodyworks
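The CPU-saving idea is that the ETag can be derived from cheap metadata (an ID plus a modification time), so a matching conditional GET is answered with 304 Not Modified before anything is rendered. Django ships decorators for this in `django.views.decorators.http` (`etag`, `last_modified`, `condition`); the framework-free sketch below only illustrates the logic, and `handle_get` and its arguments are hypothetical.

```python
import hashlib

def make_etag(obj_id, updated_at):
    """Quoted ETag derived from cheap metadata, not from the rendered body."""
    digest = hashlib.sha256(f"{obj_id}:{updated_at}".encode()).hexdigest()[:32]
    return f'"{digest}"'

def _normalise(tag):
    # If-None-Match uses weak comparison, so a W/ prefix does not matter.
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag

def handle_get(obj_id, updated_at, render, if_none_match=None):
    """Return (status, headers, body); `render` is only called when a body is needed."""
    etag = make_etag(obj_id, updated_at)
    if if_none_match is not None:
        candidates = {_normalise(tag) for tag in if_none_match.split(",")}
        if "*" in candidates or etag in candidates:
            return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, render()
```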
The Rule of Least Power
Even the W3C endorses this idea: choose the least powerful language suitable for the job, so that your data is as easy to understand and reuse as possible.
netcultures  parsimony  standards  complexity  scalability  philosophy  evolution 
august 2009 by howthebodyworks
Django in the Real World
Jacob covers the whole deal: scalability, deployment, replication, etc.
django  deployment  python  scalability  db  opensource  via:cogat 
july 2009 by howthebodyworks
Welcome to django-denorm’s documentation! - django-denorm v0.1 documentation
"django-denorm provides a declarative way of denormalizing models in Django based applications while maintaining data consistency." Another solution to DB scalability problems: keep the canonical references normalised (JOINs), but load denormalised copies. It also allows some nice queries not natively supported by the Django ORM, and the framework keeps all of this transparent.
django  scalability  db  orm 
july 2009 by howthebodyworks
FrontPage - Cassandra Wiki
Cassandra is the new hotness: a very distributed, non-relational data store with flexible indexing and data models, open-sourced by Facebook. Overkill for everything I do (I will never have so many DB writes that I'd want to live with its warts), but nice to know it's there.
db  scalability  performance  facebook  opensource  cassandra  nosql 
july 2009 by howthebodyworks
Cassandra Project
Facebook's own Google Bigtable workalike. Except it's better and stuff. And crashy.
facebook  opensource  db  performance  scalability  distributed  Cassandra  nosql 
june 2009 by howthebodyworks
Disco
A Python MapReduce framework, for all those massive corpus analysis tasks. The core is written in Erlang, btw.
opensource  framework  distributed  mapreduce  scalability 
june 2009 by howthebodyworks
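The MapReduce model that frameworks like this distribute can be shown in miniature, in a single process: a map step emits `(key, value)` pairs, a shuffle groups values by key, and a reduce step folds each group. This is a hypothetical word-count sketch of the model, not Disco's job API; the point of a framework is running the map and reduce functions across a cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit (word, 1) for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key (the framework's job on a cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: fold one key's values into a single result."""
    return key, sum(values)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

if __name__ == "__main__":
    print(word_count(["the cat", "The dog the"]))
```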
Vowpal Wabbit (Fast Online Learning)
There are two ways to have a fast learning algorithm: (a) start with a slow algorithm and speed it up, or (b) build an intrinsically fast learning algorithm. This project is about approach (b), and it's reached a state where it may be useful to others as a platform for research and experimentation.

There are two algorithms, one implementing specialist gradient descent (GD) on squared loss and the other implementing specialist exponentiated gradient descent (SEG) on squared loss.
learning  ai  statistics  compsci  scalability 
june 2009 by howthebodyworks
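For the "gradient descent on squared loss" part, here is a minimal sketch of plain online (per-example) gradient descent for a linear model over sparse named features. It is not Vowpal Wabbit's implementation (which adds tricks such as feature hashing); `sgd_step`, the learning rate of 0.1 and the toy data are assumptions for illustration.

```python
def predict(weights, features):
    """Linear model over sparse named features: sum of weight * value."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def sgd_step(weights, features, target, lr=0.1):
    """One online update on squared loss 0.5 * (prediction - target) ** 2.

    The gradient for each weight is (prediction - target) * value, so only the
    features present in this example are touched. Returns the pre-update loss.
    """
    error = predict(weights, features) - target
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) - lr * error * value
    return 0.5 * error * error

if __name__ == "__main__":
    # Toy noise-free data for y = 2x + 1, with a constant "bias" feature.
    data = [({"x": i / 10, "bias": 1.0}, 2 * (i / 10) + 1) for i in range(11)]
    weights = {}
    for _ in range(200):
        for features, target in data:
            sgd_step(weights, features, target)
    print(weights)
```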
E in a Walnut
crazy - a distributed, high security language for computation over crappy, slow, compromised or untrusted networks.
security  capabilities  e  p2p  distributed  concurrency  compsci  scalability 
june 2009 by howthebodyworks
Official Google Research Blog: Large-scale graph computing at Google
Google making moves to open up their large-scale graph analysis infrastructure (Pregel), or at least to discuss its implementation.
graph  scalability  networks 
june 2009 by howthebodyworks
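The vertex-centric model can be sketched in a few lines: computation proceeds in synchronous supersteps; each vertex reads the messages sent to it in the previous superstep, may update its value, and only vertices that changed message their neighbours next time; the run ends when nobody changes. The example propagates the maximum value through a graph, the toy problem used in the Pregel paper. `pregel_max` and the graph encoding are hypothetical, and everything runs in one process rather than across machines.

```python
def pregel_max(values, edges, max_supersteps=30):
    """Propagate the maximum vertex value along directed edges in supersteps.

    values: {vertex: number}; edges: {vertex: [neighbour, ...]}.
    Returns a new dict; the input is left untouched.
    """
    state = dict(values)
    active = set(state)  # superstep 0: every vertex sends its value
    for _ in range(max_supersteps):
        if not active:
            break  # no vertex changed: halt
        inbox = {v: [] for v in state}
        for v in active:
            for neighbour in edges.get(v, ()):
                inbox[neighbour].append(state[v])
        # Barrier: messages sent in this superstep are processed together.
        active = set()
        for v, messages in inbox.items():
            if messages and max(messages) > state[v]:
                state[v] = max(messages)
                active.add(v)  # changed vertices send again next superstep
    return state
```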
Home - MongoDB - 10gen Confluence
"MongoDB is a high-performance, open source, schema-free document-oriented database. MongoDB is written in C++ and offers the following features:

Collection oriented storage - easy storage of object-style data
Dynamic queries
Full index support, including on inner objects
Query profiling
Replication and fail-over support
Efficient storage of binary data including large objects (e.g. videos)
Auto-sharding for cloud-level scalability (Q209)
A key goal of MongoDB is to bridge the gap between key/value stores (which are fast and highly scalable) and traditional RDBMS systems (which are deep in functionality)."

An indexed store of JSON objects that get deserialised into whichever client language you connect with. Handy. Looks faster and easier than CouchDB.
json  db  python  ruby  scalability  opensource  c++  mongodb  nosql 
june 2009 by howthebodyworks
Caktus Blog » Blog Archive » Testing Django Views for Concurrency Issues
Wunderschön! A decorator that lets you write Django unit tests that detect threading collisions under load.
python  django  scalability  thread  testing 
may 2009 by howthebodyworks
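A minimal sketch of the idea, not the Caktus decorator itself: run the decorated test body in several threads at once, released together by a `threading.Barrier` to maximise overlap, and fail if any thread raised. The name `run_concurrently` and the default thread count are hypothetical.

```python
import functools
import threading

def run_concurrently(times=8):
    """Decorator: run the wrapped callable in `times` threads started together.

    Exceptions raised in the threads are collected; the call fails with an
    AssertionError if at least one thread raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            errors = []
            barrier = threading.Barrier(times)

            def worker():
                try:
                    barrier.wait()  # release every thread at once
                    func(*args, **kwargs)
                except Exception as exc:  # collected and reported in the caller's thread
                    errors.append(exc)

            threads = [threading.Thread(target=worker) for _ in range(times)]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()
            if errors:
                raise AssertionError(
                    f"{len(errors)} of {times} threads failed; first error: {errors[0]!r}")
        return wrapper
    return decorator
```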
Rabbits and warrens. - Jason’s .plan
An AMQP ("queueing") HOWTO for Python, based around RabbitMQ, the Erlang AMQP flagship.
opensource  python  scalability  programming  erlang  phm  amqp  howto 
april 2009 by howthebodyworks
Amazon Elastic MapReduce
How does Simon Willison get across all these handy things so rapidly? Anyway, Amazon will deploy your MapReduce problem onto Hadoop for you, if you don't have the spare brainpower to both write your algorithm and support the infrastructure. One-off, incredibly massive data-munging job? NP.
scalability  hadoop  mapreduce  s3  ec2  amazon 
april 2009 by howthebodyworks
[berkman] David Post on scaling governance
Considering the scaling problems of democracy. Network governance and all that.
scalability  democracy  policy 
march 2009 by howthebodyworks
redis - Google Code
a kind of hybrid between memcached and a db, offering some datastore intelligence and persistence, but speed and sloppiness too.
opensource  python  performance  scalability  scaling  db  cache  c 
march 2009 by howthebodyworks
jessenoller.com - Python Threads and the Global Interpreter Lock
More on the Python-thread shenanigans. Better than most, but like many, it seems to ignore GUI programming. Did webdev eat *everything* else?
python  thread  scalability  coding 
february 2009 by howthebodyworks
Michael Nielsen » Using your laptop to compute PageRank for millions of webpages
Turns out the naïve Pythonic way of computing PageRank is not far from the sophisticated matrix approach. Go forth and analyse network weights on your laptop!
python  google  search  phm  seo  performance  scalability  networks 
december 2008 by howthebodyworks
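A minimal pure-Python power-iteration sketch of PageRank over a dict of outlinks, in the spirit of the post (this is not Nielsen's code): each page shares its rank equally among its outlinks, pages with no outlinks spread theirs over every page, and a damping factor of 0.85 (a conventional choice, assumed here) mixes in a uniform random jump. The function name and graph format are hypothetical.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=100):
    """links: {page: [distinct outlinked pages]}. Returns {page: rank}, summing to 1."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by pages without outlinks is redistributed uniformly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```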
Eventlet - Second Life Wiki
Eke some semblance of asynchronous IO out of Python using coroutine magic. This is the very light event-y Python networking thing that is built upon greenlet, if you were askin', as opposed to the crazy world of Twisted or what-have-you.
opensource  Python  http  network  scalability  coroutines  greenlet 
november 2008 by howthebodyworks
happy - Google Code
Write your bulk processing jobs in Python (well, Jython) and then run them on your cloud via MapReduce. Perfect for those bulk content analysis jobs, or for running freebase.com, which is apparently what they use it for.
java  jython  python  search  mapreduce  hadoop  nlp  opensource  scalability  howto  freebase 
september 2008 by howthebodyworks
YouTube - DjangoCon 2008 Keynote: Cal Henderson
An hour-long video from that Flickr bloke about what Django needs to scale. Bonus kittens included.
scalability  performance  django  via:cogat 
september 2008 by howthebodyworks
An Introduction to Using CouchDB with Django @ Irrational Exuberance
Wow - couchdb seems to work with django, really easily. Perhaps I should have read this before I rolled my own custom versioning backend for my blog. I didn't, though, and I'm never getting that weekend back.
scalability  django  orm  howto  possumpalace  scm 
september 2008 by howthebodyworks
IterParseFilter
neat hacks for handling bulk XML in smart ways using python, elementtree and/or stackless
Python  xml  xpath  scalability  phm 
september 2008 by howthebodyworks
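The core streaming trick can be sketched with the standard library's `xml.etree.ElementTree.iterparse`: handle each record as its closing tag arrives, then clear the root so processed elements can be freed instead of piling up into one big tree. `iter_records` is a hypothetical helper, not the IterParseFilter API.

```python
import io
import xml.etree.ElementTree as ET

def iter_records(source, tag):
    """Yield (attributes, text) for each <tag> element, streaming from `source`."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the root element's "start"
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            yield dict(elem.attrib), (elem.text or "").strip()
            root.clear()  # discard finished children so memory use stays flat

if __name__ == "__main__":
    feed = io.BytesIO(b"<feed><item id='1'>first</item><item id='2'> second </item></feed>")
    for attrs, text in iter_records(feed, "item"):
        print(attrs, text)
```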