Home - GitHub
february 2011 by amy
Dumbo is a project that allows you to easily write and run Hadoop programs in Python (it’s named after Disney’s flying circus elephant, since the logo of Hadoop is an elephant and Python was named after the BBC series “Monty Python’s Flying Circus”). More generally, Dumbo can be considered to be a convenient Python API for writing MapReduce programs.
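Note: as a rough illustration of the mapper/reducer style such an API exposes, here is a minimal word-count sketch in plain Python. It does not import or call Dumbo. The generator-style `mapper(key, value)` and `reducer(key, values)` signatures are an assumption about the general shape of such programs, and `run_local` is a hypothetical helper that imitates the map, group-by-key, and reduce phases in a single process rather than on a Hadoop cluster.

```python
from collections import defaultdict


def mapper(key, value):
    # key: position of the line in the input (unused here)
    # value: one line of text
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # key: a word; values: every count emitted for that word
    yield key, sum(values)


def run_local(lines, mapper, reducer):
    # Hypothetical single-process stand-in for a cluster run.
    # Map phase: feed each line to the mapper and group output by key.
    groups = defaultdict(list)
    for offset, line in enumerate(lines):
        for k, v in mapper(offset, line):
            groups[k].append(v)
    # Reduce phase: call the reducer once per key, in sorted key order.
    results = {}
    for k in sorted(groups):
        for out_key, out_value in reducer(k, groups[k]):
            results[out_key] = out_value
    return results


counts = run_local(["the elephant flies", "the circus"], mapper, reducer)
print(counts)
```

On a real cluster the grouping step is performed by the framework's shuffle and sort, so only the two small functions would be user code.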
python
mapreduce
hadoop
february 2011 by amy
Cloud9: A MapReduce Library for Hadoop » Getting started with EC2
february 2011 by amy
"This tutorial will get you started with Cloud9 on Amazon's EC2 (running the simple word count demo). For a gentler introduction to Hadoop, or if you don't feel like experimenting with EC2, try my tutorial on getting started in standalone mode. This tutorial assumes you've already downloaded the libraries and gotten it set up."
hadoop
ec2
mapreduce
s3
aws
tutorials
february 2011 by amy
Cloud9: A MapReduce Library for Hadoop
february 2011 by amy
"Cloud9 is a MapReduce library for Hadoop designed to serve as both a teaching tool and to support research in data-intensive text processing. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. Hadoop provides an open-source implementation of the programming model. The library itself is available on github and distributed under the Apache License.
For additional details on MapReduce algorithm design, Data-Intensive Text Processing with MapReduce by Lin and Dyer is a good resource. This library also serves as a repository of many examples discussed in the book."
mapreduce
hadoop
aws
s3
ec2
pig
tutorials
february 2011 by amy
Download Hadoop | Hadoop Download | Downloading Hadoop | Cloudera
january 2011 by amy
Cloudera’s Distribution for Apache Hadoop (CDH)
hadoop
mapreduce
aws
pig
january 2011 by amy
rubenfonseca's map_crowd_reduce at master - GitHub
november 2010 by amy
Massively Distributed Browser-based Javascript Map Reduce Framework. node.js + socket.io (websockets) + webworkers + fun = global warming
javascript
mapreduce
november 2010 by amy
Map Crowd Reduce - There's no place like ::1
november 2010 by amy
My latest open project is a “SETI-at-home-like infrastructure for massively distributed CPU-intensive jobs based on HTML5 WebWorkers and node.js for distributing tasks”
browser
javascript
mapreduce
november 2010 by amy
Apache Mahout:: Scalable machine-learning and data-mining library
september 2010 by amy
"Apache Mahout's goal is to build scalable machine learning libraries."
apache
hadoop
mapreduce
machine_learning
datamining
analytics
september 2010 by amy
Distributed Systems - Google Code University - Google Code
may 2010 by amy
One of the most important recent developments in computing is the growth in distributed and parallel applications.
Tutorials
Contributed course content
Hadoop tools and resources
Video lectures
distributed
hadoop
mapreduce
tutorials
development
programming
may 2010 by amy
Big Data Workshop
april 2010 by amy
Big Data Workshop (Mtn View, CA: Apr. 23): (HT @marshallk) #nosql #mongodb #hadoop #erlang #mapreduce
#erlang
#hadoop
#mongodb
#mapreduce
#nosql
erlang
hadoop
mongodb
mapreduce
nosql
from twitter_favs
april 2010 by amy
Spanner: Google’s next Massive Storage and Computation infrastructure | Scalable web architectures
march 2010 by amy
Spanner: Google’s next Massive Storage and Computation infrastructure #mapreduce #bigtable #pregel #spanner
#mapreduce
#spanner
#bigtable
#pregel
mapreduce
spanner
bigtable
pregel
from twitter_favs
march 2010 by amy
Hadoop, Pig, and Twitter (NoSQL East 2009)
november 2009 by amy
RT @sachinrekhi A look at how Twitter uses Hadoop and Pig for its data analysis needs http://bit.ly/4j0u7x
twitter_fav
@atul
mapreduce
hadoop
presentations
twitter
november 2009 by amy
Crossbow: Whole Genome Resequencing Analysis in the Clouds
october 2009 by amy
open source Crossbow: "a scalable software pipeline for whole genome resequencing analysis" - http://bit.ly/1p4WAL uses Hadoop #dna #hadoop
twitter_fav
@glynmoody
hadoop
mapreduce
genomics
aws
ec2
october 2009 by amy
Welcome to Hive!
september 2009 by amy
Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, along with a simple query language called Hive QL, which is based on SQL and lets users familiar with SQL query this data. The language also allows traditional map/reduce programmers to plug in their custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support.
hadoop
mapreduce
apache
query_language
open_source
analysis
from delicious
september 2009 by amy
Able Grape, the wine information search engine
september 2009 by amy
Search 21 million pages of trustworthy wine information. Uses Hadoop.
wine
search
hadoop
mapreduce
from delicious
september 2009 by amy