amy + mapreduce   57

Home - GitHub
Dumbo is a project that allows you to easily write and run Hadoop programs in Python (it’s named after Disney’s flying circus elephant, since the logo of Hadoop is an elephant and Python was named after the BBC series “Monty Python’s Flying Circus”). More generally, Dumbo can be considered to be a convenient Python API for writing MapReduce programs.
python  mapreduce  hadoop 
february 2011 by amy
Cloud9: A MapReduce Library for Hadoop » Getting started with EC2
"This tutorial will get you started with Cloud9 on Amazon's EC2 (running the simple word count demo). For a gentler introduction to Hadoop, or if you don't feel like experimenting with EC2, try my tutorial on getting started in standalone mode. This tutorial assumes you've already downloaded the libraries and gotten it set up."
hadoop  ec2  mapreduce  s3  aws  tutorials 
february 2011 by amy
Cloud9: A MapReduce Library for Hadoop
"Cloud9 is a MapReduce library for Hadoop designed to serve as both a teaching tool and to support research in data-intensive text processing. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. Hadoop provides an open-source implementation of the programming model. The library itself is available on github and distributed under the Apache License.

For additional details on MapReduce algorithm design, Data-Intensive Text Processing with MapReduce by Lin and Dyer is a good resource. This library also serves as a repository of many examples discussed in the book."
mapreduce  hadoop  aws  s3  ec2  pig  tutorials 
february 2011 by amy
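The programming model described in the Cloud9 entry above can be sketched in a few lines. This is a minimal, single-process illustration in plain Python, not Cloud9's or Hadoop's actual API (Cloud9 itself is a Java library). The names `mapper`, `reducer`, and `run_local` are hypothetical, chosen only for this sketch, and the in-memory grouping stands in for the shuffle step that a real cluster performs across machines.

```python
from collections import defaultdict


def mapper(_key, line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reducer(word, counts):
    """Reduce step: sum all the counts collected for a single word."""
    yield word, sum(counts)


def run_local(records, mapper, reducer):
    """Hypothetical local driver: map, group by key, then reduce.

    A real framework would distribute these phases across a cluster;
    here everything runs in one process for illustration only.
    """
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # stand-in for the shuffle

    results = {}
    for key in sorted(groups):
        for out_key, out_value in reducer(key, groups[key]):
            results[out_key] = out_value
    return results


# Input records are (line number, line text) pairs.
lines = ["the quick brown fox", "the lazy dog"]
counts = run_local(enumerate(lines), mapper, reducer)
print(counts)
```

Writing the mapper and reducer as generators that yield key/value pairs keeps each step independent of how the pairs are transported, which is the property that lets a framework run the same two functions on many machines.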
rubenfonseca's map_crowd_reduce at master - GitHub
Massively Distributed Browser-based Javascript Map Reduce Framework. node.js + socket.io (websockets) + webworkers + fun = global warming
javascript  mapreduce 
november 2010 by amy
Map Crowd Reduce - There's no place like ::1
My latest open project is a “SETI-at-home-like infrastructure for massively distributed CPU-intensive jobs based on HTML5 WebWorkers and node.js for distributing tasks”
browser  javascript  mapreduce 
november 2010 by amy
Distributed Systems - Google Code University - Google Code
One of the most important recent developments in computing is the growth in distributed and parallel applications.
Tutorials
Contributed course content
Hadoop tools and resources
Video lectures
distributed  hadoop  mapreduce  tutorials  development  programming 
may 2010 by amy
Hadoop, Pig, and Twitter (NoSQL East 2009)
RT @sachinrekhi A look at how Twitter uses Hadoop and Pig for its data analysis needs http://bit.ly/4j0u7x
twitter_fav  @atul  mapreduce  hadoop  presentations  twitter 
november 2009 by amy
Crossbow: Whole Genome Resequencing Analysis in the Clouds
open source Crossbow: "a scalable software pipeline for whole genome resequencing analysis" - http://bit.ly/1p4WAL uses Hadoop #dna #hadoop
twitter_fav  @glynmoody  hadoop  mapreduce  genomics  aws  ec2 
october 2009 by amy
Welcome to Hive!
Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, and it also provides a simple query language called Hive QL, which is based on SQL and enables users familiar with SQL to query this data. At the same time, this language allows traditional map/reduce programmers to plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.
hadoop  mapreduce  apache  query_language  open_source  analysis  from delicious
september 2009 by amy
Able Grape, the wine information search engine
Search 21 million pages of trustworthy wine information. - uses hadoop
wine  search  hadoop  mapreduce  from delicious
september 2009 by amy
