Using Hadoop to analyze the full Wikipedia dump files using WikiHadoop | Mappian
august 2011 by amy
Excited to announce #WikiHadoop so you can use #Hadoop to analyze the full #Wikipedia dump #wikimedia big thx 2 fellows
WikiHadoop
wikimedia
Wikipedia
Hadoop
from twitter_favs
Home - GitHub
february 2011 by amy
Dumbo is a project that allows you to easily write and run Hadoop programs in Python (it’s named after Disney’s flying circus elephant, since the logo of Hadoop is an elephant and Python was named after the BBC series “Monty Python’s Flying Circus”). More generally, Dumbo can be considered to be a convenient Python API for writing MapReduce programs.
python
mapreduce
hadoop
Cloud9: A MapReduce Library for Hadoop » Getting started with EC2
february 2011 by amy
"This tutorial will get you started with Cloud9 on Amazon's EC2 (running the simple word count demo). For a gentler introduction to Hadoop, or if you don't feel like experimenting with EC2, try my tutorial on getting started in standalone mode. This tutorial assumes you've already downloaded the libraries and gotten it set up."
hadoop
ec2
mapreduce
s3
aws
tutorials
Cloud9: A MapReduce Library for Hadoop
february 2011 by amy
"Cloud9 is a MapReduce library for Hadoop designed to serve as both a teaching tool and to support research in data-intensive text processing. MapReduce is a programming model for expressing distributed computations on massive datasets and an execution framework for large-scale data processing on clusters of commodity servers. Hadoop provides an open-source implementation of the programming model. The library itself is available on github and distributed under the Apache License.
For additional details on MapReduce algorithm design, Data-Intensive Text Processing with MapReduce by Lin and Dyer is a good resource. This library also serves as a repository of many examples discussed in the book."
mapreduce
hadoop
aws
s3
ec2
pig
tutorials
Crossbow: Whole Genome Resequencing Analysis in the Clouds
february 2011 by amy
Genotyping from short reads using cloud computing
hadoop
genetics
analysis
analytics
datamining
riccomini - hadoop pig documentation
january 2011 by amy
RT @peteskomoroch: Handy #SQL to #Hadoop #Pig syntax conversion cheat sheet by @criccomini #strataconf #nosql
#Hadoop
#Pig
#nosql
#SQL
#strataconf
Hadoop
Pig
nosql
SQL
strataconf
from twitter_favs
About Nutch
january 2011 by amy
Nutch is open source web-search software. It builds on Lucene and Solr, adding web-specifics, such as a crawler, a link-graph database, parsers for HTML and other document formats, etc.
Nutch can run on a single machine, but gains a lot of its strength from running in a Hadoop cluster.
search
solr
lucene
hadoop
open_source
Download Hadoop | Hadoop Download | Downloading Hadoop | Cloudera
january 2011 by amy
Cloudera’s Distribution for Apache Hadoop (CDH)
hadoop
mapreduce
aws
pig
Connect Pro Meeting Login
january 2011 by amy
RT @KathrynB @lusciouspear on Scalability, #NoSQL & #Hadoop. Free #strataconf webcast 10am PT today. Join:
#Hadoop
#NoSQL
#strataconf
Hadoop
NoSQL
strataconf
from twitter
Genome Biology | Abstract | Quake: quality-aware detection and correction of sequencing errors
november 2010 by amy
RT @mza: Very cool: RT @mike_schatz: Quake: quality-aware correction of sequencing errors #hadoop #genomics
#genomics
#hadoop
genomics
hadoop
from twitter_favs
GoToWebinar : Webinars Made Easy. Award-Winning Web Casting & Online Seminar Hosting Software
november 2010 by amy
Hmm. Cloudera and ebay doing a joint webinar next Weds about ebay's use of #hadoop. via @mikeloukides
#hadoop
hadoop
from twitter
Apache Mahout:: Scalable machine-learning and data-mining library
september 2010 by amy
"Apache Mahout's goal is to build scalable machine learning libraries."
apache
hadoop
mapreduce
machine_learning
datamining
analytics
How GE uses Hadoop to analyze big data | Software, Interrupted - CNET News
september 2010 by amy
How GE uses Hadoop to analyze big data - CNET News: #hadoop
#hadoop
hadoop
from twitter_favs
Distributed Systems - Google Code University - Google Code
may 2010 by amy
One of the most important recent developments in computing is the growth in distributed and parallel applications.
Tutorials
Contributed course content
Hadoop tools and resources
Video lectures
distributed
hadoop
mapreduce
tutorials
development
programming
Big Data Workshop
april 2010 by amy
Big Data Workshop (Mtn View, CA: Apr. 23): (HT @marshallk) #nosql #mongodb #hadoop #erlang #mapreduce
#erlang
#hadoop
#mongodb
#mapreduce
#nosql
erlang
hadoop
mongodb
mapreduce
nosql
from twitter_favs
Hadoop, Pig, and Twitter (NoSQL East 2009)
november 2009 by amy
RT @sachinrekhi A look at how Twitter uses Hadoop and Pig for its data analysis needs http://bit.ly/4j0u7x
twitter_fav
@atul
mapreduce
hadoop
presentations
twitter
Crossbow: Whole Genome Resequencing Analysis in the Clouds
october 2009 by amy
open source Crossbow: "a scalable software pipeline for whole genome resequencing analysis" - http://bit.ly/1p4WAL uses Hadoop #dna #hadoop
twitter_fav
@glynmoody
hadoop
mapreduce
genomics
aws
ec2
Welcome to Hive!
september 2009 by amy
Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, and it also provides a simple query language called Hive QL, which is based on SQL and enables users familiar with SQL to query this data. At the same time, this language allows traditional map/reduce programmers to plug in their custom mappers and reducers for more sophisticated analysis that may not be supported by the built-in capabilities of the language.
hadoop
mapreduce
apache
query_language
open_source
analysis
from delicious
Able Grape, the wine information search engine
september 2009 by amy
Search 21 million pages of trustworthy wine information. - uses hadoop
wine
search
hadoop
mapreduce
from delicious