MapReduce Patterns, Algorithms, and Use Cases « Highly Scalable
february 2012 by mcroydon
Good stuff well explained.
algorithms
hadoop
mapreduce
java
patterns
february 2012 by mcroydon
Basho | Riak and Hadoop (Sitting in a tree)
november 2011 by mcroydon
Proof of concept implementation
riak
hadoop
november 2011 by mcroydon
Hortonworks | Architecting the future of big data
september 2011 by mcroydon
Lots of good content coming from this Yahoo! spinoff.
yahoo
hadoop
september 2011 by mcroydon
AccumuloProposal - Incubator Wiki
september 2011 by mcroydon
NoSQL store with some properties similar to HBase, plus some interesting per-cell ACLs. Born at the NSA.
apache
hadoop
nosql
nsa
september 2011 by mcroydon
NextGen MapReduce Hits Apache Hadoop Mainline | Hortonworks
august 2011 by mcroydon
Favorite bulletpoint: "NextGen MapReduce has nearly 100,000 lines of code (roughly – just the *.java files). That’s nearly 1/3 of current Apache Hadoop codebase we’ve added in the last 12 months!" All SLOC jokes aside, it sounds like an awesome development.
hadoop
java
sloc
august 2011 by mcroydon
Twitter to open source Hadoop-like tool — Cloud Computing News
august 2011 by mcroydon
More open source bits from twitter/backtype.
developers
opensource
hadoop
scaling
twitter
august 2011 by mcroydon
Apache Hadoop Goes Realtime at Facebook
july 2011 by mcroydon
Elephants gone wild.
hadoop
hbase
mapreduce
programming
july 2011 by mcroydon
HPCC Systems | Open-source. Fast. Scalable. Simple.
june 2011 by mcroydon
"...a massive parallel-processing computing platform that solves Big Data problems." From LexisNexis.
bigdata
hadoop
opensource
tools
june 2011 by mcroydon
Pigs, Bees, and Elephants: A Comparison of Eight MapReduce Languages « Dataspora
april 2011 by mcroydon
A nice overview of toolkits.
hadoop
hive
java
mapreduce
pig
april 2011 by mcroydon
The dark side of Hadoop - BackType Technology
april 2011 by mcroydon
These are the kinds of things that you don't find out until you've been knee-deep in something for a while.
hadoop
apache
java
mapreduce
map-reduce
april 2011 by mcroydon
Brisk – Apache Hadoop™ powered by Cassandra | DataStax
march 2011 by mcroydon
HDFS-like storage layer for Hadoop/Hive using Cassandra.
hadoop
cassandra
hive
march 2011 by mcroydon
NYC Tech Talks Presentation (Hadoop, Node.js)
february 2011 by mcroydon
An interesting node + hadoop architecture overview.
node
hadoop
node.js
architecture
slides
slide
february 2011 by mcroydon
Factual Dev Blog » Blog Archive » Gratuitous Hadoop: Stress Testing on the Cheap with Hadoop Streaming and EC2
january 2011 by mcroydon
Stress testing using Hadoop and EC2.
amazon
architecture
hadoop
ec2
testing
january 2011 by mcroydon
Brandyn White - Computer Vision, Hadoop, Mobile Computing, Kinect, and Big Data
january 2011 by mcroydon
A Cython-based MapReduce library.
python
cython
hadoop
mapreduce
january 2011 by mcroydon
High Scalability - Facebook's New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month
architecture article bigdata cassandra database mail infrastructure hbase hadoop facebook messages messaging news nosql performance samples storage servers scalability
november 2010 by mcroydon
HBase and Hadoop at Facebook | Jeremiah Peschka
november 2010 by mcroydon
It's awesome to see HBase get some love.
analysis
article
cap
cassandra
comparison
database
db
programming
nosql
imported
hdfs
hbase
hadoop
facebook
replication
scalability
scale
scaling
november 2010 by mcroydon
OpenTSDB - A Distributed, Scalable Monitoring System
november 2010 by mcroydon
OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.
analysis
architecture
bigdata
cloud
data
database
db
java
lgpl
hbase
hadoop
development
graph
distributed
monitoring
nosql
opensource
operations
scalability
scale
time
sysadmin
software
storage
series
opentsdb
rrd
stumbleupon
time-series
timeseries
november 2010 by mcroydon
Yelp Engineering Blog: mrjob: Distributed Computing for Everybody
november 2010 by mcroydon
A really nice wrapper around EMR.
algorithms
amazon
aws
cloudcomputing
computing
data
hadoop
framework
distributed
dist
development
datamining
library
map-reduce
map
mapreduce
nosql
opensource
yelp
webservices
search
reduce
python
programming
aa
elasticmapreduce
logs
emr
mrjob
november 2010 by mcroydon
s4: distributed stream computing platform
november 2010 by mcroydon
"S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data."
apache
bigdata
cloud
cloudcomputing
cluster
computing
mapreduce
map
java
hadoop
framework
distributed
data
opensource
processing
platform
programming
real-time
streaming
stream
software
scalability
reduce
realtime
streamprocessing
yahoo
tool
s4
streams
november 2010 by mcroydon
SHARD: Storing and Querying Large-Scale SemWeb Data
november 2010 by mcroydon
An excellent slide deck presenting SHARD at HadoopWorld.
rdf
triplestore
hadoop
hdfs
mapreduce
lubm
november 2010 by mcroydon
SHARD Triple-Store
october 2010 by mcroydon
"SHARD is a proof-of-concept use of high-performance, low-cost distributed computing technology to develop a highly scalable triple-store. SHARD is released as an open-source project on the BSD license."
database
db
cloud
distributed
hadoop
lubm
mapreduce
rdf
store
sparql
storage
shard
semweb
semanticweb
scalability
triple-store
october 2010 by mcroydon
Google search index splits with MapReduce • The Register
september 2010 by mcroydon
Teach the world to Zig then Zag.
algorithms
architecture
bigdata
article
bigtable
gfs
distributed
database
computing
computers
caffeine
google
grid
hadoop
index
indexing
mapreduce
technology
search
scalability
research
programming
colossus
gfs2
news
september 2010 by mcroydon
appengine-mapreduce - Project Hosting on Google Code
july 2010 by mcroydon
Pretty slick that it's built atop vanilla appengine.
appengine
cloud
cloud-computing
google
gae
hadoop
saas
python
optimization
nosql
mapreduce
scalability
webdev
develop
july 2010 by mcroydon
Building a distributed concurrent queue with Apache ZooKeeper « Cloudera » Apache Hadoop for the Enterprise
july 2010 by mcroydon
Python bindings and example code.
hadoop
zookeeper
july 2010 by mcroydon
Enabling Hadoop Batch Processing Systems to Consume Streaming Data (Hadoop and Distributed Computing at Yahoo!)
june 2010 by mcroydon
The intersection of batch processing and the real-time web.
collection
hadoop
log
metadata
processing
yahoo
stream
june 2010 by mcroydon
NoSQL at Twitter (NoSQL EU 2010)
april 2010 by mcroydon
A pretty thorough look behind the curtain at Twitter.
analytics
architecture
cassandra
cloud
databases
database
db
grid
hbase
nosql
hadoop
presentation
pig
programming
read
twitter
slideshare
slides
scribe
scaling
scalability
flockdb
yam
april 2010 by mcroydon
How Raytheon Researchers are Using Hadoop to Build a Scalable, Distributed Triple Store « Cloudera » Apache Hadoop for the Enterprise
march 2010 by mcroydon
Triplestores and Hadoop. Together.
hadoop
triplestore
article
articles
cloud
cloudcomputing
cloudera
database
hdfs
graphs
graph
distributed
development
mapreduce
nosql
programming
repository
rdf
scalability
toread
sparql
semanticweb
shard
semantic_web
semantic
triple
web
march 2010 by mcroydon
Lucid Imagination » Integrating Apache Mahout with Apache Lucene and Solr – Part I (of 3)
march 2010 by mcroydon
It looks like it's pretty darn easy to hook up Solr and Mahout. I can't wait to read more.
analysis
apache
learning
hadoop
lucene
machinelearning
solr
searchengine
opensource
mapreduce
ml
nlp
mahout
textmining
toread
march 2010 by mcroydon
Lineland
march 2010 by mcroydon
Scroll through for lots and lots of HBase internals.
blog
distributed
hadoop
hbase
nosql
mapreduce
programming
systems
storage
reference
march 2010 by mcroydon
Why Europe’s Largest Ad Targeting Platform Uses Hadoop « Cloudera » Apache Hadoop for the Enterprise
march 2010 by mcroydon
Moving from Postgres to HDFS + Pig and MapReduce for large data storage, analysis, and aggregation.
clojure
data
cloud
database
development
hadoop
mapreduce
web
nosql
march 2010 by mcroydon
HBase vs. Cassandra: NoSQL Battle! | Road to Failure
march 2010 by mcroydon
Someone who chose HBase over Cassandra (and why).
amazon
architecture
article
bigdata
bigtable
blog
cap
computing
comparison
compare
clustering
cassandra
database
development
dht
distributed
foss
grid
hadoop
scalability
research
nosql
mysql
hbase
post
vs
march 2010 by mcroydon
HBase vs Cassandra: why we moved « Bits and Bytes.
march 2010 by mcroydon
A look at someone who started with HBase and moved to Cassandra (and why).
architecture
benchmark
article
blog
cap
cassandra
cloud
development
database
db
computer
foss
comparison
compare
distributed
hadoop
hbase
mysql
nosql
performance
toread
scalability
programming
consistency
march 2010 by mcroydon
Lineland: HBase Architecture 101 - Storage
march 2010 by mcroydon
A solid overview of HBase's storage architecture.
arch
architecture
article
bigtable
blog
data
database
important
hbase
hadoop
good
distributed
design
databases
info
kvstore
nosql
performance
reference
storage
toread
march 2010 by mcroydon
Data-Intensive Text Processing with MapReduce
february 2010 by mcroydon
I flipped through this during the conference and need to read through it more thoroughly.
programming
design
data
reference
book
free
books
geek
pdf
text
hadoop
distributed
online
algorithm
algorithms
mapreduce
datamining
to-read
ebooks
nlp
ir
textmining
distributedcomputing
text-mining
developers
draft
february 2010 by mcroydon
GIS on Hadoop - Nathan Kerr
december 2009 by mcroydon
WKT for the win.
data
gis
geo
hadoop
parallel
analytics
processing
census
spatial
via:pskomoroch
december 2009 by mcroydon
Lineland: HBase vs. BigTable Comparison
november 2009 by mcroydon
A nice feature rundown as compared to BigTable.
google
reference
database
toread
storage
hadoop
distributed
bigtable
comparison
mapreduce
nosql
hbase
compare
november 2009 by mcroydon
Hadoop World: NYC 2009 | Cloudera
november 2009 by mcroydon
All of the presentation decks in one place. Handy.
hadoop
presentation
yahoo
mapreduce
cloud
list
slides
conference
2009
presentations
event
cloudera
foss
world
new
conferences
nyc
hadoopworld
november 2009 by mcroydon
Hw09 Counting And Clustering And Other Data Tricks
november 2009 by mcroydon
"Large scale computing is transformative for NYTimes.com."
hadoop
nytimes
data
analysis
november 2009 by mcroydon
Lineland: Hive vs. Pig
november 2009 by mcroydon
Different tools for different jobs, but it's hard choosing sometimes when you're in the Hadoop ecosystem.
database
hadoop
mapreduce
comparison
hive
pig
november 2009 by mcroydon
Building a Data Intensive Web Application with Cloudera, Hadoop, Hive, Pig, and EC2 | Cloudera
november 2009 by mcroydon
A nice look at end-to-end data analysis of big datasets using things like Pig and Hive.
programming
web
data
database
tools
webdev
business
toread
howto
tutorial
amazon
dev
scalability
hadoop
architecture
aws
computing
cluster
ec2
pig
trends
hive
cloudera
cloudcomputing
analytics
datamining
mapreduce
cloud
application
november 2009 by mcroydon
SourceForge.net: pydoop
november 2009 by mcroydon
Python C++ wrappers for HDFS and MapReduce. It's probably quicker than Dumbo.
python
code
library
hadoop
c++
analytics
project
examples
hdfs
via:pskomoroch
november 2009 by mcroydon
Hbase/Stargate - Hadoop Wiki
november 2009 by mcroydon
Alpha-quality RESTful interface for HBase. Includes plain text, JSON, XML, and ProtocolBuffer serializers.
rest
hadoop
hbase
xml
json
web-services
november 2009 by mcroydon
Hadoop, Pig, and Twitter (NoSQL East 2009)
november 2009 by mcroydon
An awesome deck showing off Pig and how Twitter uses it.
programming
data
database
howto
server
dev
search
statistics
hadoop
presentation
mapreduce
datamining
twitter
analytics
slides
presentations
nosql
bigdata
socialnetworks
slideshare
hdfs
pig
big
functional
data_mining
november 2009 by mcroydon
Avro: a Format for Big Data » Cloudera Hadoop & Big Data Blog
november 2009 by mcroydon
Another data interchange format (I think) like ProtocolBuffers and Thrift. I think one of the bigger problems that the Hadoop/big data community has is parallel internal implementations of building blocks that are later open-sourced.
data
database
storage
distributed
hadoop
apache
cloud
json
messaging
encoding
protocol
portable
cloudera
bigdata
data-structures
serialization
format
foss
thrift
buffers
introduction
avro
november 2009 by mcroydon
Journal of Eivind Uggedal: NoSQL East 2009 - Summary of Day 1
november 2009 by mcroydon
Some interesting bits and more of the same but I really like the dark-launch approach that Scribe allows.
data
database
toread
blog
scalability
internet
distributed
article
hadoop
scaling
db
cloud
couchdb
conference
papers
keyvalue
nosql
links
cassandra
2009
mongodb
dynomite
riak
november 2009 by mcroydon
GoodDoop
october 2009 by mcroydon
A nice set of recipes for Hadoop that probably translate well to other Map/Reduce architectures.
wiki
algorithms
algorithm
hadoop
mapreduce
examples
recipe
october 2009 by mcroydon
Analyzing Human Genomes with Hadoop » Cloudera Hadoop & Big Data Blog
october 2009 by mcroydon
A fantastic writeup of absurdly fast sequencing software that can analyze a human genome in about 3 hours for less than $100 of AWS resources. Pretty darned impressive.
data
opensource
computer
amazon
algorithms
aws
hadoop
ec2
mapreduce
dna
bioinformatics
cloudera
trend
genetics
genome
foss
genomics
october 2009 by mcroydon
Training to Climb an Everest of Digital Data
october 2009 by mcroydon
Big data is big and almost always requires a completely different mindset than the one that is taught in computer science programs.
data
database
processing
google
news
toread
ibm
energy
datasets
mining
search
research
science
internet
algorithms
storage
scaling
education
hadoop
analysis
computer-science
datacuration
october 2009 by mcroydon
Cloudera Desktop | Cloudera
october 2009 by mcroydon
A GUI for scheduling jobs and checking in on your cluster.
programming
web
software
development
tools
distributed
ui
management
computing
hadoop
cloud
cluster
clustering
mapreduce
os
gui
tool
backup
desktop
admin
cloudera
monitor
operations
october 2009 by mcroydon
Tracking Trends with Hadoop and Hive on EC2 » Cloudera Hadoop & Big Data Blog
august 2009 by mcroydon
A detailed run-through of data warehousing and creating trending results for Wikipedia data.
python
data
database
howto
tutorial
wikipedia
rails
scaling
hadoop
aws
resources
ec2
cloud
mapreduce
datamining
sparklines
rubyonrails
trends
hive
cloudera
bigdata
log
t
trendingtopics
august 2009 by mcroydon
NAACL/HLT 2009 Tutorial: Data-Intensive Text Processing with MapReduce
august 2009 by mcroydon
"This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model (Dean and Ghemawat, 2004), using the open-source Hadoop implementation."
tutorial
hadoop
graph
slides
mapreduce
nlp
machine_learning
textmining
via:pskomoroch
august 2009 by mcroydon
HadoopDB Project
july 2009 by mcroydon
Interesting approach; we'll see if it has legs.
programming
software
development
database
java
opensource
research
scalability
distributed
performance
scaling
hadoop
cluster
postgresql
databases
mysql
hadoopdb
map-reduce
hive
dbms
rdbms
2009
analytics
postgres
db
sql
mapreduce
yale
vldb
july 2009 by mcroydon
DBMS Musings: Announcing release of HadoopDB (longer version)
july 2009 by mcroydon
/me increments the "databases atop hadoop" counter (and takes a sip).
database
opensource
scalability
research
distributed
performance
hadoop
cluster
postgresql
mapreduce
project
datawarehouse
hadoopdb
distributedcomputing
july 2009 by mcroydon
up and running with cassandra
july 2009 by mcroydon
More detail than usual for an intro article.
development
data
database
code
toread
blog
tutorial
api
ruby
dev
java
scalability
distributed
rails
storage
hadoop
cluster
application
cloud
twitter
db
facebook
web
example
key-value
bigtable
keyvalue
cassandra
nosql
bigdata
july 2009 by mcroydon
Project Voldemort Blog : Building a terabyte-scale data cycle at LinkedIn with Hadoop and Project Voldemort
july 2009 by mcroydon
More on what makes Voldemort tick.
design
development
data
database
toread
erlang
java
scalability
storage
architecture
distributed
performance
scaling
hadoop
cluster
grid
cloud
mapreduce
db
caching
analytics
arch
key-value
dht
keyvalue
scale
voldemort
batch
linkedin
datastore
july 2009 by mcroydon
Should you go Beyond Relational Databases? | Think Vitamin
july 2009 by mcroydon
Includes links to a bunch of graph databases too.
programming
development
web
data
reference
database
toread
technology
opensource
dev
scalability
storage
work
hadoop
article
graph
databases
mysql
nosql
keyvalue
bigtable
rdbms
resource
reading
couchdb
comparison
db
mapreduce
relational
alternative
july 2009 by mcroydon
braindump: NOSQL debrief
july 2009 by mcroydon
Slidedump.
software
nosql
kvstore
mongodb
dynomite
video
database
scalability
storage
architecture
distributed
scaling
hadoop
presentation
mysql
databases
db
sql
slides
couchdb
conference
videos
bigtable
hbase
trends
keyvalue
key-value
cassandra
voldemort
hypertable
july 2009 by mcroydon
Coding Horror: Scaling Up vs. Scaling Out: Hidden Costs
june 2009 by mcroydon
Food for thought, with the caveat that scaling out is a lot easier if you don't have any per-server software costs. Big iron costs less to operate, though.
programming
hardware
business
server
scalability
coding
networking
architecture
performance
distributed
scaling
web-development
cluster
hadoop
hosting
clustering
comparison
servers
distribution
it
codinghorror
2009
stackoverflow
june 2009 by mcroydon
HBase Goes Realtime
june 2009 by mcroydon
"We improved our performance by more than an order of magnitude in most cases"
slides
pdf
hadoop
hbase
performance
june 2009 by mcroydon
Engineering @ Facebook's Notes | Facebook
june 2009 by mcroydon
Big big big data warehousing / data mining.
design
data
database
blog
java
map
scalability
storage
distributed
computing
scaling
article
hadoop
sql
mapreduce
db
reading
facebook
analytics
rdbms
arch
comment
hive
datawarehouse
warehouse
data-warehousing
hdfs
dw
june 2009 by mcroydon
Steve: Developing on the Edge - the Yahoo! Hadoop distro
june 2009 by mcroydon
I see Yahoo and Cloudera's distributions of Hadoop as a lot like Ubuntu vs. Debian: mainline Hadoop offers Debian-style stability, and these distributions are the Ubuntu-style compromise for new features.
yahoo
cloudera
hadoop
map
reduce
map-reduce
june 2009 by mcroydon
Neo4j - a Graph Database that Kicks Buttox | High Scalability
june 2009 by mcroydon
The most common complaint about existing graph databases is performance. Hopefully a stable of good, performant graph databases will change that.
data
database
toread
visualization
java
opensource
network
scalability
cool
architecture
performance
graph
hadoop
databases
db
graphs
2009
arch
socialnetworking
socialmedia
dataviz
neo4j
graph_database
graph-database
relationship
june 2009 by mcroydon