Ancestry.com Forum Dataset
The Ancestry.com Forum Dataset was created with the cooperation of Ancestry.com in an effort to promote research on information retrieval, language technologies, and social network analysis. It contains a full snapshot of the Ancestry.com online forum, boards.ancestry.com, from July 2010. This message board is large, with over 22 million messages, over 3.5 million authors, and active participation for over ten years.
dataset  text  forum  messages  socialnetwork  ancestry  search  from delicious
8 weeks ago
The Geomblog: The Shonan Meeting (Part 3): Optimal Distributed Sampling
ly to the distributed setting. Each player now runs this protocol instead of the previous one, and every time the coordinator gets an update, it sends out a new global threshold (the minimum over all thresholds sent in) to all nodes. If you want to maintain a sample of size
sampling  reservoir  distributed  statistics  from delicious
january 2012
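The excerpt from the Geomblog post above is cut off at both ends, so the following is only a minimal sketch of the idea it describes, restricted to a sample of a single item. It assumes each player tags every item with a uniform random number and contacts the coordinator only when the tag beats the current threshold; the function name `distributed_sample` and the instant broadcast are illustrative assumptions, not the post's exact protocol.

```python
import random
from itertools import zip_longest


def distributed_sample(streams, seed=0):
    """Simulate threshold-based sampling of one item across several players.

    Assumption: every item gets a uniform random tag, and the item with the
    smallest tag seen anywhere is the sample. This is a sketch, not the
    protocol from the original post.
    """
    rng = random.Random(seed)
    thresholds = [1.0] * len(streams)  # each player's copy of the global threshold
    global_threshold = 1.0             # coordinator's minimum over all tags sent in
    sample = None
    messages = 0                       # player-to-coordinator messages
    missing = object()                 # marks exhausted streams

    # Interleave the streams round-robin to mimic items arriving at all players.
    for round_items in zip_longest(*streams, fillvalue=missing):
        for player, item in enumerate(round_items):
            if item is missing:
                continue
            tag = rng.random()
            if tag < thresholds[player]:
                # Only items beating the local threshold are sent.
                messages += 1
                if tag < global_threshold:
                    global_threshold = tag
                    sample = item
                # Coordinator broadcasts the new global threshold to all players.
                thresholds = [global_threshold] * len(streams)
    return sample, messages


if __name__ == "__main__":
    streams = [list(range(0, 100)), list(range(100, 150)), list(range(150, 400))]
    sample, messages = distributed_sample(streams, seed=1)
    print("sample:", sample, "messages sent:", messages)
```

Because a lower threshold makes later items less likely to be sent, the number of messages is typically far smaller than the number of items observed.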
Common Crawl Corpus : Public Data Sets : Amazon Web Services
A corpus of web crawl data composed of 5 billion web pages. This data set is freely available on Amazon S3 and formatted in the ARC (.arc) file format.
dataset  commoncrawl  web  text  corpus  pagerank  from delicious
january 2012
Panopticon - Wikipedia, the free encyclopedia
The Panopticon is a type of institutional building designed by English philosopher and social theorist Jeremy Bentham in the late eighteenth century. The concept of the design is to allow an observer to observe (-opticon) all (pan-) inmates of an institution without them being able to tell whether or not they are being watched.
identity  privacy  observation  society  from delicious
january 2012
IndexTank - hosted search you control
Suppose you want to find only the true enthusiasts on the forum. You can search for posts that contain Bioshock and love.
indextank  tutorial  ruby  search  service  from delicious
december 2011
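The IndexTank example above amounts to a conjunctive keyword query: a post matches only if it contains every search term. The sketch below illustrates that idea with plain Python over an in-memory list; it does not use the IndexTank client or its query syntax, and the `search` helper and sample posts are hypothetical.

```python
import re


def search(posts, terms):
    """Return the posts that contain every term, ignoring case and punctuation."""
    wanted = [term.lower() for term in terms]
    hits = []
    for post in posts:
        # Split the post into lowercase word tokens.
        words = set(re.findall(r"\w+", post.lower()))
        if all(term in words for term in wanted):
            hits.append(post)
    return hits


if __name__ == "__main__":
    # Hypothetical forum posts, for illustration only.
    posts = [
        "I love Bioshock",
        "Bioshock was fine",
        "I love this forum",
    ]
    print(search(posts, ["Bioshock", "love"]))
```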