Ancestry.com Forum Dataset
8 weeks ago
The Ancestry.com Forum Dataset was created with the cooperation of Ancestry.com in an effort to promote research on information retrieval, language technologies, and social network analysis. It contains a full snapshot of the Ancestry.com online forum, boards.ancestry.com, from July 2010. This message board is large, with over 22 million messages, over 3.5 million authors, and active participation for over ten years.
dataset
text
forum
messages
socialnetwork
ancestry
search
sherlock-holmes-a-game-of-shadows-00-470-75.jpg (470×265)
february 2012
Street Fighting Detective
image
streetfighting
holmes
datascience
The Geomblog: The Shonan Meeting (Part 3): Optimal Distributed Sampling
january 2012
ly to the distributed setting. Each player now runs this protocol instead of the previous one, and every time the coordinator gets an update, it sends out a new global threshold (the minimum over all thresholds sent in) to all nodes. If you want to maintain a sample of size
sampling
reservoir
distributed
statistics
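The excerpt above is truncated, so the following is only a minimal sketch of the threshold idea it describes, restricted to a sample of size 1. It assumes that each item is tagged with a uniform random weight and that the item with the smallest weight is the sample; the names `Player`, `Coordinator`, and `simulate` are hypothetical and do not come from the linked post. Broadcasts are modeled as instantaneous, which a real network would not guarantee.

```python
import random


class Coordinator:
    """Keeps the global minimum weight and broadcasts it as the threshold."""

    def __init__(self, players):
        self.players = players
        self.threshold = 1.0   # weights are drawn from [0, 1)
        self.sample = None
        self.messages = 0      # player-to-coordinator messages received

    def receive(self, item, weight):
        self.messages += 1
        if weight < self.threshold:
            self.threshold = weight
            self.sample = item
            # Broadcast the new global threshold to every player
            # (assumed instantaneous in this sketch).
            for player in self.players:
                player.threshold = weight


class Player:
    """Forwards an item only if its weight beats the last known threshold."""

    def __init__(self, rng):
        self.rng = rng
        self.threshold = 1.0
        self.coordinator = None

    def observe(self, item):
        weight = self.rng.random()
        if weight < self.threshold:
            self.coordinator.receive(item, weight)


def simulate(num_players, num_items, seed=0):
    """Feed items to randomly chosen players; return (sample, messages, threshold)."""
    rng = random.Random(seed)
    players = [Player(rng) for _ in range(num_players)]
    coordinator = Coordinator(players)
    for player in players:
        player.coordinator = coordinator
    for item in range(num_items):
        rng.choice(players).observe(item)
    return coordinator.sample, coordinator.messages, coordinator.threshold


if __name__ == "__main__":
    sample, messages, threshold = simulate(num_players=4, num_items=10_000)
    print(f"sampled item {sample} using {messages} messages")
```

Because every weight is drawn independently and uniformly, each item is equally likely to hold the minimum, so the retained item is a uniform sample of the stream. Most items never leave their player, since the threshold shrinks as more items are seen.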
Common Crawl Corpus : Public Data Sets : Amazon Web Services
january 2012
A corpus of web crawl data composed of 5 billion web pages. This data set is freely available on Amazon S3 and formatted in the ARC (.arc) file format.
dataset
commoncrawl
web
text
corpus
pagerank
Panopticon - Wikipedia, the free encyclopedia
january 2012
The Panopticon is a type of institutional building designed by English philosopher and social theorist Jeremy Bentham in the late eighteenth century. The concept of the design is to allow an observer to observe (-opticon) all (pan-) inmates of an institution without them being able to tell whether or not they are being watched.
identity
privacy
observation
society
IndexTank - hosted search you control
december 2011
Suppose you want to find only the true enthusiasts on the forum. You can search for posts that contain Bioshock and love.
indextank
tutorial
ruby
search
service
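The query in the excerpt above (posts containing both Bioshock and love) is a conjunctive search. The sketch below shows that idea with a tiny in-memory inverted index. It is not the IndexTank API; the function names and the sample posts are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(posts):
    """Map each term to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, text in posts.items():
        for term in tokenize(text):
            index[term].add(post_id)
    return index


def search_all(index, query):
    """Return ids of posts that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical forum posts, made up for this example.
posts = {
    1: "I love Bioshock",
    2: "Bioshock was okay",
    3: "I love this forum",
}
index = build_index(posts)
matches = search_all(index, "Bioshock love")
```

Here `matches` contains only post 1, the single post that mentions both terms. A hosted search service would add ranking, stemming, and persistence on top of this basic intersection.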