donturn + data   66

Data-Intensive Text Processing with MapReduce
free ebook w/github supported edits - Data-Intensive Text Processing with MapReduce #datascience #text
datascience  github  text  research  mapreduce  data 
6 weeks ago by donturn
Welcome to Hive!
Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. Hive provides a mechanism to project structure onto this data and query the data using a SQL-like language called HiveQL. At the same time this language also allows traditional map/reduce programmers to plug in their custom mappers and reducers when it is inconvenient or inefficient to express this logic in HiveQL.
data  database  hadoop  opensource  nosql 
november 2011 by donturn
Testing Benford's Law
examining large datasets for Benford's Law
statistics  math  data  from twitter
november 2011 by donturn
Welcome to Apache Pig!
Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
apache  data  hadoop  mapreduce  opensource  sql  database  datascience 
september 2011 by donturn
Apache Flume
Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
opensource  apache  logs  datascience  kdd  data 
september 2011 by donturn
Data mining, forecasting and bioinformatics competitions on Kaggle
contests where winners are paid (not very much) to solve data-oriented problems
datamining  data  outsourcing  stats 
february 2011 by donturn
Needlebase
merge data, crawl web data, then chart and explore it
api  data  database  datascience  kdd  etl  stats  charts  map 
february 2011 by donturn
Chart Chooser – Juice Analytics
Chart Chooser lets you pick what you're trying to present & get powerpoint and excel templates #data
analytics  viz  charts  data  datascience  stats  #data 
february 2011 by donturn
Wrangler
looks a lot like Excel, but in the cloud and with bigger data sets?
analysis  analytics  data  tools  visualization  viz  etl  kdd  excel 
february 2011 by donturn
google-refine - Project Hosting on Google Code
Google Refine looks like a great tool for cleaning & transforming messy data for use w/web services
analysis  data  datamining  google  tools  dev  datascience  from twitter
november 2010 by donturn
comScore, Inc.
data collection, but murky methodology
quant  quantia  behavior  analytics  ratings  data 
april 2010 by donturn
cityofsound: The street as platform
The way the street feels may soon be defined by what cannot be seen with the naked eye.
mobile  design  kdd  data_mining  privacy  ubicomp  wireless  wifi  web  research  scifi  urbanism  networks  data  information 
march 2008 by donturn

