mcroydon + data   433

Machine Learning in Python Has Never Been Easier! « The Official Blog of BigML.com
This looks really neat. ML in the cloud or bring it back down and run it yourself.
data  machine  machinelearning  ml  python 
23 days ago by mcroydon
Data Mining: Finding Similar Items and Users
Similar to the first couple of chapters of Programming Collective Intelligence.
algorithm  data  data-mining  programming 
january 2012 by mcroydon
Driving down the cost of Big-Data analytics - All Things Distributed
"The Amazon Elastic MapReduce (EMR) team announced today the ability to seamlessly use Amazon EC2 Spot Instances with their service, significantly driving down the cost of data analytics in the cloud."
analytics  aws  bigdata  data  datamining 
september 2011 by mcroydon
MemoryImage
Something that Martin Fowler said.
data  memory  database  performance  scalability 
september 2011 by mcroydon
Mining of Massive Datasets
Looks like a fantastic book on data mining.
book  books  data  datamining  mapreduce 
september 2011 by mcroydon
Spark Cluster Computing Framework
"Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write."
analytics  hadoop  data  scala 
july 2011 by mcroydon
Arq S3 data format
On-S3 data format that Arq uses. Similar to the way git does it.
backup  documentation  data  mac 
may 2011 by mcroydon
Data.js
A solid-looking persistent graph database and some other nice data storage primitives. Works in the browser or in node.js.
data  graph  inspiration  javascript  json 
march 2011 by mcroydon
Buzz by Google Research
"Dremel is a scalable, interactive ad-hoc query system for analysis of read-only nested data. By combining multi-level execution trees and columnar data layout, it is capable of running aggregation queries over trillion-row tables in seconds. The system scales to thousands of CPUs and petabytes of data, and has thousands of users at Google. In this paper, we describe the architecture and implementation of Dremel, and explain how it complements MapReduce-based computing. We present a novel columnar storage representation for nested records and discuss experiments on few-thousand node instances of the system."
google  data  nosql  research  paper 
march 2011 by mcroydon
d3.js
Beautiful visualizations, beautiful API.
data  framework  javascript  svg  visualization  via:nelson 
march 2011 by mcroydon
Pattern | CLiPS
A Python NLP package with emphasis on retrieving and analyzing language found on the web.
analysis  data  datamining  nlp  python 
february 2011 by mcroydon
WeatherSpark | Interactive Weather Charts
Impressive visualizations of historical weather data.
weather  visualization  data  via:nelson 
february 2011 by mcroydon
Basho Riak: Schema Design and the Transition from Relational Databases
A solid collection of material, from introductory to deep-dive, for folks used to having it easy and relational.
riak  nosql  sql  data  migration 
december 2010 by mcroydon
Kafka
"Kafka is a distributed publish/subscribe messaging system"
activity  asynchronous  backend  data  analytics  messaging 
december 2010 by mcroydon
Digital Obstacle File
"The Digital Obstacle File describes all known obstacles of interest to aviation users in the U.S., with limited coverage of the Pacific, the Caribbean, Canada, and Mexico. The obstacles are assigned unique numerical identifiers; accuracy codes, and listed in order by state."
faa  data 
november 2010 by mcroydon
OpenTSDB - A Distributed, Scalable Monitoring System
"OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable."
analysis  architecture  bigdata  cloud  data  database  db  java  lgpl  hbase  hadoop  development  graph  distributed  monitoring  nosql  opensource  operations  scalability  scale  time  sysadmin  software  storage  series  opentsdb  rrd  stumbleupon  time-series  timeseries 
november 2010 by mcroydon
New Startup Analyzes 100,000 Web Pages With a Snap of Your Fingers
These tools are getting a lot better but still require lots of human intervention to avoid false assertions.
analysis  extractiv  data  research  science  semantic  tool  startup  semanticweb  crawling 
november 2010 by mcroydon
s4: distributed stream computing platform
"S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data."
apache  bigdata  cloud  cloudcomputing  cluster  computing  mapreduce  map  java  hadoop  framework  distributed  data  opensource  processing  platform  programming  real-time  streaming  stream  software  scalability  reduce  realtime  streamprocessing  yahoo  tool  s4  streams 
november 2010 by mcroydon
KDD 2011: KDD Cup
Big collaborative filtering dataset with interesting properties.
data  dataset  music  yahoo 
november 2010 by mcroydon
How to publish Linked Data on the Web
This is a fantastic overview of linked open data and how to create your own resources and link to others.
article  foaf  data  howto  linked  linked-data  linked_data  publishing  programming  microformats  metadata  linkeddata  rdf  reference  semantic  semantic-web  semantic_web  uri  tutorials  tutorial  standards  toread  semanticweb  semweb  web  web2.0  web3.0  webdev  ontologies  structured 
october 2010 by mcroydon
Silk - A Link Discovery Framework for the Web of Data
"The Silk framework is a tool for discovering relationships between data items within different Linked Data sources. Data publishers can use Silk to set RDF links from their data sources to other data sources on the Web."
app  code  applications  data  datamining  framework  library  owl  linking  lod  opensource  linkeddata  linked-data  programming  python  rdf  semantic  semantic-web  web  tools  tool  sparql  software  semweb  semanticweb  silk 
october 2010 by mcroydon
Datasets in the next LOD Cloud
The dataset behind the increasingly insane LOD cloud.
data  dataset  datasets  index  linkeddata  lod  metadata  semweb  semanticweb 
october 2010 by mcroydon
New York Times - Linked Open Data
"For the last 150 years, The New York Times has maintained one of the most authoritative news vocabularies ever developed. In 2009, we began to publish this vocabulary as linked open data."
api  data  dataset  database  datasets  folksonomy  free  linked  nytimes  nyt  metadata  media  lod  linkeddata  linked_data  open  opendata  opensource  rdf  research  semantic  tagging  semweb  semanticweb  semantic_web  semantic-web  taxonomy  vocabulary  linked-data  new_york_times 
october 2010 by mcroydon
ALFRED: ArchivaL Federal Reserve Economic Data
"ALFRED® allows you to retrieve vintage versions of economic data that were available on specific dates in history."
archive  business  data  economics  finance  government  realtime  publicdata  research  interest  links  stocks  statistics  resource  historical  federal_reserve  federalreserve  govdocs  fred 
september 2010 by mcroydon
AWS Import/Export
A station wagon full of backup tapes now has an API.
amazon  aws  bigdata  beta  backup  cloud  cloud_computing  service  s3  large  import  export  ec2  data  cloudcomputing  storage  tools  carrier  sneakernet  transfer 
june 2010 by mcroydon
Cassandra Basics: indexing
A deck on Cassandra data modeling from a Seattle scalability meetup.
cassandra  slide  slides  data  model  data-model 
may 2010 by mcroydon