jonty + data   46

LOCFIT - Local Regression and Likelihood
"LOCFIT is a software system for fitting curves and surfaces to data, using the local regression and likelihood methods. The code is mostly written in C; an interface is provided that enables LOCFIT to be used as an S-Plus or R library."
r  statistics  plugin  curve  fitting  graph  data  regression  stats  from delicious
july 2011 by jonty
The MessagePack Project
"MessagePack is a binary-based efficient object serialization library. It enables to exchange structured objects between many languages like JSON. But unlike JSON, it is very fast and small." - Also includes an RPC protocol and implementations.
thrift  protocolbuffers  protobuf  network  rpc  api  ipc  data  message  protocol  nt  from delicious
june 2011 by jonty
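
To make the "fast and small" claim in the MessagePack entry concrete, here is a minimal sketch of a hand-written encoder for a tiny subset of the format (nil, booleans, integers 0 to 127, short strings, small lists and maps). The function name `pack` and the choice of subset are assumptions for illustration; this is not the project's own library, which covers many more types.

```python
def pack(obj):
    """Encode a small subset of MessagePack into bytes (illustrative only)."""
    if obj is None:
        return b"\xc0"                                  # nil
    if obj is True:
        return b"\xc3"                                  # true
    if obj is False:
        return b"\xc2"                                  # false
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])                             # positive fixint: the value is the byte
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw       # short string: length lives in the tag byte
    if isinstance(obj, list) and len(obj) <= 15:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)
    if isinstance(obj, dict) and len(obj) <= 15:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body          # small map: key/value pairs follow
    raise ValueError("type or size outside this sketch's subset")


encoded = pack({"compact": True, "schema": 0})
```

The encoded value is 18 bytes, against 27 characters for the equivalent compact JSON text `{"compact":true,"schema":0}`, because type and length share a single tag byte and there are no quotes or separators.
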
mlpy - Machine Learning PYthon - Predictive Modeling - Classification and Regression
"It provides high level procedures that support, with few lines of code, the design of rich Data Analysis Protocols (DAPs) for preprocessing, clustering, predictive classification, regression and feature selection. Methods are available for feature weighting and ranking, data resampling, error evaluation and experiment landscaping."
python  machinelearning  ai  numpy  ml  library  processing  data  clustering  cluster  from delicious
april 2011 by jonty
d3.js
"D3 allows you to bind arbitrary data to a Document Object Model (DOM), and then apply data-driven transformations to the document. As a trivial example, you can use D3 to generate a basic HTML table from an array of numbers. Or, use the same data to create an interactive SVG bar chart with smooth transitions and interaction."
javascript  visualisation  data  framework  svg  force  directed  graph  graphs  canvas  from delicious
march 2011 by jonty
ngrep - network grep
"ngrep strives to provide most of GNU grep's common features, applying them to the network layer. ngrep is a pcap-aware tool that will allow you to specify extended regular or hexadecimal expressions to match against data payloads of packets."
network  grep  monitoring  wireshark  pcap  tcpdump  data  from delicious
march 2011 by jonty
Pattern
"Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks)."
python  datamining  nlp  web  data  parsing  text  language  twitter  google  wikipedia  sentiment  analysis  flickr  lsa  wordnet  ngram  html  dom  parser  graph  visualisation  from delicious
february 2011 by jonty
Singular Value Decomposition (SVD) Tutorial
"When you browse standard web sources like Singular Value Decomposition (SVD) on Wikipedia, you find many equations, but not an intuitive explanation of what it is or how it works. Singular Value Decomposition is a way of factoring matrices into a series of linear approximations that expose the underlying structure of the matrix."
svd  matrix  matricies  lsa  statistics  data  ai  from delicious
january 2011 by jonty
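
The "series of linear approximations" mentioned in the SVD tutorial entry can be written out as below. The symbols (A, U, Σ, V, σ, r, k) are standard notation assumed here, not taken from the tutorial itself.

```latex
% SVD of an m-by-n matrix A of rank r, with singular values sorted in decreasing order
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

% Keeping only the first k terms gives a rank-k approximation of A
A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \qquad k \le r
```

Each term is a rank-one matrix, and truncating the sum after k terms gives the best rank-k approximation of A in the least-squares (Frobenius norm) sense, which is why the leading terms are said to expose the underlying structure.
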
construct
"Construct is a python library for parsing and building of data structures (binary or textual). It is based on the concept of defining data structures in a declarative manner, rather than procedural code: more complex constructs are composed of a hierarchy of simpler ones. It's the first library that makes parsing fun, instead of the usual headache it is today."
python  parser  parsing  binary  datastructures  data  structure  from delicious
january 2011 by jonty
Doc⚡split
"Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)"
ruby  pdf  document  parsing  ocr  documents  data  processing  split  from delicious
december 2010 by jonty
Network&Society - Borderline
"Redrawing the map of Great Britain from a network of human interactions." - Utilises an anonymised set of several million phone call records to partition the UK based on the geography of the call participants, with surprising and beautiful results.
map  mapping  uk  britain  network  phone  records  data  visualisation  partitioning  boundary  boundaries  call  society  clustering  region  regional 
december 2010 by jonty
The Infinite Monkeywrench
"... is a collection of tools to download, clean, process, and package datasets from a variety of sources (HTML, RSS, XML, CSV, &c) into a variety of formats (XML, CSV, Excel, JSON, SQL, YAML, &c). Interacting with IMW is as simple as creating a YAML file which describes the workflow involved in processing the data and feeding it to the imw command line program."
data  ruby  processing  process  parsing  csv  yaml  xml  json  rss  html  format  parser 
december 2010 by jonty
scikits.learn: machine learning in python
"scikits.learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine-learning as a versatile tool for science and engineering."
python  machinelearning  library  ai  learning  scipy  numpy  matplotlib  matlab  machine  ml  data  processing  algorithms 
october 2010 by jonty
Tracker Video Analysis and Modeling Tool
"Tracker is a free video analysis and modeling tool built on the Open Source Physics (OSP) Java framework." - Supports object tracking, autotracking and video modeling. Has a complex data analysis tool with automatic and manual curve fitting. Bring on the next youtube hoax, I want to analyse the hell out of it.
physics  video  science  analysis  tracker  videos  processing  data  modelling 
october 2010 by jonty
Zero Intelligence Agents » The Value of Edges in Complex Network Visualization
"In my experience, except for the sparsest of network data, edges adds very little information to the visualization. In fact, edges often detract from the analytical value of a network plot by creating a confusing weave of lines that are impossible to follow or understand. I propose that the value of drawing edges is actually an asymptotic function of the density of the network data in question. I even made a picture."
graphs  layout  network  networks  visualisation  data  graph  edges  edge  forcedirected 
april 2010 by jonty
An Easy Way to Make a Treemap | FlowingData
Back in 1990, Ben Shneiderman, of the University of Maryland, wanted to visualize what was going on in his always-full hard drive. He wanted to know what was taking up so much space. Given the hierarchical structure of directories and files, he first tried a tree diagram. It got too big too fast to be useful though. Too many nodes. Too many branches.

The treemap was his solution. It's an area-based visualization where the size of each rectangle represents a metric, since made popular by Martin Wattenberg's Map of the Market and Marcos Weskamp's newsmap.
visualisation  r  statistics  treemap  data  graphics  tutorial  mathematics  maths  graph  graphs 
march 2010 by jonty
How to: make a scatterplot with a smooth fitted line | FlowingData
Oftentimes, you'll want to fit a line to a bunch of data points to make it easier to spot patterns or relationships. It might be observations over time or it might be two variables that are possibly related. In either case, a scatter plot just might not be enough to see anything useful. This tutorial will show you how to graph a fitted line, or loess curve, to such a scatter plot.
r  statistics  data  scatterplot  graph  stats  tutorial  mathematics  maths  visualisation  plot 
march 2010 by jonty
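
The fitted-line idea in the scatterplot entry above can be sketched as a local linear fit: at each x position, fit a straight line to the nearby points, weighting closer points more heavily. This is a minimal pure-Python sketch in the spirit of loess, not the R code from the tutorial; the names `tricube` and `local_linear_fit`, the `span` parameter, and the use of degree-one fits are assumptions, and the x values are assumed to be distinct.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside (-1, 1), zero outside."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def local_linear_fit(xs, ys, x0, span=0.5):
    """Estimate y at x0 from a weighted straight-line fit to the nearest points."""
    n = len(xs)
    k = min(n, max(3, int(round(span * n))))   # neighbours used; needs at least 3 points
    # The distance to the k-th nearest neighbour sets the kernel bandwidth.
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    weights = [tricube((x - x0) / h) for x in xs]

    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))

    # Fall back to the weighted mean if only one point carries weight.
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (x0 - mean_x)


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
smoothed = [local_linear_fit(xs, ys, x) for x in xs]
```

Evaluating `local_linear_fit` on a grid of x values and joining the results gives the smooth curve; a larger `span` uses more neighbours per fit and produces a smoother line.
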
How to Make a Heatmap – a Quick and Easy Solution | FlowingData
How do you make a heatmap? This came from kerimcan in the FlowingData forums, and krees followed up with a couple of good links on how to do them in R. It really is super easy. Here's how to make a heatmap with just a few lines of code.
heatmap  visualisation  r  data  graphics  charts  chart  graph  graphs  statistics 
january 2010 by jonty
Orange - Data Mining Fruitful & Fun
Open source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting. Extensions for bioinformatics and text mining. Comprehensive, flexible and fast.
python  datamining  software  programming  statistics  ai  data 
december 2009 by jonty
Choosing a chart type
A flowchart to help you decide how best to represent data.
visualisation  charts  presentation  reference  chart  data  graphics  statistics 
november 2009 by jonty
Data Mining with R: learning by case studies
The main goal of this book is to introduce the reader to the use of R as a tool for performing data mining. R is a freely downloadable language and environment for statistical computing and graphics. Its capabilities and the large set of available packages turn this tool into an excellent alternative to the existing (and expensive!) data mining tools.
r  statistics  datamining  data  processing  reference  book  stats 
november 2009 by jonty
Fitbit
Fitbit is an accelerometer-based activity tracker.
technology  tracking  wireless  sleep  exercise  fitness  health  pedometer  accelerometer  data  logger 
september 2009 by jonty
The Data Liberation Front (the Data Liberation Front)
We intend for this site to be a central location for information on how to move your data in and out of Google products.
google  data  privacy  dataportability  open  rights  export  import  gmail 
september 2009 by jonty
NaPTAN - Wikipedia, the free encyclopedia
The National Public Transport Access Node (NaPTAN) database is a UK nationwide system for uniquely identifying all the points of access to public transport in the UK.

Every UK railway station, coach terminus, airport, ferry terminal, bus stop, taxi rank or other place where public transport can be joined or left is allocated a unique NaPTAN identifier. The relationship of the stop to a City, Town, Village or other locality can be indicated through an association with elements of the National Public Transport Gazetteer.
naptan  transport  transportation  trains  train  bus  buses  journey  tube  data  xml 
august 2009 by jonty
Socrata | Making Data Social
Discover useful, unique and unusual datasets created by the community.
data  database  statistics  government  socialnetworking  social  opendata  research 
june 2009 by jonty
indiemapper
Indiemapper is the smarter, easier, more elegant way to make thematic maps from digital data.

We're building indiemapper to bring traditional cartography into the 21st century. It's platform independent, location independent and huge-software-budget independent.

Indiemapper closes the gap between data and map by taking a visual approach to map-making. See your data. Make your map. For the first time ever, it's just that simple.
maps  cartography  gis  data  geo  map  web 
may 2009 by jonty
Scalable Bloom Filters
Bloom Filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability. Scalable Bloom Filters are a variant of Bloom Filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.
erlang  algorithm  algorithms  data  papers  datastructures  paper  bloomfilter 
may 2009 by jonty
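
The Scalable Bloom Filters entry above describes a filter that grows while keeping a bound on false positives. A minimal sketch of the idea follows, assuming each new stage gets a larger capacity and a geometrically tighter error rate so that the combined error stays under the target. The class names, the default `growth` and `tightening` values, and the use of a plain bit array with SHA-256 double hashing are assumptions for illustration, not the paper's exact construction.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter sized up front for a capacity and error rate."""

    def __init__(self, capacity, error_rate):
        self.capacity = capacity
        self.error_rate = error_rate
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class ScalableBloomFilter:
    """Chain of Bloom filters; a new, larger stage is added when the last one fills."""

    def __init__(self, initial_capacity=100, error_rate=0.01, growth=2, tightening=0.5):
        self.growth = growth
        self.tightening = tightening
        # Stage errors p0, p0*r, p0*r^2, ... sum to at most error_rate when p0 = P(1 - r).
        self.stages = [BloomFilter(initial_capacity, error_rate * (1 - tightening))]

    def __contains__(self, item):
        return any(item in stage for stage in self.stages)

    def add(self, item):
        if item in self:
            return
        last = self.stages[-1]
        if last.count >= last.capacity:
            last = BloomFilter(last.capacity * self.growth, last.error_rate * self.tightening)
            self.stages.append(last)
        last.add(item)


sbf = ScalableBloomFilter(initial_capacity=50, error_rate=0.01)
words = ["item-%d" % i for i in range(1000)]
for word in words:
    sbf.add(word)
```

Lookups check every stage, so an item added earlier is always found; only false positives are possible, never false negatives.
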
www.parliament.uk | Bills before Parliament
Feeds of the current bills before the UK Parliament, with archives back to 2002.
uk  parliament  bills  government  law  legislation  public  rss  feed  data  state 
may 2009 by jonty
Judy Arrays
A Judy tree is generally faster than and uses less memory than contemporary forms of trees such as binary (AVL) trees, b-trees, and skip-lists. When used in the "Judy Scalable Hashing" configuration, Judy is generally faster than a hashing method at all populations.
programming  algorithm  datastructures  algorithms  data  hashing  hash  tree  array 
february 2009 by jonty
Adjacent Stations
Tube stations it's quicker to walk between
london  data  underground  map  tube  stations 
january 2009 by jonty
Departure boards | Transport for London
Live TfL departure info, sadly not available for all lines, or via an API. Sort it awwwwt.
underground  tube  transport  services  london  tfl  travel  data  hacking 
december 2008 by jonty
Magic/Replace - Data Cleanup for Everyone
Really clever mass editing tool for tabular data. The video is an excellent demo.
tools  excel  data  clean  csv  xls  replace  text  textediting  editor 
november 2008 by jonty
PhotoRec - CGSecurity
Photo recovery software for damaged media
recovery  photo  photography  tools  flash  data  backup 
march 2008 by jonty
