LOCFIT - Local Regression and Likelihood
july 2011 by jonty
"LOCFIT is a software system for fitting curves and surfaces to data, using the local regression and likelihood methods. The code is mostly written in C; an interface is provided that enables LOCFIT to be used as an S-Plus or R library."
r
statistics
plugin
curve
fitting
graph
data
regression
stats
from delicious
july 2011 by jonty
The MessagePack Project
june 2011 by jonty
"MessagePack is a binary-based efficient object serialization library. It enables to exchange structured objects between many languages like JSON. But unlike JSON, it is very fast and small." - Also includes an RPC protocol and implementations.
thrift
protocolbuffers
protobuf
network
rpc
api
ipc
data
message
protocol
nt
from delicious
june 2011 by jonty
mlpy - Machine Learning PYthon - Predictive Modeling - Classification and Regression
april 2011 by jonty
"It provides high level procedures that support, with few lines of code, the design of rich Data Analysis Protocols (DAPs) for preprocessing, clustering, predictive classification, regression and feature selection. Methods are available for feature weighting and ranking, data resampling, error evaluation and experiment landscaping."
python
machinelearning
ai
numpy
ml
library
processing
data
clustering
cluster
from delicious
april 2011 by jonty
d3.js
march 2011 by jonty
"D3 allows you to bind arbitrary data to a Document Object Model (DOM), and then apply data-driven transformations to the document. As a trivial example, you can use D3 to generate a basic HTML table from an array of numbers. Or, use the same data to create an interactive SVG bar chart with smooth transitions and interaction."
javascript
visualisation
data
framework
svg
force
directed
graph
graphs
canvas
from delicious
march 2011 by jonty
ngrep - network grep
march 2011 by jonty
"ngrep strives to provide most of GNU grep's common features, applying them to the network layer. ngrep is a pcap-aware tool that will allow you to specify extended regular or hexadecimal expressions to match against data payloads of packets."
network
grep
monitoring
wireshark
pcap
tcpdump
data
from delicious
march 2011 by jonty
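A rough sketch of the matching idea in the ngrep description above: apply a regular or hexadecimal expression to packet payloads. This is not ngrep's code and does no packet capture; the function name `grep_payloads` and the sample payloads are made up for illustration.

```python
import re


def grep_payloads(payloads, pattern, hex_pattern=False):
    """Return the payloads (bytes objects) that match the pattern.

    With hex_pattern=True the pattern is a hex string such as "160301"
    and is searched for as a raw byte sequence; otherwise it is treated
    as a regular expression applied to the payload bytes.
    """
    if hex_pattern:
        needle = bytes.fromhex(pattern)
        return [p for p in payloads if needle in p]
    regex = re.compile(pattern.encode("ascii"))
    return [p for p in payloads if regex.search(p)]


# Hypothetical payloads standing in for captured packet data.
sample_payloads = [
    b"GET /index.html HTTP/1.1\r\n",
    b"\x16\x03\x01\x00\xa5",
    b"POST /login HTTP/1.1\r\n",
]
http_requests = grep_payloads(sample_payloads, r"^(GET|POST) ")
tls_like = grep_payloads(sample_payloads, "160301", hex_pattern=True)
```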
Pattern
february 2011 by jonty
"Pattern is a web mining module for the Python programming language. It bundles tools for data retrieval (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), text analysis (rule-based shallow parser, WordNet interface, syntactical + semantical n-gram search algorithm, tf-idf + cosine similarity + LSA metrics) and data visualization (graph networks)."
python
datamining
nlp
web
data
parsing
text
language
twitter
google
wikipedia
sentiment
analysis
flickr
lsa
wordnet
ngram
html
dom
parser
graph
visualisation
from delicious
february 2011 by jonty
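A minimal sketch of the "tf-idf + cosine similarity" metric mentioned in the Pattern description above, using only the Python standard library. It does not use Pattern's own API; the function names and the whitespace tokenizer are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Build one sparse tf-idf vector (a dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock markets fell sharply",
]
vectors = tfidf_vectors(docs)
similar = cosine_similarity(vectors[0], vectors[1])    # share several terms
unrelated = cosine_similarity(vectors[0], vectors[2])  # share no terms
```

Documents with no terms in common score exactly zero, and terms that appear in every document get an idf of zero and so contribute nothing.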
Singular Value Decomposition (SVD) Tutorial
january 2011 by jonty
"When you browse standard web sources like Singular Value Decomposition (SVD) on Wikipedia, you find many equations, but not an intuitive explanation of what it is or how it works. Singular Value Decomposition is a way of factoring matrices into a series of linear approximations that expose the underlying structure of the matrix."
svd
matrix
matricies
lsa
statistics
data
ai
from delicious
january 2011 by jonty
construct
january 2011 by jonty
"Construct is a python library for parsing and building of data structures (binary or textual). It is based on the concept of defining data structures in a declarative manner, rather than procedural code: more complex constructs are composed of a hierarchy of simpler ones. It's the first library that makes parsing fun, instead of the usual headache it is today."
python
parser
parsing
binary
datastructures
data
structure
from delicious
january 2011 by jonty
Doc⚡split
december 2010 by jonty
"Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)"
ruby
pdf
document
parsing
ocr
documents
data
processing
split
from delicious
december 2010 by jonty
Network&Society - Borderline
december 2010 by jonty
"Redrawing the map of Great Britain from a network of human interactions." - Utilises an anonymised set of several million phone call records to partition the UK based on the geography of the call participants, with surprising and beautiful results.
map
mapping
uk
britain
network
phone
records
data
visualisation
partitioning
boundary
boundaries
call
society
clustering
region
regional
december 2010 by jonty
The Infinite Monkeywrench
december 2010 by jonty
"... is a collection of tools to download, clean, process, and package datasets from a variety of sources (HTML, RSS, XML, CSV, &c) into a variety of formats (XML, CSV, Excel, JSON, SQL, YAML, &c). Interacting with IMW is as simple as creating a YAML file which describes the workflow involved in processing the data and feeding it to the imw command line program."
data
ruby
processing
process
parsing
csv
yaml
xml
json
rss
html
format
parser
december 2010 by jonty
bitly's data_hacks at master - GitHub
october 2010 by jonty
Command line utilities for data analysis
data
python
analysis
cli
datamining
processing
graph
october 2010 by jonty
scikits.learn: machine learning in python
october 2010 by jonty
"scikits.learn is a Python module integrating classic machine learning algorithms in the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib). It aims to provide simple and efficient solutions to learning problems that are accessible to everybody and reusable in various contexts: machine-learning as a versatile tool for science and engineering."
python
machinelearning
library
ai
learning
scipy
numpy
matplotlib
matlab
machine
ml
data
processing
algorithms
october 2010 by jonty
Tracker Video Analysis and Modeling Tool
october 2010 by jonty
"Tracker is a free video analysis and modeling tool built on the Open Source Physics (OSP) Java framework." - Supports object tracking, autotracking and video modeling. Has a complex data analysis tool with automatic and manual curve fitting. Bring on the next YouTube hoax, I want to analyse the hell out of it.
physics
video
science
analysis
tracker
videos
processing
data
modelling
october 2010 by jonty
Zero Intelligence Agents » The Value of Edges in Complex Network Visualization
april 2010 by jonty
"In my experience, except for the sparsest of network data, edges adds very little information to the visualization. In fact, edges often detract from the analytical value of a network plot by creating a confusing weave of lines that are impossible to follow or understand. I propose that the value of drawing edges is actually an asymptotic function of the density of the network data in question. I even made a picture."
graphs
layout
network
networks
visualisation
data
graph
edges
edge
forcedirected
april 2010 by jonty
An Easy Way to Make a Treemap | FlowingData
march 2010 by jonty
Back in 1990, Ben Shneiderman, of the University of Maryland, wanted to visualize what was going on in his always-full hard drive. He wanted to know what was taking up so much space. Given the hierarchical structure of directories and files, he first tried a tree diagram. It got too big too fast to be useful though. Too many nodes. Too many branches.
The treemap was his solution. It's an area-based visualization where the size of each rectangle represents a metric, since made popular by Martin Wattenberg's Map of the Market and Marcos Weskamp's newsmap.
visualisation
r
statistics
treemap
data
graphics
tutorial
mathematics
maths
graph
graphs
march 2010 by jonty
How to: make a scatterplot with a smooth fitted line | FlowingData
march 2010 by jonty
Oftentimes, you'll want to fit a line to a bunch of data points to make it easier to spot patterns or relationships. It might be observations over time or it might be two variables that are possibly related. In either case, a scatter plot just might not be enough to see anything useful. This tutorial will show you how to graph a fitted line, or loess curve, to such a scatter plot.
r
statistics
data
scatterplot
graph
stats
tutorial
mathematics
maths
visualisation
plot
march 2010 by jonty
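The fitted line in the scatterplot tutorial above is a loess curve. The sketch below shows the general idea in plain Python: at each x, fit a straight line by weighted least squares, giving nearby points more weight (tricube kernel). It is a simplified local linear smoother, not R's `loess` implementation, and the function name and `span` default are assumptions.

```python
import math


def loess(xs, ys, span=0.5):
    """Smooth ys against xs with a local linear fit at each x.

    span is the fraction of points that define each local neighbourhood.
    """
    n = len(xs)
    k = max(2, math.ceil(span * n))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance to the k-th nearest point (guard against zero).
        distances = sorted(abs(x - x0) for x in xs)
        h = distances[k - 1] or 1e-12
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        weights = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        sw = sum(weights)
        swx = sum(w * x for w, x in zip(weights, xs))
        swy = sum(w * y for w, y in zip(weights, ys))
        swxx = sum(w * x * x for w, x in zip(weights, xs))
        swxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:
            # Degenerate neighbourhood: fall back to the weighted mean.
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]   # exactly linear, so the smooth should reproduce it
smooth = loess(xs, ys, span=0.5)
```

A larger `span` uses more of the data for each local fit and gives a smoother, flatter curve; a smaller one follows the points more closely.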
How to Make a Heatmap – a Quick and Easy Solution | FlowingData
january 2010 by jonty
How do you make a heatmap? This came from kerimcan in the FlowingData forums, and krees followed up with a couple of good links on how to do them in R. It really is super easy. Here's how to make a heatmap with just a few lines of code.
heatmap
visualisation
r
data
graphics
charts
chart
graph
graphs
statistics
january 2010 by jonty
Current Cost CC128 Display Unit
december 2009 by jonty
XML OUTPUT DESCRIPTION - v0.11
xml
documentation
currentcost
cc128
energy
data
december 2009 by jonty
Orange - Data Mining Fruitful & Fun
december 2009 by jonty
Open source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting. Extensions for bioinformatics and text mining. Comprehensive, flexible and fast.
python
datamining
software
programming
statistics
ai
data
december 2009 by jonty
Choosing a chart type
november 2009 by jonty
A flowchart to help you decide how best to represent data.
visualisation
charts
presentation
reference
chart
data
graphics
statistics
november 2009 by jonty
Data Mining with R: learning by case studies
november 2009 by jonty
The main goal of this book is to introduce the reader to the use of R as a tool for performing data mining. R is a freely downloadable language and environment for statistical computing and graphics. Its capabilities and the large set of available packages turn this tool into an excellent alternative to the existing (and expensive!) data mining tools.
r
statistics
datamining
data
processing
reference
book
stats
november 2009 by jonty
Fitbit
september 2009 by jonty
Fitbit is an accelerometer based activity tracker.
technology
tracking
wireless
sleep
exercise
fitness
health
pedometer
accelerometer
data
logger
september 2009 by jonty
The Data Liberation Front (the Data Liberation Front)
september 2009 by jonty
We intend for this site to be a central location for information on how to move your data in and out of Google products.
google
data
privacy
dataportability
open
rights
export
import
gmail
september 2009 by jonty
NaPTAN - Wikipedia, the free encyclopedia
august 2009 by jonty
The National Public Transport Access Node (NaPTAN) database is a UK nationwide system for uniquely identifying all the points of access to public transport in the UK.
Every UK railway station, coach terminus, airport, ferry terminal, bus stop, taxi rank or other place where public transport can be joined or left is allocated a unique NaPTAN identifier. The relationship of the stop to a City, Town, Village or other locality can be indicated through an association with elements of the National Public Transport Gazetteer.
naptan
transport
transportation
trains
train
bus
buses
journey
tube
data
xml
august 2009 by jonty
Socrata | Making Data Social
june 2009 by jonty
Discover useful, unique and unusual datasets created by the community.
data
database
statistics
government
socialnetworking
social
opendata
research
june 2009 by jonty
indiemapper
may 2009 by jonty
Indiemapper is the smarter, easier, more elegant way to make thematic maps from digital data.
We're building indiemapper to bring traditional cartography into the 21st century. It's platform independent, location independent and huge-software-budget independent.
Indiemapper closes the gap between data and map by taking a visual approach to map-making. See your data. Make your map. For the first time ever, it's just that simple.
maps
cartography
gis
data
geo
map
web
may 2009 by jonty
Scalable Bloom Filters
may 2009 by jonty
Bloom Filters provide space-efficient storage of sets at the cost of a probability of false positives on membership queries. The size of the filter must be defined a priori based on the number of elements to store and the desired false positive probability. Scalable Bloom Filters are a variant of Bloom Filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.
erlang
algorithm
algorithms
data
papers
datastructures
paper
bloomfilter
may 2009 by jonty
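A minimal Python sketch of the idea in the Scalable Bloom Filters abstract above: a plain Bloom filter is sized up front from a capacity and a target false positive rate, and the scalable variant adds a new, larger filter with a tighter error rate whenever the current one fills up. The class names, the growth factor of 2, the tightening ratio of 0.5 and the SHA-256 double-hashing scheme are assumptions for illustration, not details taken from the bookmarked paper or its Erlang code.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter, sized a priori from capacity and error rate."""

    def __init__(self, capacity, error_rate):
        self.capacity = capacity
        self.error_rate = error_rate
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / capacity * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)
        self.count = 0  # number of add() calls, duplicates included

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
        self.count += 1

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class ScalableBloomFilter:
    """Chain of Bloom filters that grows as items are added."""

    def __init__(self, initial_capacity=100, error_rate=0.01, growth=2, tightening=0.5):
        self.growth = growth
        self.tightening = tightening
        # Stage error rates form a geometric series whose sum stays within error_rate.
        self.filters = [BloomFilter(initial_capacity, error_rate * (1 - tightening))]

    def add(self, item):
        current = self.filters[-1]
        if current.count >= current.capacity:
            # Current stage is full: add a bigger stage with a tighter error rate.
            current = BloomFilter(current.capacity * self.growth,
                                  current.error_rate * self.tightening)
            self.filters.append(current)
        current.add(item)

    def __contains__(self, item):
        return any(item in f for f in self.filters)


sbf = ScalableBloomFilter(initial_capacity=10, error_rate=0.01)
items = ["item-%d" % i for i in range(100)]
for entry in items:
    sbf.add(entry)
```

Membership tests never give false negatives; a lookup for something that was never added may occasionally return true, which is the trade-off the abstract describes.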
www.parliament.uk | Bills before Parliament
may 2009 by jonty
Feeds of the current bills before the UK Parliament, with archives back to 2002.
uk
parliament
bills
government
law
legislation
public
rss
feed
data
state
may 2009 by jonty
Judy Arrays
february 2009 by jonty
A Judy tree is generally faster than and uses less memory than contemporary forms of trees such as binary (AVL) trees, b-trees, and skip-lists. When used in the "Judy Scalable Hashing" configuration, Judy is generally faster than a hashing method at all populations.
programming
algorithm
datastructures
algorithms
data
hashing
hash
tree
array
february 2009 by jonty
Adjacent Stations
january 2009 by jonty
Tube stations it's quicker to walk between
london
data
underground
map
tube
stations
january 2009 by jonty
Departure boards | Transport for London
december 2008 by jonty
Live TfL departure info, sadly not available for all lines, or via an API. Sort it awwwwt.
underground
tube
transport
services
london
tfl
travel
data
hacking
december 2008 by jonty
Magic/Replace - Data Cleanup for Everyone
november 2008 by jonty
Really clever mass editing tool for tabular data. The video is an excellent demo.
tools
excel
data
clean
csv
xls
replace
text
textediting
editor
november 2008 by jonty
Free The Postcode!
october 2008 by jonty
A project to collect postcode data.
reference
maps
gps
geolocation
free
database
data
postcodes
postcode
open
project
october 2008 by jonty
PhotoRec - CGSecurity
march 2008 by jonty
Photo recovery software for damaged media
recovery
photo
photography
tools
flash
data
backup
march 2008 by jonty