michaelfox + data   165

auditory-research-suite - Auditory Research Suite - Google Project Hosting
This project is designed to create a general purpose toolkit to support research on auditory perception. Although our primary goal is to better understand the musical experience, we are also interested in exploring general issues of auditory perception and cognition. To this end, we will focus on several veins of research:

Sensory integration
Rhythm/timing
The communication of emotion through sound
data  health  quantifiedself  resources  reference  audio  tests  javascript 
9 weeks ago by michaelfox
pyvttbl - Multidimensional pivot tables, data processing, statistical computation - Google Project Hosting
Pivot tables (also called contingency tables and cross tabulation tables) are a powerful means of data visualization and data summarization. When dealing with large data sets with multiple variables, or with multiple datasets, manually manipulating pivot tables in WYSIWYG (what you see is what you get) spreadsheets can quickly become troublesome and error-prone. In these instances it becomes preferable, or even necessary, to use a YAFIYGI (you ask for it, you got it) model to automate all or part of the data summarization process.
There are already existing Python pivot table modules available. The ones I have found don't support multidimensional data, require Windows and Excel, or are incomplete and abandoned. They also are usually tailored towards an information technology audience as opposed to a scientific/research audience. On the other extreme are projects like PyTables. PyTables is an impressive undertaking but many datasets just aren't complex enough to justify the effort required to get data into PyTables. The pyvttbl module presented here offers a solution for datasets of "Goldilocks" complexity; too much for spreadsheets, but too little for coding custom solutions or configuring PyTables.
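As a rough illustration of the kind of summarization described above, the sketch below builds a small pivot table with only the Python standard library. It does not use pyvttbl's own API; the `pivot` helper, its parameters, and the sample observations are all hypothetical and exist only for this example.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key, agg=sum):
    """Group rows by (row_key, col_key) and aggregate value_key in each cell.

    Hypothetical helper for illustration; this is not pyvttbl's interface.
    """
    cells = defaultdict(list)
    for r in rows:
        cells[(r[row_key], r[col_key])].append(r[value_key])

    table = defaultdict(dict)
    for (rk, ck), values in cells.items():
        table[rk][ck] = agg(values)
    return {rk: dict(cols) for rk, cols in table.items()}


# Made-up observations, for illustration only.
observations = [
    {"subject": "s1", "condition": "A", "score": 3},
    {"subject": "s1", "condition": "B", "score": 5},
    {"subject": "s2", "condition": "A", "score": 4},
    {"subject": "s1", "condition": "A", "score": 1},
]

# Rows are subjects, columns are conditions, cells hold summed scores.
summary = pivot(observations, "subject", "condition", "score")
print(summary)
```

Passing a different `agg` (for example `len` or `max`) changes what each cell reports without touching the grouping logic.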
pivot  python  tables  data  statistics  tools 
9 weeks ago by michaelfox
hrvtracker - Logging software for heart rate data including R-R intervals - Google Project Hosting
HRV tracker is designed to log R-R intervals from an ANT-compatible heart rate strap. This application can receive data from either a serial-port ANT receiver device (e.g.: the SparkFun ANT USB stick) or the ANT USB receiver stick that ships with many ANT-compatible heart rate monitors and GPS units (e.g.: Garmin ForeRunner series).

Features:
Record from either consumer ANT receivers (e.g.: GPS or heart-rate monitors) or serial-port ANT receivers
Script-based recording for automated tracking
Record R-R intervals to text files for import in analysis applications. Log files are automatically time stamped for improved record-keeping.
HRV tracker is designed for Microsoft Windows using the .NET framework, with C# source code written in Visual Studio 2008. A Mac OS X version is currently available in beta.

For more information on how to use heart rate variability analysis to improve your training performance, Peak Performance has a comprehensive article at http://www.pponline.co.uk/encyc/heart-rate-variability-analysis-how-to-improve-your-training-performance-40837
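One common thing to do with logged R-R intervals is to compute RMSSD (the root mean square of successive differences), a standard time-domain heart rate variability measure. The sketch below is not part of HRV tracker. It assumes a hypothetical log format of one interval in milliseconds per line, with blank lines and `#` comments ignored; the actual log format is not described above, and the sample values are made up.

```python
import math


def parse_rr_log(text):
    """Parse R-R intervals from text.

    Assumed (hypothetical) format: one interval in milliseconds per line;
    blank lines and lines starting with '#' are skipped.
    """
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        intervals.append(float(line))
    return intervals


def rmssd(rr_ms):
    """Root mean square of successive differences between R-R intervals."""
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up sample log, for illustration only.
SAMPLE_LOG = """# rr intervals (ms)
800
810
790
805
"""

intervals = parse_rr_log(SAMPLE_LOG)
value = rmssd(intervals)
print(round(value, 2))
```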
health  metrics  heart  heartrate  lifestyle  fitness  data  tracking  quantifiedself  hrv  heartratevariability  javascript 
9 weeks ago by michaelfox
HRVathlete | Heart Rate Variability (HRV) | Sports and Exercise Heart Rate Analysis
HRVathlete has been developed by the sports scientists at FitSense Australia and is based on their work in heart rate variability since 2001. During this time, they have established a reliable means of monitoring the level of fatigue in athletes, thus allowing coaches to maximise performance and reduce injury.

HRVathlete analysis has been used with Olympic and world champion cyclists and triathletes, as well as a large number of teams at an elite and professional level. 

HRVathlete is best used to objectively measure athlete fatigue/recovery and provide feedback to coaches who can fine tune training loads to maximise performance and reduce injury risk.

HRVathlete is web-based, so you can access it from virtually anywhere you have an internet connection.
health  metrics  heart  heartrate  lifestyle  fitness  data  tracking  quantifiedself  hrv  heartratevariability  javascript 
9 weeks ago by michaelfox
csvkit 0.4.2 (beta) — csvkit 0.4.2 (beta) documentation
1. Getting started
1.1. Description
1.2. Following along
1.3. Getting the data
1.4. Fixing the files with sed
1.5. Piping
1.6. Output redirection
1.7. Putting it together
2. Examining the data
2.1. Cutting up the data with csvcut
2.2. Statistics on demand with csvstat
2.3. Searching for rows with csvgrep
2.4. Flipping column order with csvcut
2.5. Sorting with csvsort
2.6. Using line numbers as proxy for rank
2.7. Reading through data with csvlook and less
2.8. Saving your work
2.9. Onward to merging
3. Adding another year of data
csv  csvkit  tools  manual  shell  cli  data  documentation 
january 2012 by michaelfox
Genomera. Heal the world.
We’re crowd-sourcing health discovery by
helping anyone create group health studies.
personalinformatics  quantifiedself  health  data  tracking  self  javascript 
january 2012 by michaelfox
DIYgenomics
... crowd-sourced clinical trials and personal genome apps
personalinformatics  quantifiedself  supplements  health  data  tracking  self  javascript 
january 2012 by michaelfox
azavea/Open-Data-Catalog - GitHub
Open Data Catalog is an open-source version of the site code behind OpenDataPhilly.org, a portal that provides access to over 100 open data sets, applications, and APIs related to the Philadelphia region. Open Data Catalog is intended to display information and links to publicly available data in an easily searchable format. The code also includes options for data owners to submit data for consideration and for registered public users to nominate a type of data they would like to see openly available to the public. Written using Django, Python and PostgreSQL.
http://www.opendataphilly.org/
javascript  github  github-repo  data 
january 2012 by michaelfox
Tangle: a JavaScript library for reactive documents
Tangle is a JavaScript library for creating reactive documents. Your readers can interactively explore possibilities, play with parameters, and see the document update immediately. Tangle is super-simple and easy to learn.
library  reactive  documents  interactive  data  charting 
october 2011 by michaelfox
Scraper - Google Chrome Extension
Scraper is a simple data mining extension for Google Chrome™ that is useful for online research when you need to quickly analyze data in spreadsheet form.

To use it: highlight a part of the webpage you'd like to scrape, right-click and choose "Scrape similar...". Anything that's similar to what you highlighted will be rendered in a table ready for export, compatible with Google Docs™.

This is a work-in-progress (i.e. there are bugs), and is currently intended for intermediate to advanced users who are comfortable with XPath, though jQuery is also supported to an extent.
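To give a feel for how an XPath expression selects "similar" elements and turns them into table rows, here is a minimal Python sketch. It is not how the extension itself is implemented. The page snippet is made up, and the standard library's `xml.etree.ElementTree` supports only a subset of XPath and requires well-formed markup, so real-world pages would usually need a proper HTML parser.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed page snippet, for illustration only.
PAGE = """
<html><body>
  <ul>
    <li><a href="/a">Alpha</a></li>
    <li><a href="/b">Beta</a></li>
    <li><a href="/c">Gamma</a></li>
  </ul>
  <p><a href="/about">About</a></p>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# Select every link that sits inside a list item; the "About" link
# lives in a <p>, so the path does not match it.
rows = [(a.text, a.get("href")) for a in root.findall(".//ul/li/a")]

for text, href in rows:
    print(f"{text}\t{href}")
```

Each matched element becomes one row, which is the same shape of result you would export to a spreadsheet.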
scraper  data  tools  browser  plugins  extensions  chrome 
june 2011 by michaelfox
Data Science Toolkit
A collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and Javascript interfaces. Available as a self-contained VM or EC2 AMI that you can deploy yourself.

It's essentially a specialized Linux distribution, with a lot of useful data software pre-installed and exposing a simple interface. For full documentation, see http://www.datasciencetoolkit.org/developerdocs.

Version 0.30 - March 20th 2011

Credits
The Data Science Toolkit was assembled by Pete Warden and the source code is available at http://github.com/petewarden/dstk

Country boundaries by Thematic Mapping.

Contains Ordnance Survey data © Crown copyright and database right 2010.

Irish boundaries by Ben Raue.

New Zealand boundaries from Statistics NZ.

Worldwide states and provinces from Natural Earth.

US neighborhood boundaries provided by Zillow under a CC-SLA license.

This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com/.

The OpenStreetMap and PostGIS projects have also provided some fantastic tools.

Using geocoding code from Schuyler Erle.

Uses the Ocropus project for OCR on images, and catdoc for parsing pre-XML Word and Excel documents.

Uses the Hpricot library for parsing HTML.

The Boilerpipe library is used to recognize and extract the main story text from documents.

Uses my Ruby port of Eamon Daly and Jon Orwant's original GenderFromName Perl module to classify first names.
api  toolkit  tools  data  javascript 
march 2011 by michaelfox
Free, Public Data Sets | jacquesmattheij.com
I'm a data junkie, I have to confess to that.

Hi, my name is Jacques and I have a problem. Whenever I see a large chunk of structured
(or even unstructured!) data pass by I just have to have a copy. It's not that I'm a
packrat, it's just that large gobs of data are always inspirational in some way or other.

What could you do with that data? In what new ways could you slice and dice it to get
new insights? In what ways can you combine it with data that you already have to
enable you to do new things? This is where, in my opinion, the real value of these
datasets lies: the whole is much bigger than the sum of the parts.

Because of that I keep a list of public data sets handy, and a bunch of hard drives
full of interesting stuff that I've collected over the years.

Some of it is at first glance boring, on second thought fascinating, other times
it's the other way around. Whatever comes out of it you'll always learn and there
are always little surprises.

Here are some pointers if you want to start your own collection. It's nothing but
a starting point, but be warned: you'll get pulled in and, before you know it, you'll
be drowning in data and ideas on what you could do with it ;)

Google has a number of public data sets. These are fairly US centric but
contain lots of interesting information:

http://www.google.com/publicdata/directory

Amazon has a few datasets, the annotated human genome and other bioinformatics
data, some US census databases and a dump of freebase. If you're already on
EC2 you should have easy access to this data.

http://aws.amazon.com/publicdatasets/

The Project Gutenberg DVD ISO file:

http://www.gutenberg.org/cdproject/pgdvd042010.iso.torrent

This contains all the books in the collection, they're older books (out of
copyright) so you won't be training your spam filter on them but it is a vast
corpus of written text.

All of Wikipedia can be downloaded in bulk:

http://dumps.wikimedia.org/

This contains articles, metadata and so on.

MaxMind has a pretty useful GeoIP database available for download:

http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip

MusicBrainz has a huge database of information with music related data:

http://musicbrainz.org/doc/Database_Download

It's in the PostgreSQL 'copy to' format, so you'll need PostgreSQL to use it.
Of course, once you have the data imported into PostgreSQL you can convert
it to any format that you want.
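For a quick look at such a file without loading it into a database, the default text format written by PostgreSQL's COPY can be read approximately by hand: one row per line, tab-separated columns, `\N` for NULL, and backslash escapes inside values. The sketch below is a simplified reader under those assumptions. It does not handle octal or hex escapes, the function names are hypothetical, and the sample row is made up rather than taken from the MusicBrainz dump.

```python
# Single-character backslash escapes used by COPY's text format.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r",
            "t": "\t", "v": "\v", "\\": "\\"}


def decode_field(raw):
    """Decode one COPY text-format field; returns None for NULL (\\N)."""
    if raw == r"\N":
        return None
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            # Unknown escapes fall back to the escaped character itself.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def parse_copy_line(line):
    """Split one line on tabs and decode every field.

    Splitting on raw tabs is safe because tabs inside values are
    written as the two-character escape backslash-t.
    """
    return [decode_field(f) for f in line.rstrip("\n").split("\t")]


# Made-up row: an id, a name containing a backslash, a NULL, a two-line note.
sample = "42\tAC\\\\DC\t\\N\tfirst\\nsecond\n"
fields = parse_copy_line(sample)
print(fields)
```

For anything beyond inspection, importing with PostgreSQL itself remains the more reliable route, since it handles every escape and encoding case.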

America Online released a set of files containing millions of search queries in
2006. It's hard to find copies of this, but even though it's dated it is still
quite a goldmine if you're trying to gain insight into how people search and
what they search for. Technically speaking that's not a 'free, public dataset',
because AOL has done a lot of work to try to put the genie back into the
bottle, but 5 minutes of googling will turn up a few copies.

There is a huge collection of links to datasets here:

http://www.datawrangling.com/some-datasets-available-on-the-web

Not all of them work but most do.

Another page with some links to datasets:

http://www.kdnuggets.com/datasets/

http://www.guardian.co.uk/uk/datablog/2011/feb/02/ukcrime-data-store

http://www.guardian.co.uk/data

An interesting side effect of writing this is that there is now a *much*
better place to look for even more data: http://news.ycombinator.com/item?id=2165497

If you want to stay up-to-date about stuff like this, follow me on Twitter: http://www.twitter.com/jmattheij
data  dataset  collection  reference  resources 
february 2011 by michaelfox