michaelfox + data 165
auditory-research-suite - Auditory Research Suite - Google Project Hosting
9 weeks ago by michaelfox
This project is designed to create a general purpose toolkit to support research on auditory perception. Although our primary goal is to better understand the musical experience, we are also interested in exploring general issues of auditory perception and cognition. To this end, we will focus on several veins of research:
Sensory integration
Rhythm/timing
The communication of emotion through sound
data
health
quantifiedself
resources
reference
audio
tests
javascript
pyvttbl - Multidimensional pivot tables, data processing, statistical computation - Google Project Hosting
9 weeks ago by michaelfox
Pivot tables (also called contingency tables and cross-tabulation tables) are a powerful means of data visualization and data summarization. When dealing with large data sets with multiple variables, or with multiple datasets, manually manipulating pivot tables in WYSIWYG (what you see is what you get) spreadsheets can quickly become troublesome and error-prone. In these instances it becomes preferable, or even necessary, to use a YAFIYGI (you asked for it, you got it) model to automate all or part of the data summarization process.
There are already existing Python pivot table modules available. The ones I have found don't support multidimensional data, require Windows and Excel, or are incomplete and abandoned. They also are usually tailored towards an information technology audience as opposed to a scientific/research audience. On the other extreme are projects like PyTables. PyTables is an impressive undertaking but many datasets just aren't complex enough to justify the effort required to get data into PyTables. The pyvttbl module presented here offers a solution for datasets of "Goldilocks" complexity; too much for spreadsheets, but too little for coding custom solutions or configuring PyTables.
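To make the idea of a scripted pivot table concrete, here is a minimal sketch using only the Python standard library. It does not use pyvttbl's own API; the `pivot` helper, the field names, and the sample observations are all hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def pivot(rows, row_key, col_key, value_key, agg=sum):
    """Group a list of dicts into a {row: {column: aggregated value}} table."""
    cells = defaultdict(lambda: defaultdict(list))
    for record in rows:
        cells[record[row_key]][record[col_key]].append(record[value_key])
    # Collapse each cell's list of raw values with the aggregate function.
    return {
        r: {c: agg(values) for c, values in cols.items()}
        for r, cols in cells.items()
    }


def mean(values):
    return sum(values) / len(values)


# Hypothetical observations: reaction times (ms) by subject and condition.
observations = [
    {"subject": "s1", "condition": "audio", "rt": 410},
    {"subject": "s1", "condition": "audio", "rt": 390},
    {"subject": "s1", "condition": "visual", "rt": 455},
    {"subject": "s2", "condition": "audio", "rt": 380},
    {"subject": "s2", "condition": "visual", "rt": 470},
]

table = pivot(observations, "subject", "condition", "rt", agg=mean)
for subject, by_condition in sorted(table.items()):
    print(subject, by_condition)
```

Because the summary is produced by a script rather than by hand, rerunning it on a new dataset repeats exactly the same steps, which is the point of the "you asked for it, you got it" approach described above.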
pivot
python
tables
data
statistics
tools
hrvtracker - Logging software for heart rate data including R-R intervals - Google Project Hosting
9 weeks ago by michaelfox
HRV tracker is designed to log R-R intervals from an ANT-compatible heart rate strap. This application can receive data from either a serial-port ANT receiver device (e.g., the SparkFun ANT USB stick) or the ANT USB receiver stick that ships with many ANT-compatible heart rate monitors and GPS units (e.g., the Garmin Forerunner series).
Features:
Record from either consumer ANT receivers (e.g., GPS or heart-rate monitors) or serial-port ANT receivers
Script-based recording for automated tracking
Record R-R intervals to text files for import into analysis applications. Log files are automatically time-stamped for improved record-keeping.
HRV tracker is designed for Microsoft Windows using the .NET framework, with C# source code written in Visual Studio 2008. A Mac OS X version is currently available in beta.
For more information on how to use heart rate variability analysis to improve your training performance, Peak Performance has a comprehensive article at http://www.pponline.co.uk/encyc/heart-rate-variability-analysis-how-to-improve-your-training-performance-40837
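As an illustration of what can be done with logged R-R intervals once they are in a text file, the sketch below computes RMSSD (root mean square of successive differences), a commonly used time-domain HRV measure. This is not part of HRV tracker, and the log layout assumed here (one interval in milliseconds per line) is a guess for the sake of the example; the real log format may differ.

```python
import math


def parse_rr_log(text):
    """Parse R-R intervals in ms, ASSUMING one numeric value per line."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            intervals.append(float(line))
    return intervals


def rmssd(intervals):
    """Root mean square of successive differences between adjacent intervals."""
    if len(intervals) < 2:
        raise ValueError("need at least two R-R intervals")
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical log contents, not real recorded data.
sample_log = "800\n810\n790\n805\n"
rr = parse_rr_log(sample_log)
value = rmssd(rr)
print(f"RMSSD: {value:.1f} ms")
```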
health
metrics
heart
heartrate
lifestyle
fitness
data
tracking
quantifiedself
hrv
heartratevariability
javascript
HRVathlete | Heart Rate Variability (HRV) | Sports and Exercise Heart Rate Analysis
9 weeks ago by michaelfox
HRVathlete has been developed by the sports scientists at FitSense Australia and is based on their work in heart rate variability since 2001. During this time, they have established a reliable means of monitoring the level of fatigue in athletes, thus allowing coaches to maximise performance and reduce injury.
HRVathlete analysis has been used with Olympic and world champion cyclists and triathletes, as well as a large number of teams at an elite and professional level.
HRVathlete is best used to objectively measure athlete fatigue/recovery and provide feedback to coaches, who can fine-tune training loads to maximise performance and reduce injury risk.
HRVathlete is web-based, so you can access it from virtually anywhere you have an internet connection.
health
metrics
heart
heartrate
lifestyle
fitness
data
tracking
quantifiedself
hrv
heartratevariability
javascript
csvkit 0.4.2 (beta) — csvkit 0.4.2 (beta) documentation
january 2012 by michaelfox
1. Getting started
1.1. Description
1.2. Following along
1.3. Getting the data
1.4. Fixing the files with sed
1.5. Piping
1.6. Output redirection
1.7. Putting it together
2. Examining the data
2.1. Cutting up the data with csvcut
2.2. Statistics on demand with csvstat
2.3. Searching for rows with csvgrep
2.4. Flipping column order with csvcut
2.5. Sorting with csvsort
2.6. Using line numbers as proxy for rank
2.7. Reading through data with csvlook and less
2.8. Saving your work
2.9. Onward to merging
3. Adding another year of data
csv
csvkit
tools
manual
shell
cli
data
documentation
Genomera. Heal the world.
january 2012 by michaelfox
We’re crowd-sourcing health discovery by helping anyone create group health studies.
personalinformatics
quantifiedself
health
data
tracking
self
javascript
DIYgenomics
january 2012 by michaelfox
... crowd-sourced clinical trials and personal genome apps
personalinformatics
quantifiedself
supplements
health
data
tracking
self
javascript
azavea/Open-Data-Catalog - GitHub
january 2012 by michaelfox
Open Data Catalog is an open-source version of the site code behind OpenDataPhilly.org, a portal that provides access to over 100 open data sets, applications, and APIs related to the Philadelphia region. Open Data Catalog is intended to display information and links to publicly available data in an easily searchable format. The code also includes options for data owners to submit data for consideration and for registered public users to nominate a type of data they would like to see openly available to the public. Written using Django, Python and PostgreSQL.
http://www.opendataphilly.org/
javascript
github
github-repo
data
Tangle: a JavaScript library for reactive documents
october 2011 by michaelfox
Tangle is a JavaScript library for creating reactive documents. Your readers can interactively explore possibilities, play with parameters, and see the document update immediately. Tangle is super-simple and easy to learn.
library
reactive
documents
interactive
data
charting
Scraper - Google Chrome Extension
june 2011 by michaelfox
Scraper is a simple data mining extension for Google Chrome™ that is useful for online research when you need to quickly analyze data in spreadsheet form.
To use it: highlight a part of the webpage you'd like to scrape, right-click and choose "Scrape similar...". Anything that's similar to what you highlighted will be rendered in a table ready for export, compatible with Google Docs™.
This is a work-in-progress (i.e. there are bugs), and is currently intended for intermediate to advanced users who are comfortable with XPath, though jQuery is also supported to an extent.
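To show the kind of XPath selection the description refers to, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports a limited XPath subset. It is not the extension's code; the sample markup and the path expressions are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment standing in for a real web page.
page = """
<html><body>
  <table>
    <tr><td>Alice</td><td>30</td></tr>
    <tr><td>Bob</td><td>25</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(page.strip())

# One path matches every "similar" row; a second path picks the cells in each row.
rows = [
    [cell.text for cell in tr.findall("td")]
    for tr in root.findall(".//table/tr")
]

for row in rows:
    print(",".join(row))
```

Real pages are rarely well-formed XML, so a production scraper would typically need an HTML-tolerant parser rather than `ElementTree`.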
scraper
data
tools
browser
plugins
extensions
chrome
Data Science Toolkit
march 2011 by michaelfox
A collection of the best open data sets and open-source tools for data science, wrapped in an easy-to-use REST/JSON API with command line, Python and JavaScript interfaces. Available as a self-contained VM or EC2 AMI that you can deploy yourself.
It's essentially a specialized Linux distribution, with a lot of useful data software pre-installed and exposing a simple interface. For full documentation, see http://www.datasciencetoolkit.org/developerdocs.
Version 0.30 - March 20th 2011
Credits
The Data Science Toolkit was assembled by Pete Warden and the source code is available at http://github.com/petewarden/dstk
Country boundaries by Thematic Mapping.
Contains Ordnance Survey data © Crown copyright and database right 2010.
Irish boundaries by Ben Raue.
New Zealand boundaries from Statistics NZ.
Worldwide states and provinces from Natural Earth.
US neighborhood boundaries provided by Zillow under a CC-SLA license.
This product includes GeoLite data created by MaxMind, available from http://www.maxmind.com/.
The OpenStreetMap and PostGIS projects have also provided some fantastic tools.
Using geocoding code from Schuyler Erle.
Uses the Ocropus project for OCR on images, and catdoc for parsing pre-XML Word and Excel documents.
Uses the Hpricot library for parsing HTML.
The Boilerpipe library is used to recognize and extract the main story text from documents.
Uses my Ruby port of Eamon Daly and Jon Orwant's original GenderFromName Perl module to classify first names.
api
toolkit
tools
data
javascript
Free, Public Data Sets | jacquesmattheij.com
february 2011 by michaelfox
I'm a data junkie, I have to confess to that.
Hi, my name is Jacques and I have a problem. Whenever I see a large chunk of structured
(or even unstructured!) data pass by I just have to have a copy. It's not that I'm a
packrat, it's just that large gobs of data are always inspirational in some way or other.
What could you do with that data? In what new ways could you slice and dice it to get
new insights? In what ways can you combine it with data that you already have to
enable you to do new things? This is where, in my opinion, the real value of these
datasets lies: the sum is much bigger than the parts.
Because of that I keep a list of public data sets handy, and a bunch of hard drives
full of interesting stuff that I've collected over the years.
Some of it is at first glance boring and on second thought fascinating; other times
it's the other way around. Whatever comes out of it, you'll always learn and there
are always little surprises.
Here are some pointers if you want to start your own collection. It's nothing but
a starting point, but be warned: before you know it you'll be drowning in data and
ideas on what you could do with it ;)
Google has a number of public data sets. These are fairly US-centric but
contain lots of interesting information:
http://www.google.com/publicdata/directory
Amazon has a few datasets: the annotated human genome and other bioinformatics
data, some US census databases, and a dump of Freebase. If you're already on
EC2 you should have easy access to this data.
http://aws.amazon.com/publicdatasets/
The Project Gutenberg DVD ISO file:
http://www.gutenberg.org/cdproject/pgdvd042010.iso.torrent
This contains all the books in the collection. They're older books (out of
copyright), so you won't be training your spam filter on them, but it is a vast
corpus of written text.
All of Wikipedia can be downloaded in bulk:
http://dumps.wikimedia.org/
This contains articles, metadata and so on.
MaxMind has a pretty useful GeoIP database available for download:
http://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip
MusicBrainz has a huge database of music-related data:
http://musicbrainz.org/doc/Database_Download
It's in the PostgreSQL 'copy to' format, so you'll need PostgreSQL to use it.
Of course, once you have the data imported you can convert
it to any format that you want.
America Online released a set of files containing millions of search queries in
2006. It's hard to find copies of this, but even though it's dated it is still
quite a goldmine if you're trying to gain insight into how people search and
what they search for. Technically speaking that's not a 'free, public dataset',
because AOL has done a lot of work to try to put the genie back into the
bottle, but 5 minutes of googling will turn up a few copies.
There is a huge collection of links to datasets here:
http://www.datawrangling.com/some-datasets-available-on-the-web
Not all of them work but most do.
Another page with some links to datasets:
http://www.kdnuggets.com/datasets/
http://www.guardian.co.uk/uk/datablog/2011/feb/02/ukcrime-data-store
http://www.guardian.co.uk/data
An interesting side effect of writing this is that there is now a *much*
better place to look for even more data: http://news.ycombinator.com/item?id=2165497
If you want to stay up to date about stuff like this, follow me on Twitter: http://www.twitter.com/jmattheij
data
dataset
collection
reference
resources