howthebodyworks + statistics   170

Orange – Data Mining Fruitful & Fun
Design your data analysis process through visual programming. Orange remembers your choices, suggests most frequently used combinations, and intelligently chooses which communication channels between widgets to use.

Orange is packed with different visualizations, from scatterplots, bar charts, trees, to dendrograms, networks and heatmaps.

Actions seamlessly propagate through data analysis schema. Selection of data subset in one widget can automatically trigger change of display in the other one. By combining various widgets you can design data analytics framework of choice.

Over 100 widgets and growing. Coverage of most of standard data analysis tasks. Also specialized add-ons are available, like Bioorange for bioinformatics.

With scripting interface in Python, programming new algorithms and developing complex data analysis procedures is pure joy, using and reusing all power found in v
opensource  via:Strangefeatures  statistics  visualization  python  from delicious
9 days ago by howthebodyworks
rrr00bb: Disconnected Procedure Calls
REST for the sneakernet - how to send your next few likely uses for a sporadically available API by sending a queued decision tree of API calls.
statistics  rpc  api  from delicious
12 weeks ago by howthebodyworks
factorie - Probabilistic programming with imperatively-defined factor graphs - Google Project Hosting
FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
It is object-oriented... in the definition of random variables, factors, inference and learning methods.
It is scalable, with demonstrated success on problems with many millions of variables and factors, and on models that have changing structure, such as case factor diagrams... capable of handling billions of variables.
It is flexible, supporting multiple modeling and inference paradigms. Its original emphasis was on conditional random fields, undirected graphical models, MCMC inference, online training, and discriminative parameter estimation... has preliminary support for variational inference, including belief propagation and mean-field methods.
monte_carlo  datamining  statistics  scala  nlp  from delicious
january 2012 by howthebodyworks
Pattern | CLiPS
combo data-mining/NLP/web-scraping toolkit for instant natural experiments online
api  statistics  nlp  datamining  python  from delicious
january 2012 by howthebodyworks
Sparse- and low-rank approximation wiki - Sparse Solver Wiki
This wiki has information on solvers and problems that arise in these fields (and subfields, such as compressed sensing).
Everyone is welcome and encouraged to edit this wiki; it runs
Contents

The contents of this wiki have been organized into categories.
Problem formulations that arise in sparse- and low-rank approximation.
Solvers that are used for solving these problems. There are many sub-categories of solvers, such as:
Convex solvers
Greedy solvers
Matrix completion solvers
and many more (all of them listed at Solvers ).
Benchmarking/Test problems for comparing algorithms
Applications (in software) of sparsity or low-rank based techniques
Hardware devices that perform compressed sensing.
...
compressed_sensing  linear_algebra  statistics  compact_representation  from delicious
january 2012 by howthebodyworks
Create maps with maptools R package | Statisfaction
How to do stats on the surface of the earth with beautiful visualizations
mapping  gis  r  statistics  france  from delicious
december 2011 by howthebodyworks
How I automated my writing career - O'Reilly Radar
a growth industry: automating the production of journalism, books, textual content
nlp  writing  ai  statistics  journalism  from delicious
november 2011 by howthebodyworks
Peter L. Hurd's page of local resources
weirdly relaxed academic dispensing soothing quotes:
Do not burn yourselves out. Be as I am - a reluctant enthusiast, a part-time crusader, a half-hearted fanatic. Save the other half of yourselves and your lives for pleasure and adventure. It is not enough to fight for natural land and the west; it is even more important to enjoy it. While you can. While it's still there... Enjoy yourselves, keep your brain in your head and your head firmly attached to the body, the body active and alive, and I promise you this much: I promise you this one sweet victory over our enemies, over those desk-bound men with their hearts in a safe deposit box, and their eyes hypnotized by desk calculators. I promise you this: you will outlive the bastards. --Ed Abbey.
statistics  from delicious
november 2011 by howthebodyworks
A Roadmap for Rich Scientific Data Structures in Python | Quant Pythonista
"So, this post is a bit of a brain dump on rich data structures in Python and what needs to happen in the very near future. I care about them for statistical computing (I want to build a statistical computing environment that trounces R) and financial data analysis (all evidence leads me to believe that Python is the best all-around tool for the finance space). Other people in the scientific Python community want them for numerous other applications: geophysics, neuroscience, etc. It’s really hard to make everyone happy with a single solution. But the current state of affairs has me rather anxious. And I’d like to explain why..."
statistics  Python  R  visualisation  db  nosql  has:for 
september 2011 by howthebodyworks
pandas: a python data analysis library — pandas v0.4.0dev documentation
"pandas is a python package providing convenient data structures for time series, cross-sectional, or any other form of “labeled” data, with tools for building statistical and econometric models."

handle data in python intuitively. pass to R for fiddly bits.
Python  R  metadata  statistics 
august 2011 by howthebodyworks
NIST/SEMATECH e-Handbook of Statistical Methods
neat alternate perspective on statistics, in that kind of living-in-a-funding-bubble US NIST kind of way. Bit of an exploratory data analysis focus, but still worthwhile.
statistics  howto  hps 
august 2011 by howthebodyworks
Peter Doyle
a great collection of mathquirk, all online and free
maths  statistics  geometry 
august 2011 by howthebodyworks
The Practical Quant: Compressed Sensing and Big Data
Best explanation of this sparse image representation thing that I have yet seen. Well wicked.
computer_vision  learning  statistics  GRAMMARTHING  grammarface  compressed_sensing 
july 2011 by howthebodyworks
ExploringDataBlog: Interestingness Measures
categorical data has other Shannon-information-like estimators of "interestingness"
statistics  r  information_theory  bubble_economy 
may 2011 by howthebodyworks
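A minimal sketch of the Shannon-style idea behind the entry above, assuming "interestingness" means how evenly a categorical sample is spread over its levels. The function names are made up for illustration, and this is not necessarily one of the estimators the linked post covers.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of a categorical sample."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def normalized_entropy(values):
    """Entropy divided by its maximum, log2(k), where k is the number of observed levels.

    Returns 0.0 when fewer than two levels are observed, so the result lies in [0, 1].
    """
    k = len(set(values))
    if k < 2:
        return 0.0
    return shannon_entropy(values) / math.log2(k)
```

A value near 1 means the levels are used about equally often; a value near 0 means one level dominates.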
Overview — NetworkX v1.4 documentation
native python graph handling with ultralight api built around hashes.
python  networks  statistics 
may 2011 by howthebodyworks
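To make "built around hashes" concrete: an undirected graph can be stored as a dict of dicts, with node -> neighbour -> edge attributes. This is a plain-Python sketch of that layout, not NetworkX's actual code, and the helper names are hypothetical.

```python
def add_edge(adj, u, v, **attrs):
    """Add an undirected edge u-v with optional attributes to a dict-of-dicts graph."""
    # Both directions point at the same attribute dict, so an update is seen from either end.
    adj.setdefault(u, {})[v] = attrs
    adj.setdefault(v, {})[u] = attrs


def degree(adj, node):
    """Number of neighbours of a node (0 if the node is unknown)."""
    return len(adj.get(node, {}))


graph = {}
add_edge(graph, "a", "b", weight=2)
add_edge(graph, "b", "c")
```

Lookups such as `graph["a"]["b"]["weight"]` are then ordinary hash lookups, which is what keeps the API so light.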
graph-tool
c++ graph lib for python, optimised for performance.
python  networks  c++  boost  statistics 
may 2011 by howthebodyworks
Generalized Information Measures and Their Applications
TANEJA, I.J. (2001) - this has the stuff about norms and maximum entropy measures etc. and their implications
information_theory  statistics 
april 2011 by howthebodyworks
Bayesian inference of the median - Statistical Modeling, Causal Inference, and Social Science
"In that sense, when people play with loss functions, they are essentially also playing with probability distributions that are entailed by the loss functions. When they use L1 or L2 regularization for regression, they are picking either Gaussian or Laplace priors for the parameters. The reason for the popularity has been primarily the realization that Laplace prior is better than Gaussian prior on many benchmarks. I wonder if the log(1+d^2) norms will generate as many papers as L1, or whether statisticians will migrate from Student to Laplace."
bayes  statistics  maxent 
april 2011 by howthebodyworks
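The link between penalties and priors in the quote above can be written out as a MAP estimate. This is a sketch under simple assumptions (independent zero-mean priors on each weight; the symbols w, D, sigma and b are introduced here, not taken from the post). It pairs L2 with the Gaussian prior and L1 with the Laplace prior.

```latex
\hat{w}_{\mathrm{MAP}}
  = \arg\min_{w} \bigl[ -\log p(D \mid w) - \log p(w) \bigr]

% Gaussian prior, p(w) \propto \exp\!\bigl(-\|w\|_2^2 / (2\sigma^2)\bigr):
-\log p(w) = \frac{1}{2\sigma^2} \|w\|_2^2 + \mathrm{const}
  \quad \text{(an L2 penalty)}

% Laplace prior, p(w) \propto \exp\!\bigl(-\|w\|_1 / b\bigr):
-\log p(w) = \frac{1}{b} \|w\|_1 + \mathrm{const}
  \quad \text{(an L1 penalty)}
```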
Think Stats: Probability and Statistics for Programmers
hawt python statistics. If you don't actually like R THAT much.
python  statistics 
march 2011 by howthebodyworks
MADlib
statistical analysis on your database contents: "MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data". Seems to be largely in PL/C, with some bonus python on the front.
python  statistics  postgresql  sql  db  greenplum  mapreduce  ai  classification 
march 2011 by howthebodyworks
ConnectMV | Process Improvement using Data
interesting bunch of courses on statistical data fiddling
r  howto  statistics 
march 2011 by howthebodyworks
Understanding Uncertainty
science communication expert explains probability with rare precision. excellent for tips about communicating risk and lots of tasty examples (comparing horse riding with ecstasy, motorbikes with pregnancy, all that stuff)
risk  crisis  statistics 
january 2011 by howthebodyworks
Ruby/GSL
Ruby's sparse but serviceable answer to numpy
gnu  science  linear_algebra  statistics  ruby  mathematics 
december 2010 by howthebodyworks
SOME IDEAS ON COMMUNICATING RISK TO THE GENERAL PUBLIC | Decision Science News
not pretty, but necessary. How to communicate statistics and risks to people in a manner which suffers least at the hands of our inherent cognitive biases
visualization  risk  mind  crisis  statistics 
december 2010 by howthebodyworks
InterSciWiki:Community Portal - InterSciWiki
interesting little wiki of complexity and emergence for the student-type.
transdisciplinary  complexity  statistics  methodology  networks  wiki 
november 2010 by howthebodyworks
“simply start over and build something better” « Xi'an's Og
the case for replacing the language of R with something else that can access the same statistical power
r  statistics  compsci  coding 
november 2010 by howthebodyworks
Power-law Distributions
how to find, in a statistically valid fashion, that your data fits a power law.
statistics  scaling  powerlaw  matlab  python  r 
november 2010 by howthebodyworks
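One piece of the power-law recipe above is the maximum-likelihood estimate of the exponent for a continuous power law above a known cutoff: alpha = 1 + n / sum(ln(x_i / xmin)). The sketch below assumes xmin is given. It leaves out choosing xmin and the goodness-of-fit test, both of which a statistically valid fit still needs. Function names are hypothetical.

```python
import math
import random


def fit_power_law_alpha(xs, xmin):
    """MLE of alpha for p(x) proportional to x**(-alpha), using only x >= xmin."""
    tail = [x for x in xs if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one observation strictly above xmin")
    return 1.0 + len(tail) / log_sum


def sample_power_law(alpha, xmin, rng):
    """Draw one value by inverse-transform sampling (requires alpha > 1)."""
    u = rng.random()  # in [0, 1), so 1 - u is in (0, 1]
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))


rng = random.Random(1)
data = [sample_power_law(2.5, 1.0, rng) for _ in range(20000)]
alpha_hat = fit_power_law_alpha(data, xmin=1.0)  # should land close to 2.5
```

Fitting a straight line to a log-log histogram is the tempting alternative, but it is known to give biased exponents, which is why the likelihood route is sketched here.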
MCMC and likelihood-free methods
MCMC, Monte Carlo methods, and the Metropolis-Hastings sampling doohickey for fun and profit. Well, profit.
statistics  numerical_methods  methodology 
november 2010 by howthebodyworks
pymc - Project Hosting on Google Code
Bayesian estimation, particularly using Markov chain Monte Carlo (MCMC), is an increasingly relevant approach to statistical estimation. However, few statistical software packages implement MCMC samplers, and they are non-trivial to code by hand. PyMC is a python module that implements the Metropolis-Hastings algorithm as a python class, and is extremely flexible and applicable to a large suite of problems. PyMC includes methods for summarizing output, plotting, goodness-of-fit and convergence diagnostics
monte_carlo  python  statistics  bayes  datamining  markov  modelling 
october 2010 by howthebodyworks
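For a feel of what a Metropolis-Hastings sampler does, here is a minimal random-walk version in plain Python. It is a sketch under stated assumptions (one-dimensional target, symmetric Gaussian proposal, hypothetical function name) and does not use or imitate the PyMC API.

```python
import math
import random


def metropolis_hastings(log_density, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalised 1-D log density."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric proposal
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


# Example target: standard normal, known only up to a constant.
draws = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
```

Because the proposal is symmetric, the Hastings correction cancels and only the density ratio remains. Real use would also need burn-in, step-size tuning and convergence checks.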
Numerical Recipes in C
numerical recipes in c, 2nd edition, is online.
compsci  c  algorithm  programming  simulation  statistics 
october 2010 by howthebodyworks
Come on in my kitchen ...
the statistics and evolution of interesting random cellular automata
cellularautomata  statistics  evolution 
october 2010 by howthebodyworks
pyentropy - Project Hosting on Google Code
information theoretic widgets and doohickeys for python statistics
information_theory  python  statistics  neuron  numpy  academic 
october 2010 by howthebodyworks
http://rpy.sourceforge.net/rpy2.html
the python<->R interface is being rebooted
r  python  api  statistics 
october 2010 by howthebodyworks
bryan-code - Project Hosting on Google Code
handy image processing code in python, including similarity detection, mutual information, volumetric rendering and so on
image  python  numpy  statistics  3d 
october 2010 by howthebodyworks
Programmers Need To Learn Statistics Or I Will Kill Them All
This guy is reliably entertaining. In this case, it's a useful, arrogant rant about what you need to know about stats to push beyond the "mean request time" or whatever. Highlights:

>The next day we had IBM fixing the problem (turned out to be a single update index command) and we all kept our jobs. That’s what a proper analysis method can do for you.

>still I see software developers begging for gazillions of dollars to buy some crap tool that doesn’t even mention “standard deviation”, but throws “user” around like it’s Dr. Phil treating Robert Downey Jr. for heroin addiction.
r  reference  statistics  performance  compsci  dear_me 
october 2010 by howthebodyworks
Geospatial Analysis - spatial and GIS analysis techniques and GIS software
online free ebook about the stats and tools needed for all that trendy geospatial shit to happen in a rigorous way
gis  howto  academic  ebook  agents  statistics 
september 2010 by howthebodyworks
Indirect Inference
some nice hacks for validating agent based models
phd  statistics  agents 
september 2010 by howthebodyworks
wavii's pfp at master - GitHub
parser for probabilistic context free grammars using the CYK algorithm found in the Stanford NLP parser, but faster and python-happy
nlp  parser  language  python  statistics  c++ 
september 2010 by howthebodyworks
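The CYK idea mentioned above, in its probabilistic form, fills a table of best derivation probabilities for every span of the sentence. This is a generic textbook-style sketch for a grammar in Chomsky normal form, not pfp's or the Stanford parser's code. The grammar encoding and names are assumptions made for the example.

```python
def cyk_best_parse_prob(words, lexical, binary, start="S"):
    """Probability of the most likely parse of `words` under a CNF PCFG.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    """
    n = len(words)
    # best[i][j] maps a nonterminal to the best probability of deriving words[i:j].
    best = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > best[i][i + 1].get(a, 0.0):
                best[i][i + 1][a] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                left, right = best[i][k], best[k][j]
                for (a, b, c), p in binary.items():
                    if b in left and c in right:
                        cand = p * left[b] * right[c]
                        if cand > best[i][j].get(a, 0.0):
                            best[i][j][a] = cand
    return best[0][n].get(start, 0.0)


# Toy grammar, invented for illustration.
lexical = {("NP", "she"): 0.5, ("NP", "fish"): 0.5, ("V", "eats"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
prob = cyk_best_parse_prob(["she", "eats", "fish"], lexical, binary)  # 0.25
```

The triple loop over span, start and split point makes this cubic in sentence length, which is presumably where a faster implementation earns its keep.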
The Tower of Babel
part I of the SFI language deep history project
language  history  phylogeny  statistics 
september 2010 by howthebodyworks
The Tower of Babel
part II of the SFI language deep history project
language  history  phylogeny  statistics 
september 2010 by howthebodyworks
deplump general purpose lossless compression
new compression algorithm which works on streams and produces smaller output than most stuff. licence unclear. background theory intriguing.
compression  java  statistics 
september 2010 by howthebodyworks
on-line prediction wiki - Wiki for On-Line Prediction
learning in the game-theoretic statistical framework of vovk and shafer
statistics  ai  learning  methodology  hps  phd 
september 2010 by howthebodyworks
Monte Carlo Method
monte carlo methods for dummies. Pellucid.
statistics  howto 
august 2010 by howthebodyworks
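The standard first example of a Monte Carlo method is estimating pi by throwing random points at a square. This is a sketch for illustration, not taken from the linked page, and the function name is hypothetical.

```python
import random


def estimate_pi(n, seed=0):
    """Estimate pi from the fraction of n uniform points in the unit square
    that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n


approx = estimate_pi(200000)
```

The error shrinks roughly like 1/sqrt(n), so each extra digit of accuracy costs about a hundred times more samples.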
Bitzstein
inferring networks given connection weights, and generating them.
livingthing_content  networks  statistics 
august 2010 by howthebodyworks
BAYESIANISM AND CAUSALITY, OR, WHY I AM ONLY A HALF-BAYESIAN
Judea Pearl on the problems of using probabilistic reasoning when the human mind tends towards causal reasoning
mind  statistics  bayes  causality  filetype:pdf  media:document 
july 2010 by howthebodyworks
Maximum Entropy Principle
tasty howto maxent guides from UC Davis
howto  maxent  statistics 
june 2010 by howthebodyworks
CSSR: An Algorithm for Building Markov Models from Time Series
Shalizi's package for inferring hidden recursive markov model values and structure from a time series
grammarthing  statistics  markov  c++  via:cshalizi 
june 2010 by howthebodyworks
peach - Project Hosting on Google Code
"Peach is a pure-python module, based on SciPy and NumPy to implement algorithms for computational intelligence and machine learning. Methods implemented include, but are not limited to, artificial neural networks, fuzzy logic, genetic algorithms, swarm intelligence and much more."
learning  ai  statistics  genetic  python  agents 
june 2010 by howthebodyworks
Eurozine - The defence minister's new philosophy - Karl Palmås
the implications of statistical approaches to intelligence gathering. we're all part of the mass society now, eh?
privacy  terrorism  agents  statistics 
may 2010 by howthebodyworks
Prediction Services
magickal classifiers and predictors on arbitrary data, courtesy google. probably handy for stuff, esp if they actually disclosed the algorithm.
nlp  statistics  schmooze  google  performance  csv 
may 2010 by howthebodyworks
R Videos
learn r monkey-see-monkey-do style
howto  r  statistics 
march 2010 by howthebodyworks