arthegall + statistics   366

From Words to Concepts and Back: Dictionaries for Linking Text, Entities and Ideas | Research Blog
What's a "concept" again? (Is this what they meant, when they were writing about "the end of Models?")
concepts  ontology  words  peter-norvig  google  research  dbpedia  statistics 
6 days ago by arthegall
Pasupathy, "Generating Nonhomogeneous Poisson Processes"
Sampling event-times from non-homogeneous Poisson processes, given a representation of the intensity function...
poisson-process  pdf  sampling  random-process  statistics 
17 days ago by arthegall
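A minimal sketch of one standard way to sample such event times: thinning (Lewis and Shedler), which assumes the intensity is bounded above by a known constant on the interval. This is not taken from the Pasupathy paper; the function name `sample_nhpp_thinning` and its parameters are hypothetical, chosen only for illustration.

```python
import math
import random


def sample_nhpp_thinning(intensity, lam_max, t_end, rng=None):
    """Sample event times on [0, t_end] by thinning.

    intensity: callable t -> rate at time t (assumed 0 <= intensity(t) <= lam_max)
    lam_max:   assumed known upper bound on the intensity over [0, t_end]
    """
    if lam_max <= 0:
        raise ValueError("lam_max must be positive")
    rng = rng or random.Random()
    times = []
    t = 0.0
    while True:
        # Candidate events come from a homogeneous process with rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_end:
            break
        # Keep each candidate with probability intensity(t) / lam_max.
        if rng.random() * lam_max < intensity(t):
            times.append(t)
    return times


# Example: a sinusoidal intensity bounded by 2 on [0, 10].
events = sample_nhpp_thinning(lambda t: 1.0 + math.sin(t), 2.0, 10.0,
                              rng=random.Random(0))
```

The tighter the bound `lam_max`, the fewer candidates are rejected, so a loose bound costs time but not correctness.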
Systems Biology - A Probability-based Approach for the Analysis of Large-scale RNAi Screens
Evaluating hits in RNAi screens with replicates and multiple RNAs per gene. (Mmmmmfff. This seems ripe for bugs, errors, weirdness.)
rnai  bioinformatics  software  screening  statistics 
4 weeks ago by arthegall
The Neutral Model of Inquiry (or, What Is the Scientific Literature, Chopped Liver?)
Writing a version of the thought experiment in this post as a simulation was part of my "fun work" this weekend...
science  statistics  simulation  by:cshalizi  projects  weekend-programming 
4 weeks ago by arthegall
Jerry Reiter
"Below are descriptions of my research on various aspects of statistical disclosure limitation, including assessing risk and utility, synthetic data methods, remote access servers, and secure analyses of distributed data. I also have included papers with links."
via:andrew-gelman  privacy  database  computing  statistics  imputed-data 
10 weeks ago by arthegall
Bret Victor - Inventing on Principle on Vimeo
Yessss.... you know what should work like this? R, obviously.
R  data  visualization  graphics  statistics  interactivity  video  vimeo  via:andy 
february 2012 by arthegall
Zhi, Chen "Statistical Guidance for Experimental Design and Data Analysis of Mutation Detection in Rare Monogenic Mendelian Diseases by Exome Sequencing" (PLoS ONE)
"... we present a statistical modeling framework to calculate the power, the probability of identifying truly disease-causing genes, under various inheritance models and experimental conditions, providing guidance for both proper experimental design and data analysis."
sequencing  plos  research-article  genomics  mendelian-disease  statistics  to-read 
february 2012 by arthegall
RStudio
Open-source R IDE. Worth looking into.
R  programming  ide  statistics 
january 2012 by arthegall
German tank problem - Wikipedia
Estimating population sizes based on observations of (say) serial numbers of samples. I'm forgetting where I picked up this link, but I have the feeling that this could be used explicitly in some bioinformatics/sequencing applications.
statistics  estimation  idea  wikipedia  populations  from delicious
january 2011 by arthegall
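A minimal sketch of the usual frequentist estimator for this problem, assuming serial numbers run 1..N and the sample is drawn without replacement: the estimate is m + m/k - 1, where m is the largest serial seen and k is the sample size. The function name `german_tank_estimate` is hypothetical.

```python
def german_tank_estimate(serials):
    """Estimate the population size N from observed serial numbers.

    Assumes serials are distinct draws from 1..N (sampling without replacement).
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    # Largest observed serial plus the average gap between observations.
    return m + m / k - 1


estimate = german_tank_estimate([19, 40, 42, 60])  # m = 60, k = 4
```

For a sequencing application the "serial numbers" would need some analogous ordered labeling, which is an open question here rather than something the estimator supplies.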
"The pre-season AP poll is great." (the kenpom.com blog)
"It’s informed groupthink at its finest." -- A great quote, but I really like the whole discussion.
sports  polling  bias  groupthink  wisdom-of-crowds  averaging  statistics 
november 2010 by arthegall
Bento, Ibrahimi, Montanari, "Learning Networks of Stochastic Differential Equations" (arXiv)
"We consider linear models for stochastic dynamics. To any such model can be associated a network (namely a directed graph) describing which degrees of freedom interact under the dynamics. We tackle the problem of learning such a network from observation of the system trajectory over a time interval $T$."
symbolic-methods  stochastic-processes  differential-equations  graphs  research-article  learning  arxiv  nips  statistics  information-theory  via:ded_maxim 
november 2010 by arthegall
"SNPwatch: Uncertainty Surrounds Longevity GWAS" (The Spittoon)
The Spittoon (the blog from 23andme) does the follow-up on that fishy-smelling "here are a bunch of genetic variants associated with longevity" study that got all that press a few months ago.
23andme  spittoon  review  longevity  gwas  statistics  uncertainty  genomics  genetics  science  news 
august 2010 by arthegall
"Megan McArdle’s Hack Post on Elizabeth Warren’s Scholarship" (Rortybomb)
"If you made it this far, I feel terrible for you. I feel like Virgil leading you through a Glibertarian Inferno." -- I'm glad that Konczal is writing about this, because otherwise I'd have to read Levenson's post, which would make me want to claw my own eyes out.
mike-konczal  megan-mcardle  elizabeth-warren  bankruptcy  statistics  research  politics 
july 2010 by arthegall
Leibler, Kussell, "Individual histories and selection in heterogeneous populations" PNAS
"Using “individual histories”—temporal sequences of all reproduction events and phenotypic changes of individuals and their ancestors—we present an alternative approach to quantifying selection in diverse experimental settings..."
via:cshalizi  pnas  selection  history  population-effects  genomics  research-article  statistics 
july 2010 by arthegall
Robins, Richardson "Alternative Graphical Causal Models and the Identification of Direct Effects." (PDF)
"There are two common approaches to the construction of causal models. The first approach posits unobserved fixed ‘potential’ or ‘counterfactual’ outcomes for each unit under different possible joint treatments or exposures. The second approach posits relationships between the population distribution of outcomes under experimental interventions (with full compliance) to the set of (conditional) distributions that would be observed under passive observation (i.e., from observational data). We will refer to the former as ‘counterfactual’ causal models and the latter as ‘agnostic’ causal models (Spirtes et al., 1993), as the second approach is agnostic as to whether unit-specific counterfactual outcomes exist, be they fixed or stochastic.
The primary difference between the two approaches is ontological..."
causality  ontology  graphical-models  direct-effects  pdf  research-article  statistics  social-science 
july 2010 by arthegall
Halmos, Savage. "Application of the Radon-Nikodym Theorem to the Theory of Sufficient Statistics." Ann. Math. Statist. (1949)
"The purpose of generality here is not to solve immediate practical problems, but rather to capture the logical essence of an important concept (sufficient statistic), and in particular to disentangle that concept from such ideas as Euclidean space, dimensionality, partial differentiation, and the distinction between continuous and discrete distributions, which seem to us extraneous."
paul-halmos  leonard-savage  statistics  mathematics  sufficient-statistics  geometry  radon-nikodym-theorem  history 
july 2010 by arthegall
David Blackwell has passed away « An Ergodic Walk
"I’ll always remember what [Blackwell] told me when I handed him a draft of my thesis. “The best thing about Bayesians is that they’re always right.”"
humor  bayesian-methods  david-blackwell  quote  statistics 
july 2010 by arthegall
"Psychic Octopus picks Germany to beat Argentina" (Washington Post)
Awesome in so many ways. At least they're publishing the next prediction! (Via Tom L. on Twitter.)
out-of-sample-error  statistics  humor  stopped-clock  psychic-octopus  soccer  sports  prediction 
june 2010 by arthegall
Praxis and Ideology in Bayesian Data Analysis
"Edward Prescott forms a noteworthy exception: under the rubric of "calibration", he has elevated his conviction that his prior guesses are never wrong into a new principle of statistical estimation." -- Cosma's snark about statisticians and economists is the funniest snark around. We could all aspire to have wits that were as subtle and dry as his... (seriously).
by:cshalizi  bayesian-methods  andrew-gelman  statistics  econometrics  edward-prescott  humor 
june 2010 by arthegall
Allman et al. "Parameter identifiability in a class of random graph mixture models" (arXiv)
"We prove identifiability of parameters for a broad class of random graph mixture models. These models are characterized by a partition of the set of graph nodes into latent (unobservable) groups. The connectivities between nodes are independent random variables when conditioned on the groups of the nodes being connected. In the binary random graph case, in which edges are either present or absent, these models are known as stochastic blockmodels and have been widely used in the social sciences and, more recently, in biology. " -- To read, in the context of the blockmodeling paper from a few weeks back.
blockmodeling  graph  statistics  arxiv  research-article  parameters  identifiability  via:cshalizi 
june 2010 by arthegall
[1005.4274] This is SPIRAL-TAP: Sparse Poisson Intensity Reconstruction ALgorithms - Theory and Practice
This looks like the kind of thing I wanted, oh so long ago (pre-graduation), for thinking about delicious link frequencies. Also: great paper title.
via:Vaguery  poisson-process  statistics  estimation  images  image-analysis  optimization  arxiv  research-article 
may 2010 by arthegall
McCormick, Salganik, and Zheng "How many people do you know? : Efficiently estimating personal network size"
Got this via Andrew Gelman's blog 1+ years ago. (For some reason, I remember the room I was sitting in when I first read it.)
pdf  research-article  social-networks  estimation  statistics  friends  sociology 
april 2010 by arthegall
Efron & Thisted, "Estimating the number of unseen species: How many words did Shakespeare know?" (JSTOR)
[JSTOR: Biometrika, Vol. 63, No. 3 (Dec., 1976), pp. 435-447] Did I really not save a link to this already? There's an analogy to be drawn here with those social network papers that try to estimate the total number of X in a population by asking different people, "how many X do you know," etc.
networks  bradley-efron  jstor  statistics  research-article  estimation  shakespeare  language 
april 2010 by arthegall
"Finally joining the revolution" (Bill Simmons)
In the guise of an article about sports and baseball and fandom, I feel like Bill Simmons has actually written a great little piece about statistics and data mining in general. Take the way he looks at each statistic, picks apart each pro and con, and then figures out how to actually *use* it ... this is something that you could get a lot of high school or college statistics students to read.
statistics  data  data-mining  sports  baseball  sabermetrics  bill-simmons  espn  history 
april 2010 by arthegall
ISWC-triplerank-revised-version.pdf (application/pdf Object)
When I say 'graph data', you say 'spectral analysis'! (Graphdata...spectralanalysis!) "TripleRank: Ranking Semantic Web Data by Tensor Decomposition" -- presented at ISWC this year. Sam and I were just talking about this (general problem) yesterday. But is all the notation in terms of tensors really necessary? At first glance this looks like something that's (usually) presented as a "matrix factorization", no?
semanticweb  pdf  research-article  ranking  web  triples  rdf  via:inkdroid  statistics 
october 2009 by arthegall
Devroye and Lugosi, "Bin width selection in multivariate histograms by the combinatorial method" (TEST, 2004)
There's a rich vein of papers about bin-width selection in histograms -- some of them are pretty interesting, actually.
histograms  research-article  optimization  statistics  luc-devroye 
october 2009 by arthegall
Gary King - Zelig Software Website
"Zelig comes with detailed, self-contained documentation that minimizes startup costs for Zelig and R, automates graphics and summaries for all models, and, with only three simple commands required, generally makes the power of R accessible for all users." --- heading in the right direction, usability-wise. Gary King is sort of an inspirational figure, if you think about him in the right way. (Like Ben Shneiderman, no?)
gary-king  via:kinggary  software  R  statistics  tool  spss  stata 
october 2009 by arthegall
Quick-R: Home Page
"R for ... SPSS users." Need to send this to Rachel...
r  via:cshalizi  tutorial  statistics  software  reference  programming 
october 2009 by arthegall
"What’s Wrong with Probability Notation?" (LingPipe Blog)
It's a reasonable explication -- but I feel like every first-year CS graduate student has a similar personal revelation about notation when he or she first comes into contact with the machine learning or probabilistic AI literature. "This notation is horrible, but lambda calculus [or, more generally, a functional language, or some other technique I learned in my proglang course] will come to the rescue and clear all this up!" And then they work it out in the same way, have exactly the same realization (this is hard, people have tried and failed at it before), and move on. Or maybe they're Avi Pfeffer and they actually do something about it. But either way, it's a worthwhile exercise!
machinelearning  notation  programminglanguages  statistics  probability  computerscience 
october 2009 by arthegall
Efron, "Are a set of microarrays independent of each other?" (Annals of Applied Statistics, 2008)
When Bradley Efron talks about microarrays... (But where is the Cabernet of k-Means and heat maps? Where is the Burgundy of Support-Vector Machines?)
dean-young  bradley-efron  poetry  humor  statistics  obscurely-referential  microarrays 
september 2009 by arthegall
Poon et al. "Parsing Social Network Survey Data from Hidden Populations Using Stochastic Context-Free Grammars" (PLoS ONE)
"Here, we develop a new methodology based on stochastic context-free grammars (SCFGs), which are well-suited to modeling tree-like structure of the RDS recruitment process. We apply this methodology to an RDS case study of injection drug users (IDUs) in Tijuana, México, a hidden population at high risk of blood-borne and sexually-transmitted infections (i.e., HIV, hepatitis C virus, syphilis). ... We identified significant latent variability in the recruitment process that violates assumptions of Markov chain-based methods for RDS analysis: firstly, IDUs tended to emulate the recruitment behavior of their own recruiter; and secondly, the recruitment of like peers (homophily) was dependent on the number of recruits."
social-networks  plos-one  research-article  scfgs  networks  markov-models  statistics  epidemiology  network-models 
september 2009 by arthegall
"Genes and Income" (Rortybomb)
"So this strikes me as a major problem for the graph. Your Mom guessing, in $20,000 increments, what your income is not the best proxy for actual income, and it seems like a rather blunt sword to use to declare the knot of “Nature/Nuture” cut. I think the lack of granularity among the categories alone could easily noise out that $1,600, no?" --- It turns out to be a "your mom" joke after all! That's hilarious.
humor  genetics  income  mankiw  graphing  statistics  causality 
september 2009 by arthegall
"What's the difference between Bayesian and classical statistics" (Statistical Modeling, Causal Inference, and Social Science)
"One problem with finding statistical resources on the web, I think, is that a webpage on a technical issue is likely to have been written by a computer scientist. And what computer scientists do with data and models is often much different from what we do." -- A golden quote. [He says that like it's a *bad* thing!]
humor  andrew-gelman  quote  statistics  computerscience  web 
september 2009 by arthegall
RANDOM NUMBER GENERATION (LUC DEVROYE)
Devroye's publications on random number generation (goes hand in hand with his book).
random-numbers  publications  researcher  statistics 
september 2009 by arthegall
Ye, "On Measuring and Correcting the Effects of Data Mining and Model Selection" (JASA, 1998)
[JSTOR: Journal of the American Statistical Association, Vol. 93, No. 441 (Mar., 1998), pp. 120-131]
jstor  data-mining  statistics  degrees-of-freedom  research-article 
august 2009 by arthegall
Comment by cshalizi at "FDL Book Salon Welcomes Scott Page: The Difference"
"Maybe it’s the statistician in me, but I am warming to the idea of randomization as a way of introducing diversity."
humor  diversity  randomization  statistics 
august 2009 by arthegall
Robert Berk, "Limiting Behavior of Posterior Distributions when the Model is Incorrect"
Can I ask a question? Is it the case that Brad DeLong's "Rosencrantz and Guildenstern Flip a Coin" example (http://delong.typepad.com/sdj/2009/03/cosma-shalizi-takes-me-to-probability-school-or-is-it-philosophy-school.html) is a particular instance of the example given in the last paragraph of this paper? (the answer, I am subsequently told, is "yes.")
via:cshalizi  baysian-methods  inference  modeling  research-article  statistics 
august 2009 by arthegall
"A little more about random configurations" (Quomodocumque)
"My impression is that statisticians are pretty good at distinguishing between a normal distribution and a superimposition of some small finite set of normal distributions. But I think it’s much harder to look at a giant cloud of points in R^100 and say “aha — this is actually a random sample from a normal distribution centered on the union of a surface of genus 2 sitting over here, and these ten disjoint circles sitting over there.""
geometry  data  statistics  configurations  physics  sampling  mcmc-sampling 
july 2009 by arthegall
