cshalizi + via:arthegall   33

[1203.3504] On Measurement Bias in Causal Inference
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particular, the paper discusses the control of partially observable confounders in parametric and nonparametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB  causal_inference  inference_to_latent_objects  pearl.judea  to_teach:undergrad-ADA  statistics  error_in_variables  via:arthegall 
18 days ago by cshalizi
On the History of the Transportation and Maximum Flow Problems
"We review two papers that are of historical interest for combinatorial optimization: an article of A.N. Tolstoĭ from 1930, in which the transportation problem is studied, and a negative cycle criterion is developed and applied to solve a (for that time) large-scale (10 × 68) transportation problem to optimality; and an, until recently secret, RAND report of T.E. Harris and F.S. Ross from 1955, that Ford and Fulkerson mention as motivation to study the maximum flow problem. The papers have in common that they both apply their methods to the Soviet railway network." --- In a wonderful illustration of the power of duality, one of the papers was about optimizing the flow through the network, and the other was about keeping anything at all from flowing through it...
optimization  networks  ussr  history_of_mathematics  planning  cold_war  via:arthegall 
august 2009 by cshalizi
[0905.3369] Learning Nonlinear Dynamic Models
... from a quick scan (the abstract is completely uninformative), this seems to be yet another near-reinvention of Knight's "prediction process". To be read, and to act as a goad for me to finish CSSR II w/ KLK. (I confess I am somewhat boggled at the idea that all HMMs are linear.)

Update: After a careful read, this really is just a rediscovery of predictive representations, with the trick of using regression to learn the state-transition and emission functions. On the one hand, I feel kind of burned by seeing them calling this "entirely new" (never mind me or my teachers, Littman, Sutton & Singh should be very upset; so should Knight if he were still alive). On the other hand, they got it _done_, which is a very real virtue.

Also: You need to put **** error bars on your average performance plots. (Yes, I realize I'm nit-picking because I'm jealous.)
prediction  statistics  machine_learning  time_series  markov_models  state-space_models  via:arthegall  re:AoS_project  langford.john  zhang.tong  salakhutdinov.ruslan  have_read 
june 2009 by cshalizi
Ton's Interdependent Thoughts: WolframAlpha, Getting Less Impressed Upon Closer Look
Nice: "For all its coolness on the front of WolframAlpha, on the back end this sounds like it's the mechanical turk of the semantic web."
information_retrieval  wolfram.stephen  wolfram_alpha  via:arthegall 
may 2009 by cshalizi
The Future is Yesterday | Messy Matters
If you want to predict next week's flu cases, an AR(2) model beats search-engine snooping. Of course this relies crucially on the CDC actually generating reliable data!

(I'm curious, though, why AR(2) rather than some other autoregressive order? Something related to the length of the infectious and incubation periods?)
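For concreteness, here is roughly what "an AR(2) model" amounts to as a forecaster. This is a minimal sketch, assuming weekly counts sit in a plain Python list. The function names (fit_ar2, forecast_ar2, simulate_ar2) are hypothetical, and this is not the code or data behind the Messy Matters post. It fits x[t] - mean = phi1*(x[t-1] - mean) + phi2*(x[t-2] - mean) + noise by least squares and then predicts one week ahead.

```python
import random


def fit_ar2(series):
    """Least-squares AR(2) fit on a demeaned series (hypothetical helper).

    Returns (mean, phi1, phi2) for the model
    x[t] - mean = phi1*(x[t-1] - mean) + phi2*(x[t-2] - mean) + noise.
    """
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    mu = sum(series) / len(series)
    z = [x - mu for x in series]
    # Accumulate the 2x2 normal equations  A @ [phi1, phi2] = b.
    a11 = a12 = a22 = b1 = b2 = 0.0
    for t in range(2, len(z)):
        lag1, lag2, y = z[t - 1], z[t - 2], z[t]
        a11 += lag1 * lag1
        a12 += lag1 * lag2
        a22 += lag2 * lag2
        b1 += lag1 * y
        b2 += lag2 * y
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("lagged values are collinear; AR(2) not identified")
    # Cramer's rule for the symmetric 2x2 system.
    phi1 = (a22 * b1 - a12 * b2) / det
    phi2 = (a11 * b2 - a12 * b1) / det
    return mu, phi1, phi2


def forecast_ar2(series, mu, phi1, phi2):
    """One-step-ahead forecast from the last two observations."""
    return mu + phi1 * (series[-1] - mu) + phi2 * (series[-2] - mu)


def simulate_ar2(n, mu, phi1, phi2, sigma=1.0, seed=0):
    """Synthetic AR(2) data for illustration only (not real flu counts)."""
    rng = random.Random(seed)
    z = [0.0, 0.0]
    for _ in range(n + 100):  # 100 extra steps as burn-in
        z.append(phi1 * z[-1] + phi2 * z[-2] + rng.gauss(0.0, sigma))
    return [mu + v for v in z[-n:]]


# Usage on simulated data with known coefficients.
series = simulate_ar2(5000, mu=50.0, phi1=0.5, phi2=0.3)
mu, phi1, phi2 = fit_ar2(series)
next_week = forecast_ar2(series, mu, phi1, phi2)
```

Subtracting the sample mean stands in for an intercept term; asymptotically the two fits agree. Comparing held-out one-step errors for AR(1), AR(2), AR(3), ... would be one way to poke at the question of why order 2.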
epidemiology  time_series  via:arthegall  statistics 
march 2009 by cshalizi
All we want are the facts, ma'am
When I wrote about Chris Anderson's idiotic piece back in the spring, I didn't say anything about the quote from Norvig, because it sounded very strange and not at all like Norvig. And, indeed, he now says "That's a silly statement, I didn't say it, and I disagree with it." Ah, Wired!
why_oh_why_cant_we_have_a_better_press_corps  anderson.chris  statistics  modeling  data_mining  norvig.peter  machine_learning  bad_science_journalism  fact_checking  via:arthegall  via:shivak 
february 2009 by cshalizi
Stacked generalization
I read this a long time ago, and then forgot about it (except for vague comments to students).
ensemble_methods  machine_learning  wolpert.david  via:arthegall  to_teach:data-mining 
february 2009 by cshalizi
Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models (Johnson, Griffiths and Goldwater)
"introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). ... Adaptor grammars augment the ... rules of PCFGs with “adaptors” that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework." --- Looking at posterior or predictive consistency here would, I think, be interesting, but hard.
grammar_induction  statistics  context-free_grammars  nonparametrics  machine_learning  via:arthegall  statistical_inference_for_stochastic_processes 
february 2009 by cshalizi
Margaret Ackerman and Shai Ben-David, "Measures of Clustering Quality: A Working Set of Axioms for Clustering"
A rebuttal to Kleinberg's impossibility theorem for clustering (bookmarked earlier). There are measures of _clustering quality_ which satisfy all the natural axioms, which is good enough.
clustering  to_teach:data-mining  via:arthegall  via:vielmetti  data_mining  ackerman.margaret  ben-david.shai  kleinberg.jon 
december 2008 by cshalizi
Beyond the Hoax: Science, Philosophy and Culture by Alan Sokal, reviewed by Simon Blackburn
Query to self: does this sort of deflation of the claim "science gives us the truth" (by using Tarski to turn that into the OR of lots of claims like "science says that the earth circles the sun, and it does") still work counterfactually? That is, if the Sun _did_ go around the Earth, presumably scientists could figure that out... (Cf. Kevin Kelly, _Logic of Reliable Inquiry_.)
book_reviews  sokal.alan  blackburn.simon  the_french_disease  philosophy_of_science  epistemology  via:arthegall  truth 
august 2008 by cshalizi
Novembre and Stephens, "Interpreting principal component analyses of spatial population genetic variation" (Nature Genetics)
"We find that gradients and waves observed in ... maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events."
genetics  human_genetics  statistics  principal_components  spatial_statistics  stepping_stone_model  cavalli-sforza  via:arthegall  bad_data_analysis  to_teach:data-mining  to:NB  to_teach:undergrad-ADA 
may 2008 by cshalizi
"Cat Proximity" (xkcd)
Classic (& thanks for the reminder). An accurate depiction of my domestic life.
cats  comics  funny:malicious  via:arthegall  story_of_my_life 
may 2008 by cshalizi
Jackboots and Whole Foods
... in which Michael Tomasky's time is wasted reading and reviewing garbage. Perhaps this was the point?
book_reviews  evisceration  tomasky.michael  goldberg.jonah  liberalism  fascism  utter_stupidity  via:arthegall 
february 2008 by cshalizi

related tags

ackerman.margaret  anderson.chris  autism  bad_data_analysis  bad_science_journalism  ben-david.shai  blackburn.simon  blogged  blogs  books:noted  books:recommended  book_reviews  bootstrap  cats  causal_inference  cavalli-sforza  classifiers  clustering  cold_war  comics  computational_statistics  context-free_grammars  cross-validation  data  databases  data_mining  devroye.luc  dimension_reduction  dinosaur_comics  econometrics  ensemble_methods  epidemiology  epistemology  error_in_variables  estimation  evisceration  fact_checking  falkenstein.eric  fascism  finance  fry.ben  funny:geeky  funny:malicious  funny:tasteless  gelman.andrew  genetics  genomics  goldbarth.arthur  goldberg.jonah  grammar_induction  graphical_models  harris.zellig  have_read  hierarchical_models  hill.jennifer  history_of_mathematics  human_genetics  inference_to_latent_objects  information_retrieval  instrumental_variables  intellectual_property  interface_design  iq  kleinberg.jon  langford.john  learning_theory  liberalism  linear_regression  linguistics  lolcats  machine_learning  market_bubbles  markov_models  mathematical_logic  mathematics  medicine  modeling  multiple_comparisons  neal.radford  networks  network_data_analysis  nonparametrics  norvig.peter  occams_razor  optimization  pearl.judea  philosophy_of_science  photos  planning  please_give_me_strength  poetry  prediction  principal_components  re:AoS_project  re:g_paper  regression  risk_assessment  risk_vs_uncertainty  salakhutdinov.ruslan  sex_differences  sloths  social_networks  sokal.alan  spatial_statistics  state-space_models  statistical_inference_for_stochastic_processes  statistics  stepping_stone_model  stochastic_processes  story_of_my_life  television  the_french_disease  the_nightmare_from_which_we_are_trying_to_awake  time_series  to:blog  to:NB  tomasky.michael  to_read  to_teach  to_teach:complexity-and-inference  to_teach:data-mining  to_teach:statcomp  to_teach:undergrad-ADA  truth  tutorials  
unintended_consequences  ussr  utter_stupidity  vapnik.vladimir  via:arsyed  via:arthegall  via:shivak  via:vielmetti  web  why_oh_why_cant_we_have_a_better_academic_publishing_system  why_oh_why_cant_we_have_a_better_press_corps  wolfram.stephen  wolfram_alpha  wolpert.david  yajima.masano  zhang.tong 
