cshalizi + causal_inference   95

Estimating the Causal Effects of Social Interaction with Endogenous Networks
"Identifying causal effects attributable to network membership is a key challenge in empirical studies of social networks. In this article, we examine the consequences of endogeneity for inferences about the effects of networks on network members’ behavior. Using the House office lottery (in which newly elected members select their office spaces in a randomly chosen order) as an instrumental variable to estimate the causal impact of legislative networks on roll call behavior and cosponsorship decisions in the 105th–112th Houses, we find no evidence that office proximity affects patterns of legislative behavior. These results contrast with decades of congressional scholarship and recent empirical studies. Our analysis demonstrates the importance of accounting for selection processes and omitted variables in estimating the causal impact of networks."
to:NB  causal_inference  re:critique_of_diffusion  social_influence  congress  network_data_analysis  social_networks  homophily  re:homophily_and_confounding 
13 days ago by cshalizi
[1205.0241] Counterfactual Graphical Models for Mediation Analysis via Path-Specific Effects
"Potential outcome counterfactuals represent variation in the outcome of interest after a hypothetical treatment or intervention is performed. Causal graphical models are a concise, intuitive way of representing causal assumptions, including independence constraints among such counterfactuals. Much of modern causal inference is concerned with expressing cause effect relationships of interest in counterfactual form, showing how the resulting counterfactuals can be identified (that is expressed in terms of available data, using domain-specific causal assumptions), and subsequently estimated using statistical methods. In this paper we will use causal graphical models to analyze the identification problem of the so-called path-specific effects, that is effects of treatment on outcome along certain specified causal paths. Such effects arise in mediation analysis settings where it's important to distinguish direct and indirect effects of treatment. We review existing results on path-specific effects in the fully observable, static treatment setting, and extend them to settings with time-varying treatments, and latent variables."
to:NB  causal_inference  shpister.ilya  graphical_models 
17 days ago by cshalizi
[1203.3504] On Measurement Bias in Causal Inference
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particular, the paper discusses the control of partially observable confounders in parametric and nonparametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB  causal_inference  inference_to_latent_objects  pearl.judea  to_teach:undergrad-ADA  statistics  error_in_variables  via:arthegall 
18 days ago by cshalizi
Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
"We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data; this is particularly the case when the number of commonly measured variables is low.
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
to:NB  to_read  causal_inference  graphical_models  to_teach:undergrad-ADA 
25 days ago by cshalizi
Colombo , Maathuis , Kalisch , Richardson : Learning high-dimensional directed acyclic graphs with latent and selection variables
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."

--- Too complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB  have_read  statistics  graphical_models  causal_inference  sparsity  to_teach:undergrad-ADA 
7 weeks ago by cshalizi
[no title]
"Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involve a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively."

(From a quick scan, this looks too heavy to actually teach in ADAfaEPoV, but it's so tagged to remind me to include a reference.)
to:NB  causal_inference  partial_identification  statistics  instrumental_variables  to_teach:undergrad-ADA 
7 weeks ago by cshalizi
Taylor & Francis Online :: Bayesian Nonparametric Modeling for Causal Inference - Journal of Computational and Graphical Statistics - Volume 20, Issue 1
"Researchers have long struggled to identify causal effects in nonexperimental settings. Many recently proposed strategies assume ignorability of the treatment assignment mechanism and require fitting two models—one for the assignment mechanism and one for the response surface. This article proposes a strategy that instead focuses on very flexibly modeling just the response surface using a Bayesian nonparametric modeling procedure, Bayesian Additive Regression Trees (BART). BART has several advantages: it is far simpler to use than many recent competitors, requires less guesswork in model fitting, handles a large number of predictors, yields coherent uncertainty intervals, and fluidly handles continuous treatment variables and missing data for the outcome variable. BART also naturally identifies heterogeneous treatment effects. BART produces more accurate estimates of average treatment effects compared to propensity score matching, propensity-weighted estimators, and regression adjustment in the nonlinear simulation situations examined. Further, it is highly competitive in linear settings with the “correct” model, linear regression. Supplemental materials including code and data to replicate simulations and examples from the article as well as methods for population inference are available online."
to:NB  regression  causal_inference  nonparametrics  statistics  hill.jennifer 
8 weeks ago by cshalizi
Rainfall and Conflict - Heather Sarsons
"Starting with Miguel, Satyanath, and Sergenti (2004), a large literature has used rainfall variation as an instrument to study the impacts of income shocks on civil war and conflict. These studies argue that in agriculturally-dependent regions, negative rain shocks lower income levels, which in turn incites violence. This identification strategy relies on the assumption that rainfall shocks affect conflict only through their impacts on income. I evaluate this exclusion restriction by identifying districts that are downstream from dams in India. In downstream districts, income is much less sensitive to rainfall fluctuations. However, rain shocks remain equally strong predictors of riot incidence in these districts. These results suggest that rainfall affects rioting through a channel other than income and cast doubt on the conclusion that income shocks incite riots."

Cute.
to:NB  have_read  instrumental_variables  causal_inference  statistics  to_teach:undergrad-ADA  sociology  to:blog 
11 weeks ago by cshalizi
[1202.3775] Kernel-based Conditional Independence Test and Application in Causal Discovery
"Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties."
statistics  kernel_estimators  independence_testing  hypothesis_testing  causal_inference  in_NB  have_read  to:blog  to_teach:undergrad-ADA 
12 weeks ago by cshalizi
[math/0410271] Statistical modeling of causal effects in continuous time
"This article studies the estimation of the causal effect of a time-varying treatment on time-to-an-event or on some other continuously distributed outcome. The paper applies to the situation where treatment is repeatedly adapted to time-dependent patient characteristics. The treatment effect cannot be estimated by simply conditioning on these time-dependent patient characteristics, as they may themselves be indications of the treatment effect. This time-dependent confounding is common in observational studies. Robins [(1992) Biometrika 79 321--334, (1998b) Encyclopedia of Biostatistics 6 4372--4389] has proposed the so-called structural nested models to estimate treatment effects in the presence of time-dependent confounding. In this article we provide a conceptual framework and formalization for structural nested models in continuous time. We show that the resulting estimators are consistent and asymptotically normal. Moreover, as conjectured in Robins [(1998b) Encyclopedia of Biostatistics 6 4372--4389], a test for whether treatment affects the outcome of interest can be performed without specifying a model for treatment effect. We illustrate the ideas in this article with an example."
to:NB  statistics  causal_inference  time_series 
12 weeks ago by cshalizi
"Trygve Haavelmo and the Emergence of Causal Calculus" (Judea Pearl, 2011)
"Haavelmo was the first to recognize the capacity of economic models to guide policies. This paper describes some of the barriers that Haavelmo’s ideas have had (and still have) to overcome, and lays out a logical framework for capturing the relationships between theory, data and policy questions. The mathematical tools that emerge from this framework now enable investigators to answer complex policy and counterfactual questions using embarrassingly simple routines, some by mere inspection of the model’s structure. Several such problems are illustrated by examples, including misspecification tests, identification, mediation and introspection."
to:NB  causal_inference  economics  econometrics  haavelmo.trygve  pearl.judea  graphical_models  to_read 
february 2012 by cshalizi
Plausibly Exogenous
"Instrumental variable (IV) methods are widely used to identify causal effects in models with endogenous explanatory variables. Often the instrument exclusion restriction that underlies the validity of the usual IV inference is suspect; that is, instruments are only plausibly exogenous. We present practical methods for performing inference while relaxing the exclusion restriction. We illustrate the approaches with empirical examples that examine the effect of 401(k) participation on asset accumulation, price elasticity of demand for margarine, and returns to schooling. We find that inference is informative even with a substantial relaxation of the exclusion restriction in two of the three cases."
to:NB  to_read  causal_inference  regression  statistics  economics  social_science_methodology  instrumental_variables  to_teach:undergrad-ADA  hansen.christian 
february 2012 by cshalizi
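A minimal sketch of the idea in the "Plausibly Exogenous" abstract above, under assumptions that are not taken from the paper: a single instrument, the linear model y = beta*x + gamma*z + e, and simulated data in which the instrument z has a direct effect gamma on y. For each assumed value of gamma, the instrument's direct effect is subtracted from y before the usual IV ratio is taken, so one can see how far the estimate moves as the exclusion restriction is relaxed. The names cov, iv_estimate and simulate are hypothetical helpers, and the parameter values are made up.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

def iv_estimate(y, x, z, gamma=0.0):
    """Single-instrument IV estimate of beta in y = beta*x + gamma*z + e,
    treating the instrument's direct effect gamma as known (assumption)."""
    y_adj = [yi - gamma * zi for yi, zi in zip(y, z)]
    return cov(z, y_adj) / cov(z, x)

def simulate(n=20000, beta=2.0, gamma=0.5, seed=1):
    """Made-up data: u confounds x and y; z shifts x and also has a
    direct effect gamma on y, violating the exclusion restriction."""
    rng = random.Random(seed)
    y, x, z = [], [], []
    for _ in range(n):
        zi, ui = rng.gauss(0, 1), rng.gauss(0, 1)
        xi = zi + ui + rng.gauss(0, 1)
        yi = beta * xi + gamma * zi + ui + rng.gauss(0, 1)
        y.append(yi)
        x.append(xi)
        z.append(zi)
    return y, x, z

y, x, z = simulate()
# Usual IV, which assumes gamma = 0; here it targets beta + gamma, not beta.
naive = iv_estimate(y, x, z)
# Sensitivity sweep over assumed direct effects of the instrument.
sweep = [iv_estimate(y, x, z, g) for g in (0.0, 0.25, 0.5)]
print(naive, sweep)
```

In this simulation the estimate falls linearly as the assumed gamma rises, and it lands near the true beta only when gamma is set to the value used to generate the data. In practice gamma is unknown, so the useful output is the range of estimates over a plausible range of gamma.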
[1201.0224] Estimation of Treatment Effects with High-Dimensional Controls
"We propose methods for inference on the average effect of a treatment on a scalar outcome in the presence of very many controls. Our setting is a partially linear regression model containing the treatment/policy variable and a large number $p$ of controls or series terms, with $p$ that is possibly much larger than the sample size $n$, but where only $s < n$ unknown controls or series terms are needed to approximate the regression function accurately. The latter sparsity condition makes it possible to estimate the entire regression function as well as the average treatment effect by selecting approximately the right set of controls using Lasso and related methods. We develop estimation and inference methods for the average treatment effect in this setting, proposing a novel "post double selection" method that provides attractive inferential and estimation properties. In our analysis, in order to cover realistic applications, we expressly allow for imperfect selection of the controls and account for the impact of selection errors on estimation and inference. In order to cover typical applications in economics, we employ the selection methods designed to deal with non-Gaussian and heteroscedastic disturbances. We illustrate the use of new methods with numerical simulations and an application to the effect of abortion on crime rates."
to:NB  to_teach:undergrad-ADA  regression  causal_inference  lasso  sparsity  econometrics  instrumental_variables  hansen.christian 
january 2012 by cshalizi
If correlation doesn’t imply causation, then what does? | DDI
Michael preaches the Gospel According to Pearl; and very nicely too. (I would dispute however that DAGs don't give us a handle on mechanisms.)
causal_inference  graphical_models  statistics  causality  nielsen.michael  kith_and_kin 
january 2012 by cshalizi
Mechanisms, Types, and Abstractions
"Machamer, Darden, and Craver’s account of the nature and role of mechanisms in the special sciences has been very influential. Unfortunately, a confusing array of ontic, epistemic, and pragmatic distinctions is required to individuate their mechanisms, mechanism schemata, and mechanism sketches. I diagnose this as a conflation of token-level causal relations with type-level relations. I propose instead that a mechanism is an abstraction that relates entity types and activity types on the model of a directed graph. Mechanisms have an ontic status distinct from the causal chains of token entities and token activities that instantiate them."
to:NB  explanation_by_mechanisms  causal_inference  philosophy_of_science  to_teach:complexity-and-inference 
january 2012 by cshalizi
The Problem of Piecemeal Induction - JSTOR: Philosophy of Science, Vol. 78, No. 5 (December 2011), pp. 864-874
"I argue that, in causal inference from many observational studies, the piecemeal collection of data can cause underdetermination, even if arbitrarily large amounts of reliable data are available. Two theorems reveal that, for any variable set V, there are causal theories over V that can be distinguished if and only if all variables are simultaneously measured. These results entail that, a priori, one cannot know which observational studies will be most informative with respect to the true causal theory describing V. Hence, scientific institutions may need to play a larger role in coordinating differing research programs."
to:NB  kith_and_kin  causal_inference  philosophy_of_science  mayo-wilson.conor 
january 2012 by cshalizi
Instruments, Randomization, and Learning about Development (Deaton, 2010)
"There is currently much debate about the effectiveness of foreign aid and about what kind of projects can engender economic development. There is skepticism about the ability of econometric analysis to resolve these issues or of development agencies to learn from their own experience. In response, there is increasing use in development economics of randomized controlled trials (RCTs) to accumulate credible knowledge of what works, without overreliance on questionable theory or statistical methods. When RCTs are not possible, the proponents of these methods advocate quasi-randomization through instrumental variable (IV) techniques or natural experiments. I argue that many of these applications are unlikely to recover quantities that are useful for policy or understanding: two key issues are the misunderstanding of exogeneity and the handling of heterogeneity. I illustrate from the literature on aid and growth. Actual randomization faces similar problems as does quasi-randomization, notwithstanding rhetoric to the contrary. I argue that experiments have no special ability to produce more credible knowledge than other methods, and that actual experiments are frequently subject to practical problems that undermine any claims to statistical or epistemic superiority. I illustrate using prominent experiments in development and elsewhere. As with IV methods, RCT-based evaluation of projects, without guidance from an understanding of underlying mechanisms, is unlikely to lead to scientific progress in the understanding of economic development. I welcome recent trends in development experimentation away from the evaluation of projects and toward the evaluation of theoretical mechanisms."
causal_inference  experimental_economics  experimental_sociology  economics  development_economics  social_science_methodology  explanation_by_mechanisms  to_teach:undergrad-ADA  instrumental_variables  have_read  evisceration  in_NB  randomization  to:blog 
december 2011 by cshalizi
Improving Causal Inference: Strengths and Limitations of Natural Experiments (Dunning, 2008)
"Social scientists increasingly exploit natural experiments in their research. This article surveys recent applications in political science, with the goal of illustrating the inferential advantages provided by this research design. When treatment assignment is less than “as if” random, studies may be something less than natural experiments, and familiar threats to valid causal inference in observational settings can arise. The author proposes a continuum of plausibility for natural experiments, defined by the extent to which treatment assignment is plausibly “as if” random, and locates several leading studies along this continuum."
in_NB  causal_inference  social_science_methodology  to_teach:undergrad-ADA  instrumental_variables 
december 2011 by cshalizi
Process Tracing and Causal Inference - PhilSci-Archive
"How should we judge competing explanatory claims in social science research? How can we make inferences about which alternative explanations are more convincing, in what ways, and to what degree? Case study methods—especially methods of within-case analysis such as process tracing— are an indispensable part of the answer to these questions (George and Bennett 2005: chap. 10). This chapter offers an overview of process tracing as a tool for causal inference, focusing on the study of international relations, an area rich with examples of this approach. In contrast to the subsequent two chapters in this volume (chaps. 11 and 12), where Freedman and Brady analyze micro-level examples, the present chapter explores process tracing in macro studies."
to:NB  causal_inference 
november 2011 by cshalizi
Six problems for causal inference from fMRI. [Neuroimage. 2010] - PubMed - NCBI
"Neuroimaging (e.g. fMRI) data are increasingly used to attempt to identify not only brain regions of interest (ROIs) that are especially active during perception, cognition, and action, but also the qualitative causal relations among activity in these regions (known as effective connectivity; Friston, 1994). Previous investigations and anatomical and physiological knowledge may somewhat constrain the possible hypotheses, but there often remains a vast space of possible causal structures. To find actual effective connectivity relations, search methods must accommodate indirect measurements of nonlinear time series dependencies, feedback, multiple subjects possibly varying in identified regions of interest, and unknown possible location-dependent variations in BOLD response delays. We describe combinations of procedures that under these conditions find feed-forward sub-structure characteristic of a group of subjects. The method is illustrated with an empirical data set and confirmed with simulations of time series of non-linear, randomly generated, effective connectivities, with feedback, subject to random differences of BOLD delays, with regions of interest missing at random for some subjects, measured with noise approximating the signal to noise ratio of the empirical data." PDF: http://psychology.rutgers.edu/~jose/Six_problems.pdf
fmri  causal_inference  neural_data_analysis  hanson.stephen_jose  glymour.clark  re:functional_communities 
october 2011 by cshalizi
Robustification of the PC Algorithm for Directed Acyclic Graphs
"The PC-algorithm was shown to be a powerful method for estimating the equivalence class of a potentially very high-dimensional acyclic directed graph (DAG) with the corresponding Gaussian distribution. Here we propose a computationally efficient robustification of the PC-algorithm and prove its consistency. Furthermore, we compare the robustified and standard version of the PC-algorithm on simulated data using the new corresponding R package pcalg."
statistics  causal_inference  graphical_models  buhlmann.peter  in_NB  to_read  to_teach:data-mining  to_teach:undergrad-ADA  kalisch.markus 
october 2011 by cshalizi
[1110.0718] Directed information and Pearl's causal calculus
"Probabilistic graphical models are a fundamental tool in statistics, machine learning, signal processing, and control. When such a model is defined on a directed acyclic graph (DAG), one can assign a partial ordering to the events occurring in the corresponding stochastic system. Based on the work of Judea Pearl and others, these DAG-based "causal factorizations" of joint probability measures have been used for characterization and inference of functional dependencies (causal links). This mostly expository paper focuses on several connections between Pearl's formalism (and in particular his notion of "intervention") and information-theoretic notions of causality and feedback (such as causal conditioning, directed stochastic kernels, and directed information). As an application, we show how conditional directed information can be used to develop an information-theoretic version of Pearl's "back-door" criterion for identifiability of causal effects from passive observations. This suggests that the back-door criterion can be thought of as a causal analog of statistical sufficiency."
graphical_models  causality  causal_inference  information_theory  statistics  raginsky.maxim  in_NB  to_read  kith_and_kin  sufficiency 
october 2011 by cshalizi
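A small worked example of the ordinary back-door adjustment that the "Directed information and Pearl's causal calculus" abstract above refers to. This is the plain adjustment formula, P(Y=1 | do(X=x)) = sum over z of P(Y=1 | x, z) P(z), and not the paper's information-theoretic version. The graph (a single binary confounder Z pointing into both X and Y) and every probability below are made-up assumptions, chosen only so the arithmetic can be done exactly.

```python
from fractions import Fraction as F

# Made-up distribution for the DAG  Z -> X,  Z -> Y,  X -> Y  (all binary).
p_z = {0: F(1, 2), 1: F(1, 2)}                  # P(Z = z)
p_x1_given_z = {0: F(1, 5), 1: F(4, 5)}         # P(X = 1 | Z = z)
p_y1_given_xz = {                               # P(Y = 1 | X = x, Z = z)
    (0, 0): F(1, 10), (0, 1): F(1, 2),
    (1, 0): F(3, 10), (1, 1): F(7, 10),
}

def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary x."""
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]

def observational(x):
    """P(Y=1 | X=x): each stratum of Z is weighted by P(z | x)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x

def backdoor(x):
    """P(Y=1 | do(X=x)): each stratum of Z is weighted by P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)

causal_contrast = backdoor(1) - backdoor(0)
observed_contrast = observational(1) - observational(0)
print(causal_contrast, observed_contrast)
```

With these numbers the adjusted contrast is 1/5 while the unadjusted conditional contrast is 11/25; the only difference between the two functions is whether Z is averaged with P(z) or with P(z | x).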
Causal Analysis in Theory and Practice » Comments on an article by Grice, Shlimgen and Barrett (GSB): “Regarding Causation and Judea Pearl’s Mediation Formula”
Uncle Judea sounds a bit testy in this one, but no doubt anyone would be if they had to keep swatting down such pathetic misunderstandings passing for objections.
causality  structural_equations  causal_inference  pearl.judea 
october 2011 by cshalizi
[1104.5617] Learning high-dimensional directed acyclic graphs with latent and selection variables
"We consider the problem of learning causal information between random variables in directed acyclic graph (DAGs) when allowing arbitrarily many latent and selection variables. The FCI algorithm (Spirtes et al., 1999) has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose a new algorithm, the RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
have_read  to_teach:undergrad-ADA  graphical_models  causal_inference  in_NB  kalisch.markus  richardson.thomas_s. 
september 2011 by cshalizi
ACIC 2011 - a set on Flickr
We all look far more intellectual in B&W than I remember feeling.  And we keep talking with our hands!
conferences  causal_inference  photos 
june 2011 by cshalizi
Reason Foundation - No Booze? You May Lose
Exercise for the student: Devise at least two reasons why the causality might run from high income to frequent social drinking, rather than vice versa.  (This is I think too elementary to make a good problem for ADA.)
bad_data_analysis  booze  via:tony_lin  causal_inference  to_teach:undergrad-ADA 
april 2011 by cshalizi
Revisiting the Value of Elite Colleges - NYTimes.com
Conversation with Kristina suggests an alternative hypothesis: going to a Big Name school raises your income in every profession; students have a target income level, and would rather do something socially redeeming/fun if it doesn't cost them too much; therefore students who go to Big Name schools don't earn more, on average, but might have a broader range of jobs.  Testable... Anyway, there are obviously big problems of self-selection and self-regulation involved here.
education  academia  economics  class_struggles_in_america  causal_inference  via:klk 
february 2011 by cshalizi
CRAN - Package MatchIt
"MatchIt preprocesses data by selecting approximate matched samples of the treated and control groups with similar covariate distributions, drawing on a large variety of matching methods. After preprocessing data with MatchIt, whatever standard parametric technique one might have used without preprocessing can be used, but the results will be far less model dependent."
I want to teach _some_ matching methods in 402, but I definitely don't want the kids to program them.  This might work...
matching  causal_inference  statistics  to_teach:undergrad-ADA 
february 2011 by cshalizi
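For orientation on what matching does, a minimal sketch of one-to-one nearest-neighbour matching with replacement on a single observed covariate. This is a hand-rolled illustration in Python, not MatchIt (an R package) and not its API; the data-generating process, the function names simulate, naive_difference and matched_att, and all parameter values are assumptions made for the example. It also assumes the only confounder is observed, which is exactly the assumption the sceptics of matching dispute.

```python
import random

def simulate(n=2000, effect=1.0, seed=7):
    """Made-up data: covariate c raises both the chance of treatment
    and the outcome, so c confounds the treated/control comparison."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.random()
        t = 1 if rng.random() < c else 0
        y = effect * t + 3.0 * c + rng.gauss(0, 0.5)
        rows.append((c, t, y))
    return rows

def naive_difference(rows):
    """Difference in mean outcomes, ignoring the covariate."""
    treated = [y for c, t, y in rows if t == 1]
    control = [y for c, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def matched_att(rows):
    """Match each treated unit to the control unit with the closest
    covariate value (with replacement) and average the outcome gaps."""
    controls = [(c, y) for c, t, y in rows if t == 0]
    diffs = []
    for c, t, y in rows:
        if t == 1:
            _, y_match = min(controls, key=lambda cy: abs(cy[0] - c))
            diffs.append(y - y_match)
    return sum(diffs) / len(diffs)

rows = simulate()
naive = naive_difference(rows)
matched = matched_att(rows)
print(naive, matched)
```

In this simulation the naive difference overstates the effect, because treated units have higher c on average, while the matched estimate should sit much closer to the value used to generate the data. Nothing here helps with confounders that are not measured.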
[1010.5720] Information-theoretic inference of common ancestors
"A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds .... In general, there is a whole class of DAGs that represents a given set of conditional independence relations... properties of this class that can be derived from observations of a subsystem only... we prove an information theoretic inequality that allows for the inference of common ancestors of observed parts in ... some unknown larger system.. a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information... our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause... Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version."
graphical_models  information_theory  causal_inference  statistics  ay.nihat  have_read  algorithmic_information_theory 
november 2010 by cshalizi
Social Science Statistics Blog: Can matching solve endogeneity?
"people who like matching methods ... tend to believe that most confounders can be measured ... and that there aren't a lot of lurking unobservables. ... [P]eople ... who are skeptical of matching ... argue that there will always be problematic unobservables lurking ... [and they] prefer instrumental variables approaches .... [T]he same people who tell me that lurking unobservables are everywhere tend to be fairly comfortable making the ... exclusion restrictions that make IV approaches work. The crazy thing is that just like matching, these assumptions [are] about unobservable causal pathways. The claim that an instrumental variable is valid is the claim that there are no unobserved (or observed) variables linking the instrument to the outcome except through the path of the instrumented variable. ... [P]eople who think that lurking unobservables are everywhere in matching somehow think that all these lurking unobservables go away as soon as you call something an instrument..."
causal_inference  instrumental_variables  matching  to:blog 
october 2010 by cshalizi
Samantha Kleinberg
"I have developed a new approach to 1) identifying complex temporal causal relationships from observational time series data 2) finding causes of particular events. The approach centers on representation of causal relationships using probabilistic temporal logic formulas. At the type level, this allows explicit description of the time between cause and effect and automated testing of arbitrarily complex relationships using methods I developed for testing formulas directly in traces (without first inferring a model). After computing the average impact of a cause on its effect, we can use techniques for false discovery control to help determine which of the inferred causes are significant. At the token level, I have recently shown that we may use the significance of the general (type-level) relationships to reason about and assess the significance of potential token causes in a way that allows for incomplete information."
causal_inference  machine_learning  logic  track_down_references  via:albers 
october 2010 by cshalizi
A Cautionary Note on the Use of Matching to Estimate Causal Effects: An Empirical Example Comparing Matching Estimates to an Experimental Benchmark — Sociological Methods Research
"...social scientists have increasingly turned to matching [to draw] causal inferences from observational data. Matching compares those who receive a treatment to those with similar background attributes who do not receive a treatment. ... Drawing on a randomized voter mobilization experiment ... compare matching [estimates] to an experimental benchmark. ... enormous sample size .... exactly match each treated subject to 40 untreated subjects. Matching greatly exaggerates the effectiveness of pre-election phone calls encouraging voter participation. ... Matching suggests that another pre-election phone call that encouraged people to wear their seat belts also generated huge increases in voter turnout. ... caution is warranted when applying matching estimators to observational data, particularly when one is uncertain about the potential for biased inference." Ouch!
have_read  to_teach:data-mining  causal_inference  matching  experimental_political_science  evisceration  to:blog  to_teach:undergrad-ADA 
october 2010 by cshalizi
[1009.3243] The "Unfriending" Problem: The Consequences of Homophily in Friendship Retention for Causal Estimates of Social Influence
"An increasing number of scholars are using longitudinal social network data to try to obtain estimates of peer or social influence effects. These data may provide additional statistical leverage, but they can introduce new inferential problems. In particular, while the confounding effects of homophily in friendship formation are widely appreciated, homophily in friendship retention may also confound causal estimates of social influence in longitudinal network data. We provide evidence for this claim in a Monte Carlo analysis of the statistical model used by Christakis, Fowler, and their colleagues in numerous articles estimating "contagion" effects in social networks. Our results indicate that homophily in friendship retention induces significant upward bias and decreased coverage levels in the Christakis and Fowler model if there is non-negligible friendship attrition over time."
have_read  social_networks  contagion  influence  network_data_analysis  statistics  causal_inference  nyhan.brendan  noel.hans  re:homophily_and_confounding  in_NB 
september 2010 by cshalizi
Inferring deterministic causal relations
Best Student Paper at UAI 2010. What would happen if you used this on sequences of values from the Arnold cat map? Could it learn the direction of time?
causal_inference  information_theory  to_read  heard_the_talk 
july 2010 by cshalizi
Journal of Econometrics : Identification of peer effects through social networks
Of course, saying "we assume that correlated effects are absent" is, in this context at least, very much a "we assume we have a can opener" move.
network_data_analysis  re:homophily_and_confounding  via:iqss  causal_inference  social_networks  econometrics  re:critique_of_diffusion  have_read 
may 2010 by cshalizi
The Industrial Organization of Rebellion: The Logic of Forced Labor and Child Soldiering
"We investigate one of the world’s most pernicious forms of exploitation: child soldiering. Most theories can be captured by a principal-agent model that incorporates punishments, indoctrination, and age-varying productivity. For rebel leaders ... it is almost always optimal to coerce rather than reward children ... leaders will ... forcibly recruit children when punishment and supervision are cheap, when children’s outside options are poor, and when rebel leaders are resource-constrained. To see which mechanisms dominate in practice, we interview and survey former members of Uganda’s Lord’s Resistance Army, who provide a cruel natural experiment that reveals how children and adults respond to coercive incentives... children are more easily indoctrinated and disoriented than adults, but are less effective guerrillas; hence the optimal targets of coercion are young adolescents. We confirm predictions of the model on a new “cross-rebel” dataset and suggest policy solutions."
child_labor  child_soldiering  civil_war  rebellion  political_economy  sociology  depressing  via:henry_farrell  principal-agent  institutions  organizations  causal_inference 
april 2010 by cshalizi
Superstar Extinction
"We estimate the magnitude of spillovers generated by 112 academic “superstars” who died prematurely and unexpectedly, thus providing an exogenous source of variation in the structure of their collaborators' coauthorship networks. Following the death of a superstar, we find that collaborators experience, on average, a lasting 5% to 8% decline in their quality-adjusted publication rates. By exploring interactions of the treatment effect with a variety of star, coauthor, and star/coauthor dyad characteristics, we seek to adjudicate between plausible mechanisms that might explain this finding. Taken together, our results suggest that spillovers are circumscribed in idea space, but less so in physical or social space. In particular, superstar extinction reveals the boundaries of the scientific field to which the star contributes—the “invisible college.”"
sociology_of_science  bibliometry  causal_inference  social_life_of_the_mind 
april 2010 by cshalizi
[1003.1513] On the transductive arguments in statistics
"The paper argues that a part of the current statistical discussion is not based on the standard firm foundations of the field. Among the examples we consider are prediction into the future, semi-supervised classification, and causality inference based on observational data." --- I have read this paper, but do not pretend to understand it. (For instance, I really don't get what he's saying about time series.)
statistics  prediction  causal_inference  time_series  have_read 
march 2010 by cshalizi
"Wives and Ex-Wives: A New Test for Homogamy Bias in the Widowhood Effect" (Elwert and Christakis)
Clever! But what if wife-at-time-of-death is more similar to the husband than the ex-wife was? (Or had more important common environments.)
causal_inference  have_read  re:homophily_and_confounding  elwert.felix  christakis.nicholas  homogamy 
february 2010 by cshalizi
Reversals of fortune: path dependency, problem solving, and temporal cases
"Historical reversals highlight a basic methodological problem: is it possible to treat two successive periods both as independent cases to compare for causal analysis and as parts of a single historical sequence? I argue that one strategy for doing so, using models of path dependency, imposes serious limits on explanation. An alternative model which treats successive periods as contrasting solutions for recurrent problems offers two advantages. First, it more effectively combines analytical comparisons of different periods with narratives of causal sequences spanning two or more periods. Second, it better integrates scholarly accounts of historical reversals with actors’ own narratives of the past."
social_science_methodology  path_dependence  historical_explanation  causal_inference 
january 2010 by cshalizi
Improving the Reliability of Causal Discovery from Small Data Sets Using Argumentation
"We address the problem of improving the reliability of independence-based causal discovery algorithms that results from the execution of statistical independence tests on small data sets, which typically have low reliability. We model the problem as a knowledge base containing a set of independence facts that are related through Pearl's well-known axioms. Statistical tests on finite data sets may result in errors in these tests and inconsistencies in the knowledge base. We resolve these inconsistencies through the use of an instance of the class of defeasible logics called argumentation, augmented with a preference function, that is used to reason about and possibly correct errors in these tests. This results in a more robust conditional independence test, called an argumentative independence test. Our experimental evaluation shows clear positive improvements in the accuracy of argumentative over purely statistical tests."
logic  causal_inference  graphical_models  machine_learning  statistics  to_read 
december 2009 by cshalizi
Bayesian Network Structure Learning by Recursive Autonomy Identification
"We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous sub-structures. The sequence of operations is performed recursively for each autonomous sub-structure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulted undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means and due to structure decomposition, learning a structure using RAI requires a smaller number of CI tests of high orders. This reduces the complexity and run-time of the algorithm and increases the accuracy by diminishing the curse-of-dimensionality."
graphical_models  causal_inference  statistics  machine_learning  to_read  to_teach:data-mining  to_teach:undergrad-ADA 
december 2009 by cshalizi
On a Class of Bias-Amplifying Covariates that Endanger Effect Estimates
Those would be _instrumental_ variables. Implications for the collected scholarly works of S. Levitt left as an exercise for the reader.
causal_inference  regression  instrumental_variables  pearl.judea 
november 2009 by cshalizi
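A minimal sketch of the bias-amplification claim, assuming a linear Gaussian model with unit-variance noise. The coefficients and the names `slope`, `residuals`, and `simulate` are illustrative and not taken from the paper. Z is an instrument for X, and U is an unobserved confounder of X and Y. Regressing Y on X alone is biased by U; adding Z as a covariate removes variation in X that was free of confounding, so the bias in the coefficient on X grows.

```python
import random


def slope(x, y):
    """OLS slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, z):
    """Residuals from the OLS regression of y on z (intercept included)."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    b = slope(z, y)
    return [(yi - my) - b * (zi - mz) for yi, zi in zip(y, z)]


def simulate(n=20000, beta=1.0, a=2.0, b=1.0, c=1.0, seed=0):
    """Return (naive, adjusted, iv) estimates of the effect beta of X on Y.

    Assumed model: X = a*Z + b*U + noise, Y = beta*X + c*U + noise.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]   # instrument
    u = [rng.gauss(0, 1) for _ in range(n)]   # unobserved confounder
    x = [a * zi + b * ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [beta * xi + c * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    naive = slope(x, y)                       # regress Y on X only
    # Coefficient on X when Z is also a regressor, via Frisch-Waugh:
    # partial Z out of both X and Y, then regress residual on residual.
    adjusted = slope(residuals(x, z), residuals(y, z))
    iv = slope(z, y) / slope(z, x)            # using Z as an instrument
    return naive, adjusted, iv


if __name__ == "__main__":
    naive, adjusted, iv = simulate()
    print("Y on X:       ", naive)
    print("Y on X and Z: ", adjusted)
    print("IV using Z:   ", iv)
```

Under these assumptions the population bias is c*b / (a^2 + b^2 + 1) without Z and c*b / (b^2 + 1) with Z as a covariate, so the stronger the instrument (larger a), the more conditioning on it amplifies the bias. Using Z as an instrument instead recovers beta.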
[0911.0280] Causal Inference on Discrete Data using Additive Noise Models
Extending the idea that forward problems are easier than inverse problems, I presume. (Probably won't actually teach that stuff in 350 this year, owing to time and student level.)
causal_inference  to_read  to_teach:data-mining  janzing.dominik 
november 2009 by cshalizi
Causal Inference in Statistics: An Overview (Pearl, 2009)
Described by Uncle Judea as "A new survey paper, gently summarizing everything I know about causation (in only 43 pages)".
causality  causal_inference  statistics  pearl.judea  blogged  have_read 
september 2009 by cshalizi
[0908.3177] The impact factor's Matthew effect: a natural experiment in bibliometrics
Cute, yet depressing: "Using an original method for controlling the intrinsic value of papers--identical duplicate papers published in different journals with different impact factors--this paper shows that the journal in which papers are published have a strong influence on their citation rates, as duplicate papers published in high impact journals obtain, on average, twice as much citations as their identical counterparts published in journals with lower impact factors. The intrinsic value of a paper is thus not the only reason a given paper gets cited or not; there is a specific Matthew effect attached to journals and this gives to paper published there an added value over and above their intrinsic quality."
bibliometry  matthew_effect  citation_networks  sociology_of_science  causal_inference  to:blog  in_NB 
august 2009 by cshalizi
Estimating the Impact of the Hajj
Caveat: _in Pakistan_. "... compares successful and unsuccessful applicants in a lottery used by Pakistan to allocate Hajj visas. ... We find that participation in the Hajj increases observance of global Islamic practices, such as prayer and fasting, while decreasing participation in localized practices and beliefs, such as the use of amulets and dowry. It increases belief in equality and harmony among ethnic groups and Islamic sects and leads to more favorable attitudes toward women, including greater acceptance of female education and employment. Increased unity within the Islamic world is not accompanied by antipathy toward non-Muslims. Instead, Hajjis show increased belief in peace, and in equality and harmony among adherents of different religions. The evidence suggests that these changes are likely due to exposure to and interaction with Hajjis from around the world, rather than to a changed social role of pilgrims upon return."
islam  causal_inference  pakistan  randomization  in_NB  to_read 
august 2009 by cshalizi
Social Interactions and Schooling Decisions
"The aim of this paper is to study whether a child's schooling choices are affected by the schooling choices of other children. Identification is based on a randomized targeted intervention that grants a cash subsidy conditional on school attendance to a subgroup of eligible children within small rural villages in Mexico (PROGRESA). This policy change spills over to ineligible children if social interactions are relevant. Results indicate that the eligible children tend to attend school more frequently, and the ineligible children acquire more schooling when the subsidy is introduced in their local village. Moreover, the overall effect of PROGRESA on eligible children is the sum of a direct effect due to cash transfers and an indirect effect due to changes in peer group schooling. Interestingly, the social interactions effect is almost as important as the direct effect."
social_networks  contagion  causal_inference  education  experimental_sociology  in_NB  to_teach:complexity-and-inference 
july 2009 by cshalizi