cshalizi + causal_inference 95
Estimating the Causal Effects of Social Interaction with Endogenous Networks
13 days ago by cshalizi
"Identifying causal effects attributable to network membership is a key challenge in empirical studies of social networks. In this article, we examine the consequences of endogeneity for inferences about the effects of networks on network members’ behavior. Using the House office lottery (in which newly elected members select their office spaces in a randomly chosen order) as an instrumental variable to estimate the causal impact of legislative networks on roll call behavior and cosponsorship decisions in the 105th–112th Houses, we find no evidence that office proximity affects patterns of legislative behavior. These results contrast with decades of congressional scholarship and recent empirical studies. Our analysis demonstrates the importance of accounting for selection processes and omitted variables in estimating the causal impact of networks."
to:NB
causal_inference
re:critique_of_diffusion
social_influence
congress
network_data_analysis
social_networks
homophily
re:homophily_and_confounding
13 days ago by cshalizi
[1205.0241] Counterfactual Graphical Models for Mediation Analysis via Path-Specific Effects
17 days ago by cshalizi
"Potential outcome counterfactuals represent variation in the outcome of interest after a hypothetical treatment or intervention is performed. Causal graphical models are a concise, intuitive way of representing causal assumptions, including independence constraints among such counterfactuals. Much of modern causal inference is concerned with expressing cause effect relationships of interest in counterfactual form, showing how the resulting counterfactuals can be identified (that is expressed in terms of available data, using domain-specific causal assumptions), and subsequently estimated using statistical methods. In this paper we will use causal graphical models to analyze the identification problem of the so-called emph{path-specific effects}, that is effects of treatment on outcome along certain specified causal paths. Such effects arise in mediation analysis settings where it's important to distinguish direct and indirect effects of treatment. We review existing results on path-specific effects in the fully observable, static treatment setting, and extend them to settings with time-varying treatments, and latent variables."
to:NB
causal_inference
shpister.ilya
graphical_models
17 days ago by cshalizi
[1203.3504] On Measurement Bias in Causal Inference
18 days ago by cshalizi
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particulars, the paper discusses the control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB
causal_inference
inference_to_latent_objects
pearl.judea
to_teach:undergrad-ADA
statistics
error_in_variables
via:arthegall
18 days ago by cshalizi
Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
25 days ago by cshalizi
"We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data ; this is particularly the case when the number of commonly measured variables is low.
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
to:NB
to_read
causal_inference
graphical_models
to_teach:undergrad-ADA
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
25 days ago by cshalizi
Colombo, Maathuis, Kalisch, Richardson: Learning high-dimensional directed acyclic graphs with latent and selection variables
7 weeks ago by cshalizi
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
--- Too complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB
have_read
statistics
graphical_models
causal_inference
sparsity
to_teach:undergrad-ADA
7 weeks ago by cshalizi
[no title]
7 weeks ago by cshalizi
"Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involve a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively."
(From a quick scan, this looks too heavy to actually teach in ADAfaEPoV, but it's so tagged to remind me to include a reference.)
to:NB
causal_inference
partial_identification
statistics
instrumental_variables
to_teach:undergrad-ADA
7 weeks ago by cshalizi
Taylor & Francis Online :: Bayesian Nonparametric Modeling for Causal Inference - Journal of Computational and Graphical Statistics - Volume 20, Issue 1
8 weeks ago by cshalizi
"Researchers have long struggled to identify causal effects in nonexperimental settings. Many recently proposed strategies assume ignorability of the treatment assignment mechanism and require fitting two models—one for the assignment mechanism and one for the response surface. This article proposes a strategy that instead focuses on very flexibly modeling just the response surface using a Bayesian nonparametric modeling procedure, Bayesian Additive Regression Trees (BART). BART has several advantages: it is far simpler to use than many recent competitors, requires less guesswork in model fitting, handles a large number of predictors, yields coherent uncertainty intervals, and fluidly handles continuous treatment variables and missing data for the outcome variable. BART also naturally identifies heterogeneous treatment effects. BART produces more accurate estimates of average treatment effects compared to propensity score matching, propensity-weighted estimators, and regression adjustment in the nonlinear simulation situations examined. Further, it is highly competitive in linear settings with the “correct” model, linear regression. Supplemental materials including code and data to replicate simulations and examples from the article as well as methods for population inference are available online."
to:NB
regression
causal_inference
nonparametrics
statistics
hill.jennifer
8 weeks ago by cshalizi
Rainfall and Conflict - Heather Sarsons
11 weeks ago by cshalizi
"Starting with Miguel, Satyanath, and Sergenti (2004), a large literature has used rainfall variation as an instrument to study the impacts of income shocks on civil war and conáict. These studies argue that in agriculturally-dependent regions, negative rain shocks lower income levels, which in turn incites violence. This identiÖcation strategy relies on the assumption that rainfall shocks a§ect conáict only through their impacts on income. I evaluate this exclusion restriction by identifying districts that are downstream from dams in India. In downstream districts, income is much less sensitive to rainfall áuctuations. However, rain shocks remain equally strong predictors of riot incidence in these districts. These results suggest that rainfall a§ects rioting through a channel other than income and cast doubt on the conclusion that income shocks incite riots."
Cute.
to:NB
have_read
instrumental_variables
causal_inference
statistics
to_teach:undergrad-ADA
sociology
to:blog
11 weeks ago by cshalizi
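A minimal simulation sketch of why the exclusion restriction matters for instrumental-variable estimates of the kind discussed in the rainfall entry above. Everything in it is assumed for illustration and not taken from the paper: the linear model, the coefficients, and the names `simulate`, `iv_estimate` and `ols_slope` are hypothetical. The point is only that if the instrument has its own path to the outcome, the simple IV ratio is shifted by that direct effect divided by the first-stage covariance.

```python
import numpy as np


def simulate(direct_effect, n=200_000, seed=0):
    """Toy linear model: z is the instrument, u an unobserved confounder.

    direct_effect is the (assumed) effect of z on y that bypasses x;
    a valid instrument has direct_effect == 0.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                      # instrument (think: rainfall shock)
    u = rng.normal(size=n)                      # unobserved confounder
    x = z + u + rng.normal(size=n)              # treatment (think: income)
    y = 2.0 * x - 3.0 * u + direct_effect * z + rng.normal(size=n)
    return z, x, y


def iv_estimate(z, x, y):
    """Simple IV (Wald) estimator: cov(z, y) / cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x, for comparison."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


if __name__ == "__main__":
    for direct in (0.0, 0.5):
        z, x, y = simulate(direct)
        print(f"direct effect {direct}: "
              f"IV = {iv_estimate(z, x, y):.3f}, OLS = {ols_slope(x, y):.3f}")
```

With these assumed coefficients the true effect of x on y is 2.0 and the first-stage covariance is 1, so a direct effect of 0.5 moves the IV estimate from roughly 2.0 to roughly 2.5; OLS is confounded in both cases.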
[1202.3775] Kernel-based Conditional Independence Test and Application in Causal Discovery
12 weeks ago by cshalizi
"Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties."
statistics
kernel_estimators
independence_testing
hypothesis_testing
causal_inference
in_NB
have_read
to:blog
to_teach:undergrad-ADA
12 weeks ago by cshalizi
[math/0410271] Statistical modeling of causal effects in continuous time
12 weeks ago by cshalizi
"This article studies the estimation of the causal effect of a time-varying treatment on time-to-an-event or on some other continuously distributed outcome. The paper applies to the situation where treatment is repeatedly adapted to time-dependent patient characteristics. The treatment effect cannot be estimated by simply conditioning on these time-dependent patient characteristics, as they may themselves be indications of the treatment effect. This time-dependent confounding is common in observational studies. Robins [(1992) Biometrika 79 321--334, (1998b) Encyclopedia of Biostatistics 6 4372--4389] has proposed the so-called structural nested models to estimate treatment effects in the presence of time-dependent confounding. In this article we provide a conceptual framework and formalization for structural nested models in continuous time. We show that the resulting estimators are consistent and asymptotically normal. Moreover, as conjectured in Robins [(1998b) Encyclopedia of Biostatistics 6 4372--4389], a test for whether treatment affects the outcome of interest can be performed without specifying a model for treatment effect. We illustrate the ideas in this article with an example."
to:NB
statistics
causal_inference
time_series
12 weeks ago by cshalizi
"Trygve Haavelmo and the Emergence of Causal Calculus" (Judea Pearl, 2011)
february 2012 by cshalizi
"Haavelmo was the first to recognize the capacity of economic models to guide poli- cies. This paper describes some of the barriers that Haavelmo’s ideas have had (and still have) to overcome, and lays out a logical framework for capturing the relationships between theory, data and policy questions. The mathematical tools that emerge from this framework now enable investigators to answer complex policy and counterfactual questions using embarrassingly simple routines, some by mere inspection of the model’s structure. Several such problems are illustrated by examples, including misspecification tests, identification, mediation and introspection."
to:NB
causal_inference
economics
econometrics
haavelmo.trygve
pearl.judea
graphical_models
to_read
february 2012 by cshalizi
Plausibly Exogenous
february 2012 by cshalizi
"Instrumental variable (IV) methods are widely used to identify causal effects in models with endogenous explanatory variables. Often the instrument exclusion restriction that underlies the validity of the usual IV inference is suspect; that is, instruments are only plausibly exogenous. We present practical methods for performing inference while relaxing the exclusion restriction. We illustrate the approaches with empirical examples that examine the effect of 401(k) participation on asset accumulation, price elasticity of demand for margarine, and returns to schooling. We find that inference is informative even with a substantial relaxation of the exclusion restriction in two of the three cases."
to:NB
to_read
causal_inference
regression
statistics
economics
social_science_methodology
instrumental_variables
to_teach:undergrad-ADA
hansen.christian
february 2012 by cshalizi
[1201.0224] Estimation of Treatment Effects with High-Dimensional Controls
january 2012 by cshalizi
"We propose methods for inference on the average effect of a treatment on a scalar outcome in the presence of very many controls. Our setting is a partially linear regression model containing the treatment/policy variable and a large number $p$ of controls or series terms, with $p$ that is possibly much larger than the sample size $n$, but where only $s < n$ unknown controls or series terms are needed to approximate the regression function accurately. The latter sparsity condition makes it possible to estimate the entire regression function as well as the average treatment effect by selecting an approximately the right set of controls using Lasso and related methods. We develop estimation and inference methods for the average treatment effect in this setting, proposing a novel "post double selection" method that provides attractive inferential and estimation properties. In our analysis, in order to cover realistic applications, we expressly allow for imperfect selection of the controls and account for the impact of selection errors on estimation and inference. In order to cover typical applications in economics, we employ the selection methods designed to deal with non-Gaussian and heteroscedastic disturbances. We illustrate the use of new methods with numerical simulations and an application to the effect of abortion on crime rates."
to:NB
to_teach:undergrad-ADA
regression
causal_inference
lasso
sparsity
econometrics
instrumental_variables
hansen.christian
january 2012 by cshalizi
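A rough sketch of the "post double selection" idea named in the abstract above: select controls that predict the outcome, select controls that predict the treatment, then run OLS of the outcome on the treatment plus the union of the two selected sets. This is an illustration under loud assumptions, not the authors' procedure: the lasso is a hand-rolled coordinate-descent solver, the penalty `lam` is a fixed value picked for the toy data (the paper uses data-driven penalties designed for non-Gaussian, heteroscedastic errors), and the names `lasso_cd` and `post_double_selection` are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]           # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]           # put the updated coordinate back
    return beta


def post_double_selection(y, d, W, lam=0.15):
    """Return (treatment-effect estimate, indices of selected controls).

    lam is an assumed fixed penalty, chosen by hand for this illustration.
    """
    y = y - y.mean()
    d = d - d.mean()
    W = W - W.mean(axis=0)
    sel_y = np.flatnonzero(lasso_cd(W, y, lam))  # controls that predict the outcome
    sel_d = np.flatnonzero(lasso_cd(W, d, lam))  # controls that predict the treatment
    keep = np.union1d(sel_y, sel_d).astype(int)
    design = np.column_stack([d, W[:, keep]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0]), keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 500, 50
    W = rng.normal(size=(n, p))
    d = W[:, 0] + rng.normal(size=n)                  # control 0 drives the treatment
    y = 1.0 * d + 0.5 * W[:, 0] + rng.normal(size=n)  # assumed true effect: 1.0
    est, keep = post_double_selection(y, d, W)
    print("estimated effect:", round(est, 3), "selected controls:", keep)
```

Selecting on the treatment equation as well as the outcome equation is what guards against dropping a control that matters a lot for the treatment but only modestly for the outcome.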
If correlation doesn’t imply causation, then what does? | DDI
january 2012 by cshalizi
Michael preaches the Gospel According to Pearl; and very nicely too. (I would dispute however that DAGs don't give us a handle on mechanisms.)
causal_inference
graphical_models
statistics
causality
nielsen.michael
kith_and_kin
january 2012 by cshalizi
Mechanisms, Types, and Abstractions
january 2012 by cshalizi
"Machamer, Darden, and Craver’s account of the nature and role of mechanisms in the special sciences has been very influential. Unfortunately, a confusing array of ontic, epistemic, and pragmatic distinctions is required to individuate their mechanisms, mechanism schemata, and mechanism sketches. I diagnose this as a conflation of token-level causal relations with type-level relations. I propose instead that a mechanism is an abstraction that relates entity types and activity types on the model of a directed graph. Mechanisms have an ontic status distinct from the causal chains of token entities and token activities that instantiate them."
to:NB
explanation_by_mechanisms
causal_inference
philosophy_of_science
to_teach:complexity-and-inference
january 2012 by cshalizi
The Problem of Piecemeal Induction - JSTOR: Philosophy of Science, Vol. 78, No. 5 (December 2011), pp. 864-874
january 2012 by cshalizi
"I argue that, in causal inference from many observational studies, the piecemeal collection of data can cause underdetermination, even if arbitrarily large amounts of reliable data are available. Two theorems reveal that, for any variable set V, there are causal theories over V that can be distinguished if and only if all variables are simultaneously measured. These results entail that, a priori, one cannot know which observational studies will be most informative with respect to the true causal theory describing V. Hence, scientific institutions may need to play a larger role in coordinating differing research programs."
to:NB
kith_and_kin
causal_inference
philosophy_of_science
mayo-wilson.conor
january 2012 by cshalizi
OMFG Exogenous Variation! Or, Can You Find Good Nails When You Find an Indonesian Politics Hammer | Indolaysia
indonesia causal_inference political_economy instrumental_variables development_economics social_science_methodology to_teach:undergrad-ADA via:henry_farrell in_NB to:blog
december 2011 by cshalizi
Instruments, Randomization, and Learning about Development (Deaton, 2010)
december 2011 by cshalizi
"There is currently much debate about the effectiveness of foreign aid and about what kind of projects can engender economic development. There is skepticism about the ability of econometric analysis to resolve these issues or of development agencies to learn from their own experience. In response, there is increasing use in development economics of randomized controlled trials (RCTs) to accumulate credible knowl- edge of what works, without overreliance on questionable theory or statistical meth- ods. When RCTs are not possible, the proponents of these methods advocate quasi- randomization through instrumental variable (IV) techniques or natural experiments. I argue that many of these applications are unlikely to recover quantities that are use- ful for policy or understanding: two key issues are the misunderstanding of exogeneity and the handling of heterogeneity. I illustrate from the literature on aid and growth. Actual randomization faces similar problems as does quasi-randomization, notwith- standing rhetoric to the contrary. I argue that experiments have no special ability to produce more credible knowledge than other methods, and that actual experiments are frequently subject to practical problems that undermine any claims to statisti- cal or epistemic superiority. I illustrate using prominent experiments in development and elsewhere. As with IV methods, RCT-based evaluation of projects, without guid- ance from an understanding of underlying mechanisms, is unlikely to lead to scientific progress in the understanding of economic development. I welcome recent trends in development experimentation away from the evaluation of projects and toward the evaluation of theoretical mechanisms."
causal_inference
experimental_economics
experimental_sociology
economics
development_economics
social_science_methodology
explanation_by_mechanisms
to_teach:undergrad-ADA
instrumental_variables
have_read
evisceration
in_NB
randomization
to:blog
december 2011 by cshalizi
Improving Causal Inference: Strengths and Limitations of Natural Experiments (Dunning, 2008)
december 2011 by cshalizi
"Social scientists increasingly exploit natural experiments in their research. This article surveys recent applications in political science, with the goal of illustrating the inferential advantages provided by this research design. When treat- ment assignment is less than “as if” random, studies may be something less than natural experiments, and familiar threats to valid causal inference in observational settings can arise. The author proposes a continuum of plausibility for natural experiments, defined by the extent to which treatment assignment is plausibly “as if” random, and locates several leading studies along this continuum."
in_NB
causal_inference
social_science_methodology
to_teach:undergrad-ADA
instrumental_variables
december 2011 by cshalizi
Process Tracing and Causal Inference - PhilSci-Archive
november 2011 by cshalizi
"How should we judge competing explanatory claims in social science research? How can we make inferences about which alternative explanations are more convincing, in what ways, and to what degree? Case study methods—especially methods of within-case analysis such as process tracing— are an indispensable part of the answer to these questions (George and Bennett 2005: chap. 10). This chapter offers an overview of process tracing as a tool for causal inference, focusing on the study of international relations, an area rich with examples of this approach. In contrast to the subsequent two chapters in this volume (chaps. 11 and 12), where Freedman and Brady analyze micro-level examples, the present chapter explores process tracing in macro studies."
to:NB
causal_inference
november 2011 by cshalizi
Randomization Tests for Distinguishing Social Influence and Homophily Effects
october 2011 by cshalizi
Assumes all homophilous traits are measured, I believe.
re:homophily_and_confounding
homophily
social_influence
causal_inference
network_data_analysis
have_read
neville.jennifer
in_NB
re:stacs
to_teach:complexity-and-inference
bootstrap
october 2011 by cshalizi
Six problems for causal inference from fMRI. [Neuroimage. 2010] - PubMed - NCBI
october 2011 by cshalizi
"Neuroimaging (e.g. fMRI) data are increasingly used to attempt to identify not only brain regions of interest (ROIs) that are especially active during perception, cognition, and action, but also the qualitative causal relations among activity in these regions (known as effective connectivity; Friston, 1994). Previous investigations and anatomical and physiological knowledge may somewhat constrain the possible hypotheses, but there often remains a vast space of possible causal structures. To find actual effective connectivity relations, search methods must accommodate indirect measurements of nonlinear time series dependencies, feedback, multiple subjects possibly varying in identified regions of interest, and unknown possible location-dependent variations in BOLD response delays. We describe combinations of procedures that under these conditions find feed-forward sub-structure characteristic of a group of subjects. The method is illustrated with an empirical data set and confirmed with simulations of time series of non-linear, randomly generated, effective connectivities, with feedback, subject to random differences of BOLD delays, with regions of interest missing at random for some subjects, measured with noise approximating the signal to noise ratio of the empirical data." PDF: http://psychology.rutgers.edu/~jose/Six_problems.pdf
fmri
causal_inference
neural_data_analysis
hanson.stephen_jose
glymour.clark
re:functional_communities
october 2011 by cshalizi
Robustification of the PC Algorithm for Directed Acyclic Graphs
october 2011 by cshalizi
"The PC-algorithm was shown to be a powerful method for estimating the equivalence class of a potentially very high-dimensional acyclic directed graph (DAG) with the corresponding Gaussian distribution. Here we propose a computationally eficient robustification of the PC-algorithm and prove its consistency. Furthermore, we compare the robustified and standard version of the PC-algorithm on simulated data using the new corresponding R package pcalg."
statistics
causal_inference
graphical_models
buhlmann.peter
in_NB
to_read
to_teach:data-mining
to_teach:undergrad-ADA
kalisch.markus
october 2011 by cshalizi
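Constraint-based searches such as the PC algorithm are built on repeated conditional independence tests; in the Gaussian case these are commonly tests of zero partial correlation using Fisher's z-transform. The sketch below shows only that building block, using the plain (non-robust) sample correlation rather than the robustified estimator the paper proposes. The function names and the toy chain X → Y → Z are assumptions made for illustration.

```python
import math

import numpy as np


def partial_corr(data, i, j, cond):
    """Sample partial correlation of columns i and j given the columns in cond."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.inv(corr)                   # precision matrix of the sub-block
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation is zero.

    n is the sample size and k the number of conditioning variables.
    """
    z = 0.5 * math.log((1.0 + r) / (1.0 - r))
    stat = math.sqrt(n - k - 3) * abs(z)
    return math.erfc(stat / math.sqrt(2.0))      # equals 2 * (1 - Phi(stat))


def chain_data(n=5000, seed=0):
    """Toy Gaussian chain X -> Y -> Z, so X and Z are independent given Y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)
    z = y + rng.normal(size=n)
    return np.column_stack([x, y, z])


if __name__ == "__main__":
    data = chain_data()
    n = len(data)
    r_marginal = partial_corr(data, 0, 2, [])
    r_given_y = partial_corr(data, 0, 2, [1])
    print("corr(X, Z)     =", round(r_marginal, 3),
          " p =", fisher_z_pvalue(r_marginal, n, 0))
    print("corr(X, Z | Y) =", round(r_given_y, 3),
          " p =", fisher_z_pvalue(r_given_y, n, 1))
```

In the chain, X and Z are strongly correlated marginally but have population partial correlation zero given Y, which is the kind of pattern a PC-style search uses to delete the X-Z edge.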
[1110.0718] Directed information and Pearl's causal calculus
october 2011 by cshalizi
"Probabilistic graphical models are a fundamental tool in statistics, machine learning, signal processing, and control. When such a model is defined on a directed acyclic graph (DAG), one can assign a partial ordering to the events occurring in the corresponding stochastic system. Based on the work of Judea Pearl and others, these DAG-based "causal factorizations" of joint probability measures have been used for characterization and inference of functional dependencies (causal links). This mostly expository paper focuses on several connections between Pearl's formalism (and in particular his notion of "intervention") and information-theoretic notions of causality and feedback (such as causal conditioning, directed stochastic kernels, and directed information). As an application, we show how conditional directed information can be used to develop an information-theoretic version of Pearl's "back-door" criterion for identifiability of causal effects from passive observations. This suggests that the back-door criterion can be thought of as a causal analog of statistical sufficiency."
graphical_models
causality
causal_inference
information_theory
statistics
raginsky.maxim
in_NB
to_read
kith_and_kin
sufficiency
october 2011 by cshalizi
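For reference, the standard back-door adjustment that the abstract above alludes to can be written as follows. This is the usual textbook statement for discrete $Z$, not the information-theoretic version developed in the paper: if a set of variables $Z$ satisfies the back-door criterion relative to $(X, Y)$, the interventional distribution is identified from observational quantities alone.

```latex
P\bigl(y \mid \mathrm{do}(x)\bigr) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
```

Here $Z$ must contain no descendant of $X$ and must block every path between $X$ and $Y$ that has an arrow into $X$.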
Causal Analysis in Theory and Practice » Comments on an article by Grice, Shlimgen and Barrett (GSB): “Regarding Causation and Judea Pearl’s Mediation Formula”
october 2011 by cshalizi
Uncle Judea sounds a bit testy in this one, but no doubt anyone would be if they had to keep swatting down such pathetic misunderstandings passing for objections.
causality
structural_equations
causal_inference
pearl.judea
october 2011 by cshalizi
[1104.5617] Learning high-dimensional directed acyclic graphs with latent and selection variables
september 2011 by cshalizi
"We consider the problem of learning causal information between random variables in directed acyclic graph (DAGs) when allowing arbitrarily many latent and selection variables. The FCI algorithm (Spirtes et al., 1999) has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose a new algorithm, the RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
have_read
to_teach:undergrad-ADA
graphical_models
causal_inference
in_NB
kalisch.markus
richardson.thomas_s.
september 2011 by cshalizi
ACIC 2011 - a set on Flickr
june 2011 by cshalizi
We all look far more intellectual in B&W than I remember feeling. And we keep talking with our hands!
conferences
causal_inference
photos
june 2011 by cshalizi
Reason Foundation - No Booze? You May Lose
april 2011 by cshalizi
Exercise for the student: Devise at least two reasons why the causality might run from high income to frequent social drinking, rather than vice versa. (This is I think too elementary to make a good problem for ADA.)
bad_data_analysis
booze
via:tony_lin
causal_inference
to_teach:undergrad-ADA
april 2011 by cshalizi
Natural "Natural Experiments" in Economics
april 2011 by cshalizi
Shorter: I am sickened by the weakness of your instruments.
instrumental_variables
causal_inference
to_teach:undergrad-ADA
have_read
in_NB
economics
april 2011 by cshalizi
Revisiting the Value of Elite Colleges - NYTimes.com
february 2011 by cshalizi
Conversation with Kristina suggests an alternative hypothesis: going to a Big Name school raises your income in every profession; students have a target income level, and would rather do something socially redeeming/fun if it doesn't cost them too much; therefore students who go to Big Name schools don't earn more, on average, but might have a broader range of jobs. Testable... Anyway, there are obviously big problems of self-selection and self-regulation involved here.
education
academia
economics
class_struggles_in_america
causal_inference
via:klk
february 2011 by cshalizi
CRAN - Package MatchIt
february 2011 by cshalizi
"MatchIt preprocesses data by selecting approximate matched samples of the treated and control groups with similar covariate distributions, drawing on a large variety of matching methods. After preprocessing data with MatchIt, whatever standard parametric technique one might have used without preprocessing can be used, but the results will be far less model dependent."
I want to teach _some_ matching methods in 402, but I definitely don't want the kids to program them. This might work...
matching
causal_inference
statistics
to_teach:undergrad-ADA
february 2011 by cshalizi
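To make concrete what "selecting matched samples with similar covariate distributions" means, here is a small NumPy sketch of one-nearest-neighbor matching with replacement on a single covariate. It is not MatchIt (which is an R package) and does not reproduce any of its methods or interface; the simulated data, the assumed treatment effect of 1.0, and the names `simulate` and `att_nearest_neighbor` are all hypothetical.

```python
import numpy as np


def simulate(n=2000, effect=1.0, seed=0):
    """Toy observational data: x raises both the chance of treatment and the outcome."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    treated = rng.random(n) < 1.0 / (1.0 + np.exp(-x))   # logistic assignment
    y = effect * treated + 2.0 * x + rng.normal(scale=0.5, size=n)
    return x, treated, y


def att_nearest_neighbor(x, treated, y):
    """Average effect on the treated via 1-NN matching (with replacement) on x."""
    x_t, y_t = x[treated], y[treated]
    x_c, y_c = x[~treated], y[~treated]
    # For every treated unit, find the control unit with the closest covariate value.
    nearest = np.abs(x_t[:, None] - x_c[None, :]).argmin(axis=1)
    return float(np.mean(y_t - y_c[nearest]))


if __name__ == "__main__":
    x, treated, y = simulate()
    naive = y[treated].mean() - y[~treated].mean()
    print("naive difference in means:", round(float(naive), 3))
    print("matched ATT estimate:     ", round(att_nearest_neighbor(x, treated, y), 3))
```

The naive difference in means absorbs the effect of x on the outcome; the matched comparison removes most of it, but only because the single confounder here is observed.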
Didelez, Kreiner, Keiding: Graphical Models for Inference Under Outcome-Dependent Sampling
january 2011 by cshalizi
Probably way too advanced to actually teach in 402... == http://arxiv.org/abs/1101.0901
selection_bias
graphical_models
causal_inference
statistics
to_teach:undergrad-ADA
didelez.vanessa
january 2011 by cshalizi
Ideas behind their time: formal causal inference? | Ready-to-hand
december 2010 by cshalizi
Reichenbach's _The Direction of Time_ is exhibit A for this thesis.
causal_inference
ideas_behind_their_time
december 2010 by cshalizi
[1010.5720] Information-theoretic inference of common ancestors
november 2010 by cshalizi
"A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds .... In general, there is a whole class of DAGs that represents a given set of conditional independence relations... properties of this class that can be derived from observations of a subsystem only... we prove an information theoretic inequality that allows for the inference of common ancestors of observed parts in ... some unknown larger system.. a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information... our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause... Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version."
graphical_models
information_theory
causal_inference
statistics
ay.nihat
have_read
algorithmic_information_theory
november 2010 by cshalizi
Social Science Statistics Blog: Can matching solve endogeneity?
october 2010 by cshalizi
"people who like matching methods ... tend to believe that most confounders can be measured ... and that there aren't a lot of lurking unobservables. ... [P]eople ... who are skeptical of matching ... argue that there will always be problematic unobservables lurking ... [and they] prefer instrumental variables approaches .... [T]he same people who tell me that lurking unobservables are everywhere tend to be fairly comfortable making the ... exclusion restrictions that make IV approaches work. The crazy thing is that just like matching, these assumptions [are] about unobservable causal pathways. The claim that an instrumental variable is valid is the claim that there are no unobserved (or observed) variables linking the instrument to the outcome except through the path of the instrumented variable. ... [P]eople who think that lurking unobservables are everywhere in matching somehow think that all these lurking unobservables go away as soon as you call something an instrument..."
causal_inference
instrumental_variables
matching
to:blog
october 2010 by cshalizi
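A toy illustration of the exclusion-restriction point quoted in the "Can matching solve endogeneity?" entry above. This is only a sketch under made-up assumptions: the linear model, the coefficients, and the names `simulate`, `z_direct` and `TRUE_EFFECT` are all hypothetical, not taken from the linked post. It compares the naive regression slope with the simple ratio (Wald) IV estimate, first when the instrument reaches the outcome only through the treatment, then when it also has a direct path.

```python
import random

TRUE_EFFECT = 2.0  # assumed causal effect of x on y in this toy model


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n, z_direct, seed=0):
    """Return (OLS slope, IV ratio estimate) for one simulated data set.

    z is the instrument, u an unobserved confounder, x the treatment,
    y the outcome. z_direct is the strength of a direct z -> y path;
    any nonzero value violates the exclusion restriction.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    u = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
    y = [TRUE_EFFECT * xi + 3.0 * ui + z_direct * zi + rng.gauss(0, 1)
         for xi, ui, zi in zip(x, u, z)]
    ols = cov(x, y) / cov(x, x)  # confounded by u
    iv = cov(z, y) / cov(z, x)   # consistent only if z_direct == 0
    return ols, iv


ols_valid, iv_valid = simulate(20000, z_direct=0.0)
ols_invalid, iv_invalid = simulate(20000, z_direct=1.0)
print("valid instrument:   OLS %.2f, IV %.2f" % (ols_valid, iv_valid))
print("invalid instrument: OLS %.2f, IV %.2f" % (ols_invalid, iv_invalid))
```

In this particular model the large-sample limit of the IV ratio is `TRUE_EFFECT + z_direct`, so a direct path of strength 1 shifts the IV answer by a full unit, and nothing in the observed (z, x, y) data alone signals that anything went wrong.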
Samantha Kleinberg
october 2010 by cshalizi
"I have developed a new approach to 1) identifying complex temporal causal relationships from observational time series data 2) finding causes of particular events. The approach centers on representation of causal relationships using probabilistic temporal logic formulas. At the type level, this allows explicit description of the time between cause and effect and automated testing of arbitrarily complex relationships using methods I developed for testing formulas directly in traces (without first inferring a model). After computing the average impact of a cause on its effect, we can use techniques for false discovery control to help determine which of the inferred causes are significant. At the token level, I have recently shown that we may use the significance of the general (type-level) relationships to reason about and assess the significance of potential token causes in a way that allows for incomplete information."
causal_inference
machine_learning
logic
track_down_references
via:albers
october 2010 by cshalizi
A Cautionary Note on the Use of Matching to Estimate Causal Effects: An Empirical Example Comparing Matching Estimates to an Experimental Benchmark — Sociological Methods Research
october 2010 by cshalizi
"...social scientists have increasingly turned to matching [to draw] causal inferences from observational data. Matching compares those who receive a treatment to those with similar background attributes who do not receive a treatment. ... Drawing on a randomized voter mobilization experiment ... compare matching [estimates] to an experimental benchmark. ... enormous sample size .... exactly match each treated subject to 40 untreated subjects. Matching greatly exaggerates the effectiveness of pre-election phone calls encouraging voter participation. ... Matching suggests that another pre-election phone call that encouraged people to wear their seat belts also generated huge increases in voter turnout. ... caution is warranted when applying matching estimators to observational data, particularly when one is uncertain about the potential for biased inference." Ouch!
have_read
to_teach:data-mining
causal_inference
matching
experimental_political_science
evisceration
to:blog
to_teach:undergrad-ADA
october 2010 by cshalizi
[1009.3243] The "Unfriending" Problem: The Consequences of Homophily in Friendship Retention for Causal Estimates of Social Influence
september 2010 by cshalizi
"An increasing number of scholars are using longitudinal social network data to try to obtain estimates of peer or social influence effects. These data may provide additional statistical leverage, but they can introduce new inferential problems. In particular, while the confounding effects of homophily in friendship formation are widely appreciated, homophily in friendship retention may also confound causal estimates of social influence in longitudinal network data. We provide evidence for this claim in a Monte Carlo analysis of the statistical model used by Christakis, Fowler, and their colleagues in numerous articles estimating "contagion" effects in social networks. Our results indicate that homophily in friendship retention induces significant upward bias and decreased coverage levels in the Christakis and Fowler model if there is non-negligible friendship attrition over time."
have_read
social_networks
contagion
influence
network_data_analysis
statistics
causal_inference
nyhan.brendan
noel.hans
re:homophily_and_confounding
in_NB
september 2010 by cshalizi
Graphical Gaussian modelling of multivariate time series with latent variables
august 2010 by cshalizi
Just in time to drop on the head of some stupid neuroscientists I'm refereeing.
time_series
causal_inference
graphical_models
granger_causality
august 2010 by cshalizi
Inferring deterministic causal relations
july 2010 by cshalizi
Best Student Paper at UAI 2010. What would happen if you used this on sequences of values from the Arnold cat map? Could it learn the direction of time?
causal_inference
information_theory
to_read
heard_the_talk
july 2010 by cshalizi
Journal of Econometrics : Identification of peer effects through social networks
may 2010 by cshalizi
Of course, saying "we assume that correlated effects are absent" is, in this context at least, very much a "we assume we have a can opener" move.
network_data_analysis
re:homophily_and_confounding
via:iqss
causal_inference
social_networks
econometrics
re:critique_of_diffusion
have_read
may 2010 by cshalizi
Homophily and Contagion Are Generically Confounded in Observational Social Network Studies (Shalizi and Thomas, 2010)
re:homophily_and_confounding blogged social_networks network_data_analysis causal_inference graphical_models contagion homophily voter_model social_influence confounding identifiability self-centered re:critique_of_diffusion
april 2010 by cshalizi
The Industrial Organization of Rebellion: The Logic of Forced Labor and Child Soldiering
april 2010 by cshalizi
"We investigate one of the world’s most pernicious forms of exploitation: child soldiering. Most theories can be captured by a principal-agent model that incorporates punishments, indoctrination, and age-varying productivity. For rebel leaders ... it is almost always optimal to coerce rather than reward children ... leaders will ... forcibly recruit children when punishment and supervision are cheap, when children’s outside options are poor, and when rebel leaders are resource-constrained. To see which mechanisms dominate in practice, we interview and survey former members of Uganda’s Lord’s Resistance Army, who provide a cruel natural experiment that reveals how children and adults respond to coercive incentives... children are more easily indoctrinated and disoriented than adults, but are less effective guerrillas; hence the optimal targets of coercion are young adolescents. We confirm predictions of the model on a new “cross-rebel” dataset and suggest policy solutions."
child_labor
child_soldiering
civil_war
rebellion
political_economy
sociology
depressing
via:henry_farrell
principal-agent
institutions
organizations
causal_inference
april 2010 by cshalizi
Superstar Extinction
april 2010 by cshalizi
"We estimate the magnitude of spillovers generated by 112 academic “superstars” who died prematurely and unexpectedly, thus providing an exogenous source of variation in the structure of their collaborators' coauthorship networks. Following the death of a superstar, we find that collaborators experience, on average, a lasting 5% to 8% decline in their quality-adjusted publication rates. By exploring interactions of the treatment effect with a variety of star, coauthor, and star/coauthor dyad characteristics, we seek to adjudicate between plausible mechanisms that might explain this finding. Taken together, our results suggest that spillovers are circumscribed in idea space, but less so in physical or social space. In particular, superstar extinction reveals the boundaries of the scientific field to which the star contributes—the “invisible college.”"
sociology_of_science
bibliometry
causal_inference
social_life_of_the_mind
april 2010 by cshalizi
[1003.1513] On the trasductive arguments in statistics
march 2010 by cshalizi
"The paper argues that a part of the current statistical discussion is not based on the standard firm foundations of the field. Among the examples we consider are prediction into the future, semi-supervised classification, and causality inference based on observational data." --- I have read this paper, but do not pretend to understand it. (For instance, I really don't get what he's saying about time series.)
statistics
prediction
causal_inference
time_series
have_read
march 2010 by cshalizi
"Wives and Ex-Wives: A New Test for Homogamy Bias in the Widowhood Effect" (Elwert and Christakis)
february 2010 by cshalizi
Clever! But what if wife-at-time-of-death is more similar to the husband than the ex-wife was? (Or had more important common environments.)
causal_inference
have_read
re:homophily_and_confounding
elwert.felix
christakis.nicholas
homogamy
february 2010 by cshalizi
Reversals of fortune: path dependency, problem solving, and temporal cases
january 2010 by cshalizi
"Historical reversals highlight a basic methodological problem: is it possible to treat two successive periods both as independent cases to compare for causal analysis and as parts of a single historical sequence? I argue that one strategy for doing so, using models of path dependency, imposes serious limits on explanation. An alternative model which treats successive periods as contrasting solutions for recurrent problems offers two advantages. First, it more effectively combines analytical comparisons of different periods with narratives of causal sequences spanning two or more periods. Second, it better integrates scholarly accounts of historical reversals with actors’ own narratives of the past."
social_science_methodology
path_dependence
historical_explanation
causal_inference
january 2010 by cshalizi
Improving the Reliability of Causal Discovery from Small Data Sets Using Argumentation
december 2009 by cshalizi
"We address the problem of improving the reliability of independence-based causal discovery algorithms that results from the execution of statistical independence tests on small data sets, which typically have low reliability. We model the problem as a knowledge base containing a set of independence facts that are related through Pearl's well-known axioms. Statistical tests on finite data sets may result in errors in these tests and inconsistencies in the knowledge base. We resolve these inconsistencies through the use of an instance of the class of defeasible logics called argumentation, augmented with a preference function, that is used to reason about and possibly correct errors in these tests. This results in a more robust conditional independence test, called an argumentative independence test. Our experimental evaluation shows clear positive improvements in the accuracy of argumentative over purely statistical tests."
logic
causal_inference
graphical_models
machine_learning
statistics
to_read
december 2009 by cshalizi
Bayesian Network Structure Learning by Recursive Autonomy Identification
december 2009 by cshalizi
"We propose the recursive autonomy identification (RAI) algorithm for constraint-based (CB) Bayesian network structure learning. The RAI algorithm learns the structure by sequential application of conditional independence (CI) tests, edge direction and structure decomposition into autonomous sub-structures. The sequence of operations is performed recursively for each autonomous sub-structure while simultaneously increasing the order of the CI test. While other CB algorithms d-separate structures and then direct the resulted undirected graph, the RAI algorithm combines the two processes from the outset and along the procedure. By this means and due to structure decomposition, learning a structure using RAI requires a smaller number of CI tests of high orders. This reduces the complexity and run-time of the algorithm and increases the accuracy by diminishing the curse-of-dimensionality."
graphical_models
causal_inference
statistics
machine_learning
to_read
to_teach:data-mining
to_teach:undergrad-ADA
december 2009 by cshalizi
On a Class of Bias-Amplifying Covariates that Endanger Effect Estimates
november 2009 by cshalizi
Those would be _instrumental_ variables. Implications for the collected scholarly works of S. Levitt left as an exercise for the reader.
causal_inference
regression
instrumental_variables
pearl.judea
november 2009 by cshalizi
[0911.0280] Causal Inference on Discrete Data using Additive Noise Models
november 2009 by cshalizi
Extending the idea that forward problems are easier than inverse problems, I presume. (Probably won't actually teach that stuff in 350 this year, owing to time and student level.)
causal_inference
to_read
to_teach:data-mining
janzing.dominik
november 2009 by cshalizi
Causal Inference in Statistics: An Overview (Pearl, 2009)
september 2009 by cshalizi
Described by Uncle Judea as "A new survey paper, gently summarizing everything I know about causation (in only 43 pages)".
causality
causal_inference
statistics
pearl.judea
blogged
have_read
september 2009 by cshalizi
[0908.3177] The impact factor's Matthew effect: a natural experiment in bibliometrics
august 2009 by cshalizi
Cute, yet depressing: "Using an original method for controlling the intrinsic value of papers--identical duplicate papers published in different journals with different impact factors--this paper shows that the journal in which papers are published have a strong influence on their citation rates, as duplicate papers published in high impact journals obtain, on average, twice as much citations as their identical counterparts published in journals with lower impact factors. The intrinsic value of a paper is thus not the only reason a given paper gets cited or not; there is a specific Matthew effect attached to journals and this gives to paper published there an added value over and above their intrinsic quality."
bibliometry
matthew_effect
citation_networks
sociology_of_science
causal_inference
to:blog
in_NB
august 2009 by cshalizi
Estimating the Impact of the Hajj
august 2009 by cshalizi
Caveat: _in Pakistan_. "... compares successful and unsuccessful applicants in a lottery used by Pakistan to allocate Hajj visas. ... We find that participation in the Hajj increases observance of global Islamic practices, such as prayer and fasting, while decreasing participation in localized practices and beliefs, such as the use of amulets and dowry. It increases belief in equality and harmony among ethnic groups and Islamic sects and leads to more favorable attitudes toward women, including greater acceptance of female education and employment. Increased unity within the Islamic world is not accompanied by antipathy toward non-Muslims. Instead, Hajjis show increased belief in peace, and in equality and harmony among adherents of different religions. The evidence suggests that these changes are likely due to exposure to and interaction with Hajjis from around the world, rather than to a changed social role of pilgrims upon return."
islam
causal_inference
pakistan
randomization
in_NB
to_read
august 2009 by cshalizi
Social Interactions and Schooling Decisions
july 2009 by cshalizi
"The aim of this paper is to study whether a child's schooling choices are affected by the schooling choices of other children. Identification is based on a randomized targeted intervention that grants a cash subsidy conditional on school attendance to a subgroup of eligible children within small rural villages in Mexico (PROGRESA). This policy change spills over to ineligible children if social interactions are relevant. Results indicate that the eligible children tend to attend school more frequently, and the ineligible children acquire more schooling when the subsidy is introduced in their local village. Moreover, the overall effect of PROGRESA on eligible children is the sum of a direct effect due to cash transfers and an indirect effect due to changes in peer group schooling. Interestingly, the social interactions effect is almost as important as the direct effect."
social_networks
contagion
causal_inference
education
experimental_sociology
in_NB
to_teach:complexity-and-inference
july 2009 by cshalizi
Tetrad Project Homepage
june 2009 by cshalizi
Have I really not bookmarked this before?
tetrad
causal_inference
graphical_models
machine_learning
statistics
philosophy_of_science
latent_variables
june 2009 by cshalizi