cshalizi + to_teach:undergrad-ada 151
[1205.3208] A New Family of Generalized 3D Cat Maps
12 days ago by cshalizi
"Since the 1990s chaotic cat maps are widely used in data encryption, for their very complicated dynamics within a simple model and desired characteristics related to requirements of cryptography. The number of cat map parameters and the map period length after discretization are two major concerns in many applications for security reasons. In this paper, we propose a new family of 36 distinctive 3D cat maps with different spatial configurations taking existing 3D cat maps [1]-[4] as special cases. Our analysis and comparisons show that this new 3D cat maps family has more independent map parameters and much longer averaged period lengths than existing 3D cat maps. The presented cat map family can be extended to higher dimensional cases."
(to_teach tags for classes which use the cat map as an example)
to:NB
cat_map
dynamical_systems
cryptography
to_teach:complexity-and-inference
to_teach:statcomp
to_teach:undergrad-ADA
12 days ago by cshalizi
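A rough sketch of why the period length after discretization is a concern, using the classic 2D Arnold cat map rather than the paper's 3D family. The function name `cat_map_period` and the matrix [[1, 1], [1, 2]] are assumptions chosen for illustration, not taken from the paper. On an n x n grid the map is a permutation of the grid points, so some power of it is the identity; the code finds the smallest such power by multiplying the matrix out mod n.

```python
def cat_map_period(n):
    """Smallest k >= 1 such that A^k = I (mod n), for A = [[1, 1], [1, 2]].

    Assumes n >= 2. A has determinant 1, so it is invertible mod n and
    the loop is guaranteed to terminate.
    """
    identity = (1, 0, 0, 1)
    # (a, b, c, d) holds the current power of A as [[a, b], [c, d]] mod n.
    a, b, c, d = 1 % n, 1 % n, 1 % n, 2 % n
    k = 1
    while (a, b, c, d) != identity:
        # Right-multiply by A: [[a, b], [c, d]] @ [[1, 1], [1, 2]].
        a, b, c, d = (a + b) % n, (a + 2 * b) % n, (c + d) % n, (c + 2 * d) % n
        k += 1
    return k


if __name__ == "__main__":
    for n in (2, 3, 5, 64, 100, 101):
        print(n, cat_map_period(n))
```

For n = 2, 3 and 5 this gives periods 3, 4 and 10. The period does not grow steadily with n, which is the weakness a short-period map would hand to an attacker.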
Likelihood inference for discriminating between long-memory and change-point models - Yau - 2012 - Journal of Time Series Analysis - Wiley Online Library
13 days ago by cshalizi
"We develop a likelihood ratio (LR) test procedure for discriminating between a short-memory time series with a change-point (CP) and a long-memory (LM) time series. Under the null hypothesis, the time series consists of two segments of short-memory time series with different means and possibly different covariance functions. The location of the shift in the mean is unknown. Under the alternative, the time series has no shift in mean but rather is LM. The LR statistic is defined as the normalized log-ratio of the Whittle likelihood between the CP model and the LM model, which is asymptotically normally distributed under the null. The LR test provides a parametric alternative to the CUSUM test proposed by Berkes et al. (2006). Moreover, the LR test is more general than the CUSUM test in the sense that it is applicable to changes in other marginal or dependence features other than a change-in-mean. We show its good performance in simulations and apply it to two data examples."
to:NB
time_series
change-point_problem
long-range_dependence
statistics
to_teach:undergrad-ADA
hypothesis_testing
13 days ago by cshalizi
[1203.3504] On Measurement Bias in Causal Inference
18 days ago by cshalizi
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particulars, the paper discusses the control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB
causal_inference
inference_to_latent_objects
pearl.judea
to_teach:undergrad-ADA
statistics
error_in_variables
via:arthegall
18 days ago by cshalizi
Clarke, Clarke: Prediction in several conventional contexts
20 days ago by cshalizi
"We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors."
(to_teach tags are tentative.)
to:NB
prediction
statistics
classifiers
regression
to_teach:undergrad-ADA
to_teach:data-mining
20 days ago by cshalizi
Testing parametric conditional distributions using the nonparametric smoothing method
22 days ago by cshalizi
"This paper proposes a new goodness-of-fit test for parametric conditional probability distributions using the nonparametric smoothing methodology. An asymptotic normal distribution is established for the test statistic under the null hypothesis of correct specification of the parametric distribution. The test is shown to have power against local alternatives converging to the null at certain rates. The test can be applied to testing for possible misspecifications in a wide variety of parametric models. A bootstrap procedure is provided for obtaining more accurate critical values for the test. Monte Carlo simulations show that the test has good power against some common alternatives."
to:NB
misspecification
density_estimation
smoothing
statistics
to_teach:undergrad-ADA
22 days ago by cshalizi
Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
25 days ago by cshalizi
"We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data ; this is particularly the case when the number of commonly measured variables is low.
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
to:NB
to_read
causal_inference
graphical_models
to_teach:undergrad-ADA
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
25 days ago by cshalizi
"The huge Package for High-dimensional Undirected Graph Estimation in R"
25 days ago by cshalizi
"We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency."
to:NB
to_teach:undergrad-ADA
graphical_models
statistics
kith_and_kin
wasserman.larry
roeder.kathryn
liu.han
25 days ago by cshalizi
README: installing Rgraphviz
27 days ago by cshalizi
Install graphviz, then Rgraphviz, then (?) re-start R. Or at least that worked with the student in office hours. (I swear it's painless on a Mac.)
to_teach:undergrad-ADA
27 days ago by cshalizi
Attractive Models - Kieran Healy
29 days ago by cshalizi
Have I really not bookmarked this before?
p-values
statistics
political_science
social_science_methodology
bad_data_analysis
to_teach:undergrad-ADA
to_teach:data-mining
re:neutral_model_of_inquiry
healy.kieran
29 days ago by cshalizi
Assessing gross domestic product and inflation probability forecasts derived from Bank of England fan charts - Galbraith - 2011 - Journal of the Royal Statistical Society: Series A (Statistics in Society) - Wiley Online Library
6 weeks ago by cshalizi
"Density forecasts, including the pioneering Bank of England ‘fan charts’, are often used to produce forecast probabilities of a particular event. We use the Bank of England's forecast densities to calculate the forecast probability that annual rates of inflation and output growth exceed given thresholds. We subject these implicit probability forecasts to graphical and numerical diagnostic checks. We measure both their calibration and their resolution, providing both statistical and graphical interpretations of the results. The results reinforce earlier evidence on limitations of these forecasts and provide new evidence on their information content and on the relative performance of inflation and gross domestic product growth forecasts. In particular, gross domestic product forecasts show little or no ability to predict periods of low growth beyond the current quarter, in part because of the important role of data revisions."
to:NB
prediction
statistics
calibration
macroeconomics
to_teach:undergrad-ADA
6 weeks ago by cshalizi
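A minimal sketch of what "calibration" and "resolution" can mean for probability forecasts of a binary event, using the standard Murphy decomposition of the Brier score. This is one common way to define the two quantities and is not necessarily the paper's procedure. The function name `brier_decomposition` is hypothetical, and forecasts are grouped by their exact value, which assumes they take only a few distinct values.

```python
from collections import defaultdict


def brier_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty) for binary outcomes.

    Brier score = reliability - resolution + uncertainty.
    Reliability is the calibration term: 0 means perfectly calibrated.
    Resolution measures how far the conditional event frequencies move
    away from the overall base rate: larger is better.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n

    # Collect the observed outcomes that went with each distinct forecast.
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)

    reliability = 0.0
    resolution = 0.0
    for f, obs in groups.items():
        freq = sum(obs) / len(obs)  # observed frequency given this forecast
        reliability += len(obs) * (f - freq) ** 2
        resolution += len(obs) * (freq - base_rate) ** 2

    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


if __name__ == "__main__":
    # Made-up forecasts and outcomes, for illustration only.
    print(brier_decomposition([0.2, 0.2, 0.8, 0.8, 0.8], [0, 1, 1, 1, 0]))
```

A forecaster who always announces the base rate is perfectly calibrated but has zero resolution, which is roughly the failure the abstract reports for GDP forecasts beyond the current quarter.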
[math/0603130] Nonparametric methods for inference in the presence of instrumental variables
6 weeks ago by cshalizi
"We suggest two nonparametric approaches, based on kernel methods and orthogonal series to estimating regression functions in the presence of instrumental variables. For the first time in this class of problems, we derive optimal convergence rates, and show that they are attained by particular estimators. In the presence of instrumental variables the relation that identifies the regression function also defines an ill-posed inverse problem, the ``difficulty'' of which depends on eigenvalues of a certain integral operator which is determined by the joint density of endogenous and instrumental variables. We delineate the role played by problem difficulty in determining both the optimal convergence rate and the appropriate choice of smoothing parameter."
to:NB
to_read
regression
statistics
instrumental_variables
nonparametrics
to_teach:undergrad-ADA
6 weeks ago by cshalizi
Colombo, Maathuis, Kalisch, Richardson: Learning high-dimensional directed acyclic graphs with latent and selection variables
7 weeks ago by cshalizi
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
--- Too complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB
have_read
statistics
graphical_models
causal_inference
sparsity
to_teach:undergrad-ADA
7 weeks ago by cshalizi
The benchden Package: Benchmark Densities for Nonparametric Density Estimation
7 weeks ago by cshalizi
"This article describes the benchden package which implements a set of 28 example densities for nonparametric density estimation in R. In addition to the usual functions that evaluate the density, distribution and quantile functions or generate random variates, a function designed to be specifically useful for larger simulation studies has been added. After describing the set of densities and the usage of the package, a small toy example of a simulation study conducted using the benchden package is given."
to:NB
computational_statistics
R
density_estimation
nonparametrics
to_teach:undergrad-ADA
7 weeks ago by cshalizi
[no title]
7 weeks ago by cshalizi
"Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involve a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively."
(From a quick scan, this looks too heavy to actually teach in ADAfaEPoV, but it's so tagged to remind me to include a reference.)
to:NB
causal_inference
partial_identification
statistics
instrumental_variables
to_teach:undergrad-ADA
7 weeks ago by cshalizi
[0803.0402] A note on sensitivity of principal component subspaces and the efficient detection of influential observations in high dimensions
8 weeks ago by cshalizi
"In this paper we introduce an influence measure based on second order expansion of the RV and GCD measures for the comparison between unperturbed and perturbed eigenvectors of a symmetric matrix estimator. Example estimators are considered to highlight how this measure compliments recent influence analysis. Importantly, we also show how a sample based version of this measure can be used to accurately and efficiently detect influential observations in practice."
to:NB
principal_components
statistics
to_teach:undergrad-ADA
8 weeks ago by cshalizi
Taylor & Francis Online :: Graphical Diagnostics for Markov Models for Categorical Data - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
8 weeks ago by cshalizi
"Markov models are widely used as a method for describing categorical data that exhibit stationary and nonstationary autocorrelation. However, diagnostic methods are a largely overlooked topic for Markov models. We introduce two types of residuals for this purpose: one for assessing the length of runs between state changes, and the other for assessing the frequency with which the process moves from any given state to the other states. Methods for calculating the sampling distribution of both types of residuals are presented, enabling objective interpretation through graphical summaries. The graphical summaries are formed using a modification of the probability integral transformation that is applicable for discrete data. Residuals from simulated datasets are presented to demonstrate when the model is, and is not, adequate for the data. The two types of residuals are used to highlight inadequacies of a model posed for real data on seabed fauna from the marine environment."
to:NB
visual_display_of_quantitative_information
statistics
markov_models
to_teach:undergrad-ADA
8 weeks ago by cshalizi
Stock Market Behavior Predicted by Rat Neurons
8 weeks ago by cshalizi
"We here report for the first time, to the best of our knowledge, rat motor cortex neurons predicting the behavior of the American stock market. We implanted the motor cortex of the brains of rats with silicon electrodes. Using the correlation technique, we monitored the activity of neurons in our rats while simultaneously tracking the activity of stocks in the U.S. stock market."
have_read
to:NB
neuroscience
finance
statistics
prediction
multiple_testing
bad_data_analysis
funny:geeky
funny:malicious
via:mejn
to:blog
to_teach:undergrad-ADA
8 weeks ago by cshalizi
Greetings, Philosophers - Kieran Healy
10 weeks ago by cshalizi
But what _kind_ of bootstrap? It's clustered data (raters x schools), which raises interesting technical issues!
philosophy
academia
data_analysis
healy.kieran
bootstrap
to_teach:undergrad-ADA
10 weeks ago by cshalizi
[0803.2963] Consistency of cross validation for comparing regression procedures
11 weeks ago by cshalizi
"Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property."
to:NB
statistics
to_read
cross-validation
model_selection
nonparametrics
to_teach:undergrad-ADA
re:stacs
11 weeks ago by cshalizi
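A small sketch of the setting the abstract describes: one data split, with a parametric procedure (a straight line) and a nonparametric one (k-nearest neighbors) fit on the estimation part and compared on the evaluation part. All names here (`fit_linear`, `fit_knn`, `holdout_compare`, `eval_fraction`) are made up for illustration, and the 50/50 default split is an arbitrary assumption, not the ratio the paper recommends.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def fit_knn(xs, ys, k=5):
    """k-nearest-neighbor regression in one dimension; returns a prediction function."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def holdout_compare(xs, ys, fitters, eval_fraction=0.5, seed=0):
    """Fit each procedure on the estimation part and return its evaluation MSE."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_eval = int(len(idx) * eval_fraction)
    eval_idx, est_idx = idx[:n_eval], idx[n_eval:]
    est_x = [xs[i] for i in est_idx]
    est_y = [ys[i] for i in est_idx]

    scores = {}
    for name, fit in fitters.items():
        predict = fit(est_x, est_y)
        scores[name] = sum((ys[i] - predict(xs[i])) ** 2 for i in eval_idx) / n_eval
    return scores


if __name__ == "__main__":
    xs = [i / 50 for i in range(100)]
    curved = [(x - 1) ** 2 for x in xs]  # noiseless toy data, for illustration
    print(holdout_compare(xs, curved, {"linear": fit_linear, "knn": fit_knn}))
```

The paper's question is how `eval_fraction` should scale with the sample size for the lower-scoring procedure to be the truly better one with probability approaching 1. The sketch only shows the mechanics of a single split.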
[0803.2984] Conditional density estimation in a regression setting
11 weeks ago by cshalizi
"Regression problems are traditionally analyzed via univariate characteristics like the regression function, scale function and marginal density of regression errors. These characteristics are useful and informative whenever the association between the predictor and the response is relatively simple. More detailed information about the association can be provided by the conditional density of the response given the predictor. For the first time in the literature, this article develops the theory of minimax estimation of the conditional density for regression settings with fixed and random designs of predictors, bounded and unbounded responses and a vast set of anisotropic classes of conditional densities. The study of fixed design regression is of special interest and novelty because the known literature is devoted to the case of random predictors. For the aforementioned models, the paper suggests a universal adaptive estimator which (i) matches performance of an oracle that knows both an underlying model and an estimated conditional density; (ii) is sharp minimax over a vast class of anisotropic conditional densities; (iii) is at least rate minimax when the response is independent of the predictor and thus a bivariate conditional density becomes a univariate density; (iv) is adaptive to an underlying design (fixed or random) of predictors."
in_NB
statistics
nonparametrics
regression
density_estimation
minimax
to_read
to_teach:undergrad-ADA
11 weeks ago by cshalizi
Rainfall and Conflict - Heather Sarsons
11 weeks ago by cshalizi
"Starting with Miguel, Satyanath, and Sergenti (2004), a large literature has used rainfall variation as an instrument to study the impacts of income shocks on civil war and conáict. These studies argue that in agriculturally-dependent regions, negative rain shocks lower income levels, which in turn incites violence. This identiÖcation strategy relies on the assumption that rainfall shocks a§ect conáict only through their impacts on income. I evaluate this exclusion restriction by identifying districts that are downstream from dams in India. In downstream districts, income is much less sensitive to rainfall áuctuations. However, rain shocks remain equally strong predictors of riot incidence in these districts. These results suggest that rainfall a§ects rioting through a channel other than income and cast doubt on the conclusion that income shocks incite riots."
Cute.
to:NB
have_read
instrumental_variables
causal_inference
statistics
to_teach:undergrad-ADA
sociology
to:blog
11 weeks ago by cshalizi
Analyzing Released NYC Value-Added Data Part 3 | Gary Rubinstein's Blog
11 weeks ago by cshalizi
This actually looks more like a job for nonparametric regression, or even relative distribution comparisons, but still...
bad_data_analysis
education
evisceration
to_teach:undergrad-ADA
via:mathbabe
11 weeks ago by cshalizi
Analyzing Released NYC Value-Added Data Part 2 | Gary Rubinstein's Blog
11 weeks ago by cshalizi
It's the comparison of the same teacher in the same year on the same subject but in different grades which clinches the model being an EPIC FAIL.
bad_data_analysis
education
evisceration
to_teach:undergrad-ADA
via:mathbabe
11 weeks ago by cshalizi
Analyzing Released NYC Value-Added Data Part 1 | Gary Rubinstein's Blog
11 weeks ago by cshalizi
To be clear, the bad data analysis is on the part of whatever hacks came up with the value added model being used here. These results are insane.
bad_data_analysis
evisceration
education
via:mathbabe
to_teach:undergrad-ADA
11 weeks ago by cshalizi
[0805.2490] Using statistical smoothing to date medieval manuscripts
12 weeks ago by cshalizi
"We discuss the use of multivariate kernel smoothing methods to date manuscripts dating from the 11th to the 15th centuries, in the English county of Essex. The dataset consists of some 3300 dated and 5000 undated manuscripts, and the former are used as a training sample for imputing dates for the latter. It is assumed that two manuscripts that are ``close'', in a sense that may be defined by a vector of measures of distance for documents, will have close dates. Using this approach, statistical ideas are used to assess ``similarity'', by smoothing among distance measures, and thus to estimate dates for the 5000 undated manuscripts by reference to the dated ones."
Can we get data?
to:NB
statistics
smoothing
kernel_estimators
medieval_european_history
text_mining
to_teach:undergrad-ADA
12 weeks ago by cshalizi
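A toy version of the idea in the abstract, that manuscripts which are "close" should have close dates: estimate an undated manuscript's date as a kernel-weighted average of the dated ones. This is a plain Nadaraya-Watson smoother on a single scalar distance with a Gaussian kernel. The paper works with a vector of distance measures, and the function name `kernel_date_estimate`, the kernel and the bandwidth are all assumptions for illustration.

```python
import math


def kernel_date_estimate(distances, dates, bandwidth):
    """Kernel-weighted average of known dates.

    distances[i] is the distance from the undated manuscript to dated
    manuscript i, and dates[i] is that manuscript's known date.
    """
    weights = [math.exp(-0.5 * (d / bandwidth) ** 2) for d in distances]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all weights underflowed; try a larger bandwidth")
    return sum(w * t for w, t in zip(weights, dates)) / total


if __name__ == "__main__":
    # Made-up distances and dates, for illustration only.
    print(kernel_date_estimate([0.3, 0.5, 2.0], [1210, 1245, 1390], bandwidth=0.5))
```

Smaller bandwidths lean on the nearest dated manuscripts, while larger ones pull every estimate toward the overall mean date.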
[1202.3775] Kernel-based Conditional Independence Test and Application in Causal Discovery
12 weeks ago by cshalizi
"Conditional independence testing is an important problem, especially in Bayesian network learning and causal discovery. Due to the curse of dimensionality, testing for conditional independence of continuous variables is particularly challenging. We propose a Kernel-based Conditional Independence test (KCI-test), by constructing an appropriate test statistic and deriving its asymptotic distribution under the null hypothesis of conditional independence. The proposed method is computationally efficient and easy to implement. Experimental results show that it outperforms other methods, especially when the conditioning set is large or the sample size is not very large, in which case other methods encounter difficulties."
statistics
kernel_estimators
independence_testing
hypothesis_testing
causal_inference
in_NB
have_read
to:blog
to_teach:undergrad-ADA
12 weeks ago by cshalizi
[0808.1010] Confidence bands in nonparametric time series regression
12 weeks ago by cshalizi
"We consider nonparametric estimation of mean regression and conditional variance (or volatility) functions in nonlinear stochastic regression models. Simultaneous confidence bands are constructed and the coverage probabilities are shown to be asymptotically correct. The imposed dependence structure allows applications in many linear and nonlinear auto-regressive processes. The results are applied to the S&P 500 Index data."
to:NB
statistics
regression
time_series
confidence_sets
to_teach:undergrad-ADA
12 weeks ago by cshalizi
[0805.3032] Testing earthquake predictions
12 weeks ago by cshalizi
"Statistical tests of earthquake predictions require a null hypothesis to model occasional chance successes. To define and quantify `chance success' is knotty. Some null hypotheses ascribe chance to the Earth: Seismicity is modeled as random. The null distribution of the number of successful predictions -- or any other test statistic -- is taken to be its distribution when the fixed set of predictions is applied to random seismicity. Such tests tacitly assume that the predictions do not depend on the observed seismicity. Conditioning on the predictions in this way sets a low hurdle for statistical significance. Consider this scheme: When an earthquake of magnitude 5.5 or greater occurs anywhere in the world, predict that an earthquake at least as large will occur within 21 days and within an epicentral distance of 50 km. We apply this rule to the Harvard centroid-moment-tensor (CMT) catalog for 2000--2004 to generate a set of predictions. The null hypothesis is that earthquake times are exchangeable conditional on their magnitudes and locations and on the predictions--a common ``nonparametric'' assumption in the literature. We generate random seismicity by permuting the times of events in the CMT catalog. We consider an event successfully predicted only if (i) it is predicted and (ii) there is no larger event within 50 km in the previous 21 days. The $P$-value for the observed success rate is $<0.001$: The method successfully predicts about 5% of earthquakes, far better than `chance,' because the predictor exploits the clustering of earthquakes -- occasional foreshocks -- which the null hypothesis lacks. Rather than condition on the predictions and use a stochastic model for seismicity, it is preferable to treat the observed seismicity as fixed, and to compare the success rate of the predictions to the success rate of simple-minded predictions like those just described. If the proffered predictions do no better than a simple scheme, they have little value."
have_read
to:NB
statistics
geology
prediction
earthquakes
to_teach:undergrad-ADA
to_teach:data-mining
12 weeks ago by cshalizi
[0801.0327] Nonparametric sequential prediction of time series
february 2012 by cshalizi
"Time series prediction covers a vast field of every-day statistical applications in medical, environmental and economic domains. In this paper we develop nonparametric prediction strategies based on the combination of a set of 'experts' and show the universal consistency of these strategies under a minimum of conditions. We perform an in-depth analysis of real-world data sets and show that these nonparametric strategies are more flexible, faster and generally outperform ARMA methods in terms of normalized cumulative prediction error."
in_NB
time_series
nonparametrics
prediction
statistics
to_teach:undergrad-ADA
re:growing_ensemble_project
february 2012 by cshalizi
Bootstrapping clustered data - Field - 2007 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
february 2012 by cshalizi
"Various bootstraps have been proposed for bootstrapping clustered data from one-way arrays. The simulation results in the literature suggest that some of these methods work quite well in practice; the theoretical results are limited and more mixed in their conclusions. For example, McCullagh reached negative conclusions about the use of non-parametric bootstraps for one-way arrays. The purpose of this paper is to extend our understanding of the issues by discussing the effect of different ways of modelling clustered data, the criteria for successful bootstraps used in the literature and extending the theory from functions of the sample mean to include functions of the between and within sums of squares and non-parametric bootstraps to include model-based bootstraps. We determine that the consistency of variance estimates for a bootstrap method depends on the choice of model with the residual bootstrap giving consistency under the transformation model whereas the cluster bootstrap gives consistent estimates under both the transformation and the random-effect model. In addition we note that the criteria based on the distribution of the bootstrap observations are not really useful in assessing consistency."
in_NB
have_read
statistics
bootstrap
to_teach:undergrad-ADA
hierarchical_models
february 2012 by cshalizi
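A minimal sketch of the cluster bootstrap mentioned in the abstract, for the standard error of a grand mean: resample whole clusters with replacement and keep every observation inside each chosen cluster. The function name `cluster_bootstrap_se`, the choice of statistic and the number of replicates are assumptions for illustration. This is the generic recipe, not code from the paper.

```python
import random
import statistics


def cluster_bootstrap_se(clusters, n_boot=1000, seed=0):
    """Bootstrap standard error of the pooled mean, resampling clusters.

    clusters is a list of lists, one inner list of observations per cluster.
    """
    rng = random.Random(seed)
    boot_means = []
    for _ in range(n_boot):
        # Draw as many clusters as we started with, with replacement.
        resampled = rng.choices(clusters, k=len(clusters))
        # Observations within a chosen cluster stay together.
        values = [x for cluster in resampled for x in cluster]
        boot_means.append(statistics.fmean(values))
    return statistics.stdev(boot_means)


if __name__ == "__main__":
    # Made-up data, for illustration only.
    data = [[4.1, 3.9, 4.3], [6.0, 6.2], [5.1, 4.8, 5.0, 5.3]]
    print(cluster_bootstrap_se(data))
```

Resampling individual observations instead would ignore the within-cluster dependence and typically understate the variance, which is why the resampling unit here is the cluster.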
“Economic Shocks and Conflict: The (Absence of?) Evidence from Commodity Prices”
february 2012 by cshalizi
"Replication files":
http://www.chrisblattman.com/documents/data/shocks-conflict/Bazzi-Blattman.zip?9d7bd4
to:NB
statistics
to_read
data_analysis
economics
political_economy
war
violence
political_science
blattman.chris
to_teach:undergrad-ADA
february 2012 by cshalizi
An Alternative Asymptotic Analysis of Residual-Based Statistics
february 2012 by cshalizi
"This paper presents an alternative method to derive the limiting distribution of residual-based statistics. Our method does not impose an explicit assumption of (asymptotic) smoothness of the statistic of interest with respect to the model's parameters and thus is especially useful in cases where such smoothness is difficult to establish. Instead, we use a locally uniform convergence in distribution condition, which is automatically satisfied by residual-based specification test statistics. To illustrate, we derive the limiting distribution of a new functional form specification test for discrete choice models, as well as a runs-based tests for conditional symmetry in dynamic volatility models." (To-teach tag is tentative.)
in_NB
statistics
regression
model-checking
to_teach:undergrad-ADA
february 2012 by cshalizi
Plausibly Exogenous
february 2012 by cshalizi
"Instrumental variable (IV) methods are widely used to identify causal effects in models with endogenous explanatory variables. Often the instrument exclusion restriction that underlies the validity of the usual IV inference is suspect; that is, instruments are only plausibly exogenous. We present practical methods for performing inference while relaxing the exclusion restriction. We illustrate the approaches with empirical examples that examine the effect of 401(k) participation on asset accumulation, price elasticity of demand for margarine, and returns to schooling. We find that inference is informative even with a substantial relaxation of the exclusion restriction in two of the three cases."
to:NB
to_read
causal_inference
regression
statistics
economics
social_science_methodology
instrumental_variables
to_teach:undergrad-ADA
hansen.christian
february 2012 by cshalizi
Empirical Legal Studies: How the "Cravath System" Created the Bi-Modal Distribution
february 2012 by cshalizi
See if the analysis holds up after tracking down paper and if data is available; if so may make it an assignment (or even an exam?) for uADA.
law
inequality
economics
track_down_references
to_teach:undergrad-ADA
via:unfogged
february 2012 by cshalizi
On a New Method of Graduation
january 2012 by cshalizi
Whittaker introduces spline smoothing in 1922, complete with the Bayesian derivation. Does not use the word "spline", however --- when did that come in?
in_NB
to_teach:undergrad-ADA
splines
smoothing
regression
statistics
have_read
january 2012 by cshalizi
[1201.0224] Estimation of Treatment Effects with High-Dimensional Controls
january 2012 by cshalizi
"We propose methods for inference on the average effect of a treatment on a scalar outcome in the presence of very many controls. Our setting is a partially linear regression model containing the treatment/policy variable and a large number $p$ of controls or series terms, with $p$ that is possibly much larger than the sample size $n$, but where only $s < n$ unknown controls or series terms are needed to approximate the regression function accurately. The latter sparsity condition makes it possible to estimate the entire regression function as well as the average treatment effect by selecting approximately the right set of controls using Lasso and related methods. We develop estimation and inference methods for the average treatment effect in this setting, proposing a novel "post double selection" method that provides attractive inferential and estimation properties. In our analysis, in order to cover realistic applications, we expressly allow for imperfect selection of the controls and account for the impact of selection errors on estimation and inference. In order to cover typical applications in economics, we employ the selection methods designed to deal with non-Gaussian and heteroscedastic disturbances. We illustrate the use of new methods with numerical simulations and an application to the effect of abortion on crime rates."
to:NB
to_teach:undergrad-ADA
regression
causal_inference
lasso
sparsity
econometrics
instrumental_variables
hansen.christian
january 2012 by cshalizi
[1201.0220] Inference for High-Dimensional Sparse Econometric Models
january 2012 by cshalizi
"This article is about estimation and inference methods for high dimensional sparse (HDS) regression models in econometrics. High dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well-approximated by a parsimonious, yet unknown set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors. We discuss methods for identifying this set of regressors and estimating their coefficients based on $\ell_1$-penalization and describe key theoretical results. In order to capture realistic practical situations, we expressly allow for imperfect selection of regressors and study the impact of this imperfect selection on estimation and inference results. We focus the main part of the article on the use of HDS models and methods in the instrumental variables model and the partially linear model. We present a set of novel inference results for these models and illustrate their use with applications to returns to schooling and growth regression."
to:NB
regression
sparsity
instrumental_variables
econometrics
to_teach:undergrad-ADA
lasso
hansen.christian
january 2012 by cshalizi
Improved Predictions of Lynx Trappings Using a Biological Model
january 2012 by cshalizi
Sweet. (Bayesian estimation seems like overkill here however, especially since predictions are just made from point estimates.)
in_NB
have_read
to_teach:undergrad-ADA
to_teach:complexity-and-inference
re:stacs
dynamical_systems
stochastic_processes
statistical_inference_for_stochastic_processes
statistics
time_series
via:gelman
january 2012 by cshalizi
A Method of Handling Curvilinear Correlation for Any Number of Variables (Ezekiel, 1924)
january 2012 by cshalizi
Additive regression models as a general statistical method, complete with a successive-approximation algorithm that's really damn close to modern back-fitting, and a plea for economists to use it. In 1924!
in_NB
to_teach:undergrad-ADA
regression
additive_models
statistics
have_read
january 2012 by cshalizi
"Sinners in the hands of an angry God": Jonathan Edwards, 1741
january 2012 by cshalizi
"The God that holds you over the pit of hell, much as one holds a spider, or some loathsome insect over the fire, abhors you, and is dreadfully provoked: his wrath towards you burns like fire; he looks upon you as worthy of nothing else, but to be cast into the fire; he is of purer eyes than to bear to have you in his sight; you are ten thousand times more abominable in his eyes, than the most hateful venomous serpent is in ours. You have offended him infinitely more than ever a stubborn rebel did his prince; and yet it is nothing but his hand that holds you from falling into the fire every moment. It is to be ascribed to nothing else, that you did not go to hell the last night; that you was suffered to awake again in this world, after you closed your eyes to sleep. And there is no other reason to be given, why you have not dropped into hell since you arose in the morning, but that God's hand has held you up. There is no other reason to be given why you have not gone to hell, since you have sat here in the house of God, provoking his pure eyes by your sinful wicked manner of attending his solemn worship. Yea, there is nothing else that is to be given as a reason why you do not this very moment drop down into hell."
christianity
edwards.jonathan
something_about_america
preaching_to_the_choir
to_teach:undergrad-ADA
january 2012 by cshalizi
Nonlinear Models of Measurement Errors
december 2011 by cshalizi
"Measurement errors in economic data are pervasive and nontrivial in size. The presence of measurement errors causes biased and inconsistent parameter estimates and leads to erroneous conclusions to various degrees in economic analysis. While linear errors-in-variables models are usually handled with well-known instrumental variable methods, this article provides an overview of recent research papers that derive estimation methods that provide consistent estimates for nonlinear models with measurement errors. We review models with both classical and nonclassical measurement errors, and with misclassification of discrete variables. For each of the methods surveyed, we describe the key ideas for identification and estimation, and discuss its application whenever it is currently available." (Not read, reconsider to_teach tag later.)
to:NB
statistics
latent_variables
inference_to_latent_objects
instrumental_variables
econometrics
to_teach:undergrad-ADA
december 2011 by cshalizi
OMFG Exogenous Variation! Or, Can You Find Good Nails When You Find an Indonesian Politics Hammer | Indolaysia
indonesia causal_inference political_economy instrumental_variables development_economics social_science_methodology to_teach:undergrad-ADA via:henry_farrell in_NB to:blog
december 2011 by cshalizi
Instruments, Randomization, and Learning about Development (Deaton, 2010)
december 2011 by cshalizi
"There is currently much debate about the effectiveness of foreign aid and about what kind of projects can engender economic development. There is skepticism about the ability of econometric analysis to resolve these issues or of development agencies to learn from their own experience. In response, there is increasing use in development economics of randomized controlled trials (RCTs) to accumulate credible knowledge of what works, without overreliance on questionable theory or statistical methods. When RCTs are not possible, the proponents of these methods advocate quasi-randomization through instrumental variable (IV) techniques or natural experiments. I argue that many of these applications are unlikely to recover quantities that are useful for policy or understanding: two key issues are the misunderstanding of exogeneity and the handling of heterogeneity. I illustrate from the literature on aid and growth. Actual randomization faces similar problems as does quasi-randomization, notwithstanding rhetoric to the contrary. I argue that experiments have no special ability to produce more credible knowledge than other methods, and that actual experiments are frequently subject to practical problems that undermine any claims to statistical or epistemic superiority. I illustrate using prominent experiments in development and elsewhere. As with IV methods, RCT-based evaluation of projects, without guidance from an understanding of underlying mechanisms, is unlikely to lead to scientific progress in the understanding of economic development. I welcome recent trends in development experimentation away from the evaluation of projects and toward the evaluation of theoretical mechanisms."
causal_inference
experimental_economics
experimental_sociology
economics
development_economics
social_science_methodology
explanation_by_mechanisms
to_teach:undergrad-ADA
instrumental_variables
have_read
evisceration
in_NB
randomization
to:blog
december 2011 by cshalizi
Improving Causal Inference: Strengths and Limitations of Natural Experiments (Dunning, 2008)
december 2011 by cshalizi
"Social scientists increasingly exploit natural experiments in their research. This article surveys recent applications in political science, with the goal of illustrating the inferential advantages provided by this research design. When treatment assignment is less than “as if” random, studies may be something less than natural experiments, and familiar threats to valid causal inference in observational settings can arise. The author proposes a continuum of plausibility for natural experiments, defined by the extent to which treatment assignment is plausibly “as if” random, and locates several leading studies along this continuum."
in_NB
causal_inference
social_science_methodology
to_teach:undergrad-ADA
instrumental_variables
december 2011 by cshalizi
[1111.6201] Learning a Factor Model via Regularized PCA
december 2011 by cshalizi
"We consider the problem of learning a linear factor model with an unknown number of factors. We propose a regularized form of principal component analysis (PCA) and demonstrate through experiments with synthetic and real data the superiority of resulting estimates to those produced by pre-existing factor analysis approaches. We also establish theoretical results that elucidate the manner in which our algorithm corrects biases induced by conventional PCA. An important feature of our algorithm is its computational efficiency, which is close to that of PCA, which enjoys wide use in large part due to its efficiency."
to:NB
factor_analysis
principal_components
statistics
have_read
to_teach:undergrad-ADA
van_roy.benjamin
december 2011 by cshalizi
Prediction-based regularization using data augmented regression - Statistics and Computing, Volume 22, Number 1
december 2011 by cshalizi
"The role of regularization is to control fitted model complexity and variance by penalizing (or constraining) models to be in an area of model space that is deemed reasonable, thus facilitating good predictive performance. This is typically achieved by penalizing a parametric or non-parametric representation of the model. In this paper we advocate instead the use of prior knowledge or expectations about the predictions of models for regularization. This has the twofold advantage of allowing a more intuitive interpretation of penalties and priors and explicitly controlling model extrapolation into relevant regions of the feature space. This second point is especially critical in high-dimensional modeling situations, where the curse of dimensionality implies that new prediction points usually require extrapolation. We demonstrate that prediction-based regularization can, in many cases, be stochastically implemented by simply augmenting the dataset with Monte Carlo pseudo-data. We investigate the range of applicability of this implementation. An asymptotic analysis of the performance of Data Augmented Regression (DAR) in parametric and non-parametric linear regression, and in nearest neighbor regression, clarifies the regularizing behavior of DAR. We apply DAR to simulated and real data, and show that it is able to control the variance of extrapolation, while maintaining, and often improving, predictive accuracy."
in_NB
to_read
statistics
prediction
estimation
hooker.giles
regression
to_teach:undergrad-ADA
to_teach:data-mining
curse_of_dimensionality
december 2011 by cshalizi
Nonparametric estimation of the link function including variable selection - Gerhard Tutz and Sebastian Petry - Statistics and Computing, Volume 22, Number 2
december 2011 by cshalizi
"Nonparametric methods for the estimation of the link function in generalized linear models are able to avoid bias in the regression parameters. But for the estimation of the link typically the full model, which includes all predictors, has been used. When the number of predictors is large these methods fail since the full model cannot be estimated. In the present article a boosting type method is proposed that simultaneously selects predictors and estimates the link function. The method performs quite well in simulations and real data examples." (The "to teach" tag is conjectural.)
in_NB
regression
variable_selection
statistics
nonparametrics
to_read
to_teach:undergrad-ADA
december 2011 by cshalizi
Lai, Gross, Shen: Evaluating probability forecasts
november 2011 by cshalizi
"Probability forecasts of events are routinely used in climate predictions, in forecasting default probabilities on bank loans or in estimating the probability of a patient’s positive response to treatment. Scoring rules have long been used to assess the efficacy of the forecast probabilities after observing the occurrence, or nonoccurrence, of the predicted events. We develop herein a statistical theory for scoring rules and propose an alternative approach to the evaluation of probability forecasts. This approach uses loss functions relating the predicted to the actual probabilities of the events and applies martingale theory to exploit the temporal structure between the forecast and the subsequent occurrence or nonoccurrence of the event."
in_NB
statistics
prediction
calibration
to_read
to_teach:undergrad-ADA
november 2011 by cshalizi
The World Top Incomes Database - G-MonD, PSE-Paris School of Economics
october 2011 by cshalizi
Possible computational project: code up estimating a Pareto tail for income (all sources) from these statistics, and tracking evolution over time (and perhaps across countries).
Or, an ADA project, suggested by conversation with John B.: look for correlation between (lack of) progressive taxation and job creation, as predicted by the usual right-wing suspects.
inequality
economics
data_sets
to_teach:undergrad-ADA
to_teach:statcomp
october 2011 by cshalizi
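One way the Pareto-tail project noted for the World Top Incomes Database might start is sketched below. Everything here is an assumption for illustration: the function names are hypothetical, the data are synthetic, and the database's actual file layout is not modeled. The first function uses the fact that, for a Pareto tail with exponent alpha > 1, the mean income above a threshold divided by the threshold equals alpha / (alpha - 1); the second is the Hill (maximum-likelihood) estimator, which needs individual-level incomes rather than published summary statistics.

```python
import math
import random


def alpha_from_threshold_and_mean(threshold, mean_above):
    """Pareto exponent implied by a threshold and the mean income above it.

    For a Pareto tail with alpha > 1, mean_above / threshold = alpha / (alpha - 1).
    """
    b = mean_above / threshold
    if b <= 1:
        raise ValueError("mean above the threshold must exceed the threshold")
    return b / (b - 1)


def hill_estimate(incomes, threshold):
    """Hill (maximum-likelihood) estimate of the exponent from incomes above threshold."""
    tail = [x for x in incomes if x > threshold]
    if not tail:
        raise ValueError("no observations above the threshold")
    return len(tail) / sum(math.log(x / threshold) for x in tail)


# Synthetic check (not real income data): Pareto incomes with alpha = 3,
# minimum 20000, and a tail threshold of 50000.
rng = random.Random(42)
synthetic = [20000 * rng.paretovariate(3.0) for _ in range(20000)]
alpha_hat = hill_estimate(synthetic, 50000)
print(round(alpha_hat, 2))
```

Tracking the evolution over time would then amount to applying one of these functions year by year (and country by country) and plotting the resulting exponents.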
Population Value Decomposition, a Framework for the Analysis of Image Populations - Journal of the American Statistical Association - 106(495):775
october 2011 by cshalizi
"Images, often stored in multidimensional arrays, are fast becoming ubiquitous in medical and public health research. Analyzing populations of images is a statistical problem that raises a host of daunting challenges. The most significant challenge is the massive size of the datasets incorporating images recorded for hundreds or thousands of subjects at multiple visits. We introduce the population value decomposition (PVD), a general method for simultaneous dimensionality reduction of large populations of massive images. We show how PVD can be seamlessly incorporated into statistical modeling, leading to a new, transparent, and rapid inferential framework. Our PVD methodology was motivated by and applied to the Sleep Heart Health Study, the largest community-based cohort study of sleep containing more than 85 billion observations on thousands of subjects at two visits. This article has supplementary material online." --- Presumably just some form of SVD for higher-dimensional arrays.
to:NB
principal_components
data_analysis
to_read
to_teach:data-mining
to_teach:undergrad-ADA
october 2011 by cshalizi
Density Estimation in Several Populations With Uncertain Population Membership
october 2011 by cshalizi
"We devise methods to estimate probability density functions of several populations using observations with uncertain population membership, meaning from which population an observation comes is unknown. The probability of an observation being sampled from any given population can be calculated. We develop general estimation procedures and bandwidth selection methods for our setting. We establish large-sample properties and study finite-sample performance using simulation studies. We illustrate our methods with data from a nutrition study."
in_NB
density_estimation
mixture_models
to_teach:undergrad-ADA
to_teach:data-mining
october 2011 by cshalizi
Robustification of the PC Algorithm for Directed Acyclic Graphs
october 2011 by cshalizi
"The PC-algorithm was shown to be a powerful method for estimating the equivalence class of a potentially very high-dimensional acyclic directed graph (DAG) with the corresponding Gaussian distribution. Here we propose a computationally efficient robustification of the PC-algorithm and prove its consistency. Furthermore, we compare the robustified and standard version of the PC-algorithm on simulated data using the new corresponding R package pcalg."
statistics
causal_inference
graphical_models
buhlmann.peter
in_NB
to_read
to_teach:data-mining
to_teach:undergrad-ADA
kalisch.markus
october 2011 by cshalizi
Draw - Google Correlate
october 2011 by cshalizi
So cool: draw a curve free-hand, get the keywords whose time series correlate best with it. I can't go below a correlation of 0.70.
google
information_retrieval
spurious_correlations
to_teach:undergrad-ADA
to_teach:data-mining
to:blog
via:vqv
rademacher_complexity
october 2011 by cshalizi
Reality Checks and Comparisons of Nested Predictive Models - Journal of Business and Economic Statistics - 0(0):1
september 2011 by cshalizi
"This article develops a simple bootstrap method for simulating asymptotic critical values for tests of equal forecast accuracy and encompassing among many nested models. Our method combines elements of fixed regressor and wild bootstraps. We first derive the asymptotic distributions of tests of equal forecast accuracy and encompassing applied to forecasts from multiple models that nest the benchmark model—that is, reality check tests. We then prove the validity of the bootstrap for these tests. Monte Carlo experiments indicate that our proposed bootstrap has better finite-sample size and power than other methods designed for comparison of nonnested models."
statistics
model_checking
model_selection
time_series
bootstrap
to_read
to_teach:undergrad-ADA
encompassing
september 2011 by cshalizi
[1104.5617] Learning high-dimensional directed acyclic graphs with latent and selection variables
september 2011 by cshalizi
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI algorithm (Spirtes et al., 1999) has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose a new algorithm, the RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
have_read
to_teach:undergrad-ADA
graphical_models
causal_inference
in_NB
kalisch.markus
richardson.thomas_s.
september 2011 by cshalizi
http://marketing.wharton.upenn.edu/documents/research/Adoption_Velocity.pdf
august 2011 by cshalizi
Superficial comment, from glancing through the paper: Why oh why would you look at a cloud of data like the scatter plot in Figure 3, and say "This looks like a job for ordinary least squares"? Use a kernel smoother and bootstrap to get confidence bands.
names
diffusion_of_innovations
to_read
sociology
via:gelman
to_teach:undergrad-ADA
august 2011 by cshalizi
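The alternative suggested in the comment on the Adoption_Velocity paper (a kernel smoother plus a bootstrap for confidence bands) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the analysis of that paper: the data are synthetic, the function names are hypothetical, the bandwidth is fixed by hand rather than chosen by cross-validation, and the bands are simple pointwise percentile bands from resampling (x, y) pairs.

```python
import math
import random


def kernel_smooth(xs, ys, grid, bandwidth):
    """Nadaraya-Watson regression estimate with a Gaussian kernel at each grid point."""
    fitted = []
    for g in grid:
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        fitted.append(sum(w * y for w, y in zip(weights, ys)) / sum(weights))
    return fitted


def bootstrap_bands(xs, ys, grid, bandwidth, n_boot=200, level=0.95, seed=0):
    """Pointwise percentile bands from resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        curves.append(kernel_smooth([xs[i] for i in idx],
                                    [ys[i] for i in idx], grid, bandwidth))
    tail = (1 - level) / 2
    lower, upper = [], []
    for j in range(len(grid)):
        vals = sorted(curve[j] for curve in curves)
        lower.append(vals[int(tail * (n_boot - 1))])
        upper.append(vals[int((1 - tail) * (n_boot - 1))])
    return lower, upper


# Synthetic scatter (an assumption, standing in for a noisy cloud of points).
rng = random.Random(1)
xs = [rng.uniform(0, 6) for _ in range(150)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
grid = [0.3 * k for k in range(21)]  # evaluation points from 0 to 6

fit = kernel_smooth(xs, ys, grid, bandwidth=0.4)
lower, upper = bootstrap_bands(xs, ys, grid, bandwidth=0.4)
```

Plotting `fit` with `lower` and `upper` against `grid` shows how much of the apparent shape survives resampling, which is the question a straight-line fit cannot answer.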
"Smooth Regression Analysis" (G. S. Watson, 1964) JSTOR: Sankhyā: The Indian Journal of Statistics, Series A, Vol. 26, No. 4 (Dec., 1964), pp. 359-372
june 2011 by cshalizi
The abstract is great: "Few would deny that the most powerful statistical tool is graph paper. When however there are many observations (and/or many variables) graphical procedures become tedious. It seems to the author that the most characteristic problem for statisticians at the moment is the development of methods for analyzing the data poured out by electronic observing systems. The present paper gives a simple computer method for obtaining a "graph" from a large number of observations."
smoothing
regression
kernel_estimators
data_mining
to_teach:undergrad-ADA
to_teach:data-mining
via:gmg
june 2011 by cshalizi
Principles of Applied Statistics - Academic and Professional Books - Cambridge University Press
may 2011 by cshalizi
"Applied statistics is more than data analysis, but it is easy to lose sight of the big picture. David Cox and Christl Donnelly distil decades of scientific experience into usable principles for the successful application of statistics, showing how good statistical strategy shapes every stage of an investigation. As you advance from research or policy question, to study design, through modelling and interpretation, and finally to meaningful conclusions, this book will be a valuable guide. Over a hundred illustrations from a wide variety of real applications make the conceptual points concrete, illuminating your path and deepening your understanding. This book is essential reading for anyone who makes extensive use of statistical methods in their work."
books:recommended
statistics
data_analysis
to:NB
to_teach:undergrad-ADA
coveted
cox.david_r.
may 2011 by cshalizi
Statistical Prediction Analysis (Aitchison and Dunsmore, 1980)
may 2011 by cshalizi
Ancient, but I should see if there are examples or simple tools worth stealing for ADA.
books:noted
statistics
prediction
to:NB
to_teach:undergrad-ADA
may 2011 by cshalizi
Reason Foundation - No Booze? You May Lose
april 2011 by cshalizi
Exercise for the student: Devise at least two reasons why the causality might run from high income to frequent social drinking, rather than vice versa. (This is I think too elementary to make a good problem for ADA.)
bad_data_analysis
booze
via:tony_lin
causal_inference
to_teach:undergrad-ADA
april 2011 by cshalizi
Dani Rodrik-Research
april 2011 by cshalizi
The "Real Exchange Rate and Economic Growth" paper would make a good exam for undergraduate ADA, but I don't have the time this year to prepare it suitably. Next year.
rodrik.dani
economics
economic_policy
economic_growth
trade
to_teach:undergrad-ADA
april 2011 by cshalizi
Western on Strikes
april 2011 by cshalizi
Missing the union density variable. Wrote to ask about it. Referenced paper is http://www.jstor.org/stable/271022, which seems to me exactly the kind of thing Andy and I should mention in "Philosophy and Practice". --- ETA: Prof. Western wrote back within hours with the union density data, but I'm not sure I can make it public...
to_teach:undergrad-ADA
strikes
data_sets
april 2011 by cshalizi
Natural "Natural Experiments" in Economics
april 2011 by cshalizi
Shorter: I am sickened by the weakness of your instruments.
instrumental_variables
causal_inference
to_teach:undergrad-ADA
have_read
in_NB
economics
april 2011 by cshalizi
[0812.2749] Nonparametric inference of a trend using functional data
april 2011 by cshalizi
I guess I've been more or less presuming this was true. (And I'd have been wrong about the form of the simultaneous CI, actually.) Worth trying to work into the final exam for The Kids?
curve_fitting
gaussian_processes
time_series
statistics
nonparametrics
have_read
confidence_sets
to_teach:undergrad-ADA
april 2011 by cshalizi