cshalizi + statistics 1020
Wang, Phillips: A specification test for nonlinear nonstationary models
10 days ago by cshalizi
"We provide a limit theory for a general class of kernel smoothed U-statistics that may be used for specification testing in time series regression with nonstationary data. The test framework allows for linear and nonlinear models with endogenous regressors that have autoregressive unit roots or near unit roots. The limit theory for the specification test depends on the self-intersection local time of a Gaussian process. A new weak convergence result is developed for certain partial sums of functions involving nonstationary time series that converges to the intersection local time process. This result is of independent interest and is useful in other applications. Simulations examine the finite sample performance of the test."
to:NB
time_series
non-stationarity
model-checking
statistics
misspecification
10 days ago by cshalizi
Rigollet: Kullback–Leibler aggregation and misspecified generalized linear models
10 days ago by cshalizi
"In a regression setup with deterministic design, we study the pure aggregation problem and introduce a natural extension from the Gaussian distribution to distributions in the exponential family. While this extension bears strong connections with generalized linear models, it does not require identifiability of the parameter or even that the model on the systematic component is true. It is shown that this problem can be solved by constrained and/or penalized likelihood maximization and we derive sharp oracle inequalities that hold both in expectation and with high probability. Finally all the bounds are proved to be optimal in a minimax sense."
to:NB
regression
ensemble_methods
statistics
10 days ago by cshalizi
[1205.3703] Generic chaining and the l1-penalty
11 days ago by cshalizi
"We address the choice of the tuning parameter $\lambda$ in $\ell_1$-penalized M-estimation. Our main concern is models which are highly nonlinear, such as the Gaussian mixture model. The number of parameters $p$ is moreover large, possibly larger than the number of observations $n$. The generic chaining technique of Talagrand [2005] is tailored for this problem. It leads to the choice $\lambda \asymp \sqrt{\log p / n}$, as in the standard Lasso procedure (which concerns the linear model and least squares loss)."
to:NB
to_read
statistics
empirical_processes
high-dimensional_statistics
van_de_geer.sara
11 days ago by cshalizi
Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation - Fearnhead - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
13 days ago by cshalizi
"Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference."
to:NB
indirect_inference
estimation
statistics
approximate_bayesian_computation
computational_statistics
to_teach:complexity-and-inference
re:stacs
13 days ago by cshalizi
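The basic ABC step that abstract describes (simulate artificial data at different parameter values, compare summary statistics with the observed ones) can be sketched as plain rejection sampling. This is a minimal illustration, not the paper's semi-automatic construction of summaries: the function name `abc_rejection`, the toy Normal(theta, 1) model, the uniform prior, the tolerance, and the use of the sample mean as the summary are all assumptions made up for the example.

```python
import random
import statistics


def abc_rejection(observed, simulate, summary, prior_sample, tolerance, n_draws, rng):
    """Plain rejection ABC: keep prior draws whose simulated summary
    lands within `tolerance` of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)       # draw a candidate parameter from the prior
        fake = simulate(theta, rng)     # simulate artificial data at that parameter
        if abs(summary(fake) - s_obs) <= tolerance:
            accepted.append(theta)      # summaries agree closely enough: keep it
    return accepted


# Hypothetical toy problem: infer the mean of a Normal(theta, 1) sample,
# with a Uniform(-5, 5) prior and the sample mean as the summary statistic.
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]
posterior_draws = abc_rejection(
    observed,
    simulate=lambda theta, r: [r.gauss(theta, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)
```

The accepted draws approximate the posterior only as well as the summary statistic and tolerance allow, which is the gap the paper's choice of summaries is meant to address.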
[1205.2609] Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?
13 days ago by cshalizi
"Recent theory work has found that a special type of spatial partition tree - called a random projection tree - is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search."
to:NB
decision_trees
prediction
regression
statistics
dimension_reduction
machine_learning
13 days ago by cshalizi
Likelihood inference for discriminating between long-memory and change-point models - Yau - 2012 - Journal of Time Series Analysis - Wiley Online Library
13 days ago by cshalizi
"We develop a likelihood ratio (LR) test procedure for discriminating between a short-memory time series with a change-point (CP) and a long-memory (LM) time series. Under the null hypothesis, the time series consists of two segments of short-memory time series with different means and possibly different covariance functions. The location of the shift in the mean is unknown. Under the alternative, the time series has no shift in mean but rather is LM. The LR statistic is defined as the normalized log-ratio of the Whittle likelihood between the CP model and the LM model, which is asymptotically normally distributed under the null. The LR test provides a parametric alternative to the CUSUM test proposed by Berkes et al. (2006). Moreover, the LR test is more general than the CUSUM test in the sense that it is applicable to changes in other marginal or dependence features other than a change-in-mean. We show its good performance in simulations and apply it to two data examples."
to:NB
time_series
change-point_problem
long-range_dependence
statistics
to_teach:undergrad-ADA
hypothesis_testing
13 days ago by cshalizi
[1205.1828] The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use
18 days ago by cshalizi
"The natural gradient allows for more efficient gradient descent by removing dependencies and biases inherent in a function's parameterization. Several papers present the topic thoroughly and precisely. It remains a very difficult idea to get your head around however. The intent of this note is to provide simple intuition for the natural gradient and its use. We review how an ill conditioned parameter space can undermine learning, introduce the natural gradient by analogy to the more widely understood concept of signal whitening, and present tricks and specific prescriptions for applying the natural gradient to learning problems."
Does this ever mention the phrase "Fisher information"?
to:NB
optimization
statistics
estimation
fisher_information
information_geometry
18 days ago by cshalizi
[1203.3504] On Measurement Bias in Causal Inference
18 days ago by cshalizi
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particular, the paper discusses the control of partially observable confounders in parametric and nonparametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB
causal_inference
inference_to_latent_objects
pearl.judea
to_teach:undergrad-ADA
statistics
error_in_variables
via:arthegall
18 days ago by cshalizi
Lai, Huang, Lee: Fixed and random effects selection in nonparametric additive mixed models
19 days ago by cshalizi
"This paper considers the problem of model selection in a nonparametric additive mixed modeling framework. The fixed effects are modeled nonparametrically using truncated series expansions with B-spline basis. Estimation and selection of such nonparametric fixed effects are simultaneously achieved by using the adaptive group lasso methodology, while the random effects are selected by a traditional backward selection mechanism. To facilitate the automatic selection of model dimension, computable expressions for the degrees of freedom for both the fixed and random effects components are derived, and the Bayesian Information criterion (BIC) is used to select the final model choice. Theoretically it is shown that this BIC model selection method is consistent, while computationally a practical algorithm is developed for solving the optimization problem involved. Simulation results show that the proposed methodology is often capable of selecting the correct significant fixed and random effects components, especially when the sample size and/or signal to noise ratio are not too small. The new method is also applied to two real data sets."
to:NB
regression
additive_models
statistics
19 days ago by cshalizi
[1205.1406] Graph Prediction in a Low-Rank and Autoregressive Setting
19 days ago by cshalizi
"We study the problem of prediction for evolving graph data. We formulate the problem as the minimization of a convex objective encouraging sparsity and low-rank of the solution, that reflect natural graph properties. The convex formulation allows to obtain oracle inequalities and efficient solvers. We provide empirical results for our algorithm and comparison with competing methods, and point out two open questions related to compressed sensing and algebra of low-rank and sparse matrices."
to:NB
network_data_analysis
prediction
statistics
low-rank_approximation
19 days ago by cshalizi
Accurately estimating neuronal correlation requires a new spike-sorting paradigm
20 days ago by cshalizi
"Neurophysiology is increasingly focused on identifying coincident activity among neurons. Strong inferences about neural computation are made from the results of such studies, so it is important that these results be accurate. However, the preliminary step in the analysis of such data, the assignment of spike waveforms to individual neurons (“spike-sorting”), makes a critical assumption which undermines the analysis: that spikes, and hence neurons, are independent. We show that this assumption guarantees that coincident spiking estimates such as correlation coefficients are biased. We also show how to eliminate this bias. Our solution involves sorting spikes jointly, which contrasts with the current practice of sorting spikes independently of other spikes. This new “ensemble sorting” yields unbiased estimates of coincident spiking, and permits more data to be analyzed with confidence, improving the quality and quantity of neurophysiological inferences. These results should be of interest outside the context of neuronal correlations studies. Indeed, simultaneous recording of many neurons has become the rule rather than the exception in experiments, so it is essential to spike sort correctly if we are to make valid inferences about any properties of, and relationships between, neurons."
to:NB
heard_the_talk
neuroscience
neural_data_analysis
ventura.valerie
kith_and_kin
statistics
inference_to_latent_objects
20 days ago by cshalizi
Clarke, Clarke: Prediction in several conventional contexts
20 days ago by cshalizi
"We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors."
(to_teach tags are tentative.)
to:NB
prediction
statistics
classifiers
regression
to_teach:undergrad-ADA
to_teach:data-mining
20 days ago by cshalizi
Ehm, Gneiting: Local proper scoring rules of order two
21 days ago by cshalizi
"Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if it encourages truthful reporting. It is local of order k if the score depends on the predictive density only through its value and the values of its derivatives of order up to k at the realizing event. Complementing fundamental recent work by Parry, Dawid and Lauritzen, we characterize the local proper scoring rules of order 2 relative to a broad class of Lebesgue densities on the real line, using a different approach. In a data example, we use local and nonlocal proper scoring rules to assess statistically postprocessed ensemble weather forecasts."
to:NB
prediction
scoring_rules
statistics
gneiting.tilmann
21 days ago by cshalizi
Dawid, Lauritzen, Parry: Proper local scoring rules on discrete sample spaces
21 days ago by cshalizi
"A scoring rule is a loss function measuring the quality of a quoted probability distribution Q for a random variable X, in the light of the realized outcome x of X; it is proper if the expected score, under any distribution P for X, is minimized by quoting Q = P. Using the fact that any differentiable proper scoring rule on a finite sample space is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of x. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space. A useful property of such rules is that the quoted distribution Q need only be known up to a scale factor. Examples of the use of such scoring rules include Besag’s pseudo-likelihood and Hyvärinen’s method of ratio matching."
to:NB
prediction
scoring_rules
statistics
lauritzen.steffen
dawid.philip
21 days ago by cshalizi
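The definition of propriety quoted above (expected score under P is minimized by quoting Q = P) can be checked numerically on a small finite sample space. The sketch below is only an illustration with two standard proper rules, the log score and the Brier score; the distribution `p`, the candidate quotes, and the helper names are made up for the example and do not come from the paper.

```python
import math


def expected_score(p, q, score):
    """Expected score of quoting q when outcomes are actually drawn from p."""
    return sum(p_x * score(q, x) for x, p_x in enumerate(p) if p_x > 0)


def log_score(q, x):
    """Log score as a loss: minus the log of the probability quoted for x."""
    return -math.log(q[x])


def brier_score(q, x):
    """Brier (quadratic) score: squared distance from q to the point mass at x."""
    return sum((q_y - (1.0 if y == x else 0.0)) ** 2 for y, q_y in enumerate(q))


def best_quote(p, candidates, score):
    """Candidate quote with the smallest expected score under p."""
    return min(candidates, key=lambda q: expected_score(p, q, score))


# Hypothetical true distribution and a few alternative quotes.
p = [0.5, 0.3, 0.2]
candidates = [p, [0.4, 0.4, 0.2], [0.6, 0.2, 0.2], [1 / 3, 1 / 3, 1 / 3]]

for name, score in [("log", log_score), ("Brier", brier_score)]:
    print(name, best_quote(p, candidates, score))
```

For both rules the minimizer among the candidates is `p` itself, as propriety requires. A finite list of candidates is of course only a spot check, not a proof.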
Parry, Dawid, Lauritzen: Proper local scoring rules
21 days ago by cshalizi
"We investigate proper scoring rules for continuous distributions on the real line. It is known that the log score is the only such rule that depends on the quoted density only through its value at the outcome that materializes. Here we allow further dependence on a finite number m of derivatives of the density at the outcome, and describe a large class of such m-local proper scoring rules: these exist for all even m but no odd m. We further show that for m ≥ 2 all such m-local rules can be computed without knowledge of the normalizing constant of the distribution."
to:NB
prediction
scoring_rules
lauritzen.steffen
dawid.philip
statistics
21 days ago by cshalizi
[0805.1404] Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections
22 days ago by cshalizi
"Given an i.i.d. sample from a distribution $F$ on $\mathbb{R}$ with uniformly continuous density $p_0$, purely data-driven estimators are constructed that efficiently estimate $F$ in sup-norm loss and simultaneously estimate $p_0$ at the best possible rate of convergence over Hölder balls, also in sup-norm loss. The estimators are obtained by applying a model selection procedure close to Lepski's method with random thresholds to projections of the empirical measure onto spaces spanned by wavelets or $B$-splines. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein-type analogs of the inequalities in Koltchinskii [Ann. Statist. 34 (2006) 2593-2656] for the deviation of suprema of empirical processes from their Rademacher symmetrizations."
to:NB
density_estimation
wavelets
splines
statistics
empirical_processes
22 days ago by cshalizi
Testing parametric conditional distributions using the nonparametric smoothing method
22 days ago by cshalizi
"This paper proposes a new goodness-of-fit test for parametric conditional probability distributions using the nonparametric smoothing methodology. An asymptotic normal distribution is established for the test statistic under the null hypothesis of correct specification of the parametric distribution. The test is shown to have power against local alternatives converging to the null at certain rates. The test can be applied to testing for possible misspecifications in a wide variety of parametric models. A bootstrap procedure is provided for obtaining more accurate critical values for the test. Monte Carlo simulations show that the test has good power against some common alternatives."
to:NB
misspecification
density_estimation
smoothing
statistics
to_teach:undergrad-ADA
22 days ago by cshalizi
Consistent Model Selection Criteria on High Dimensions
25 days ago by cshalizi
"Asymptotic properties of model selection criteria for high-dimensional regression models are studied where the dimension of covariates is much larger than the sample size. Several sufficient conditions for model selection consistency are provided. Non-Gaussian error distributions are considered and it is shown that the maximal number of covariates for model selection consistency depends on the tail behavior of the error distribution. Also, sufficient conditions for model selection consistency are given when the variance of the noise is neither known nor estimated consistently. Results of simulation studies as well as real data analysis are given to illustrate that finite sample performances of consistent model selection criteria can be quite different."
to:NB
model_selection
statistics
high-dimensional_probability
25 days ago by cshalizi
"The huge Package for High-dimensional Undirected Graph Estimation in R"
25 days ago by cshalizi
"We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency."
to:NB
to_teach:undergrad-ADA
graphical_models
statistics
kith_and_kin
wasserman.larry
roeder.kathryn
liu.han
25 days ago by cshalizi
[1204.6703] Two SVDs Suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation
27 days ago by cshalizi
"Topic models can be seen as a generalization of the clustering problem, in that they posit that observations are generated due to multiple latent factors (e.g. the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden.
"We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e. third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on k by k matrices, where k is the number of latent factors (e.g. the number of topics), rather than in the d-dimensional observed space (typically d >> k)."
That's a really remarkable claim, and I'd tag it to_be_shot_after_a_fair_trial if it weren't being made by genuinely serious people.
in_NB
to_read
latent_variables
topic_models
text_mining
mixture_models
statistics
machine_learning
cool_if_true
spectral_clustering
27 days ago by cshalizi
[1204.6265] Statistical inference for dynamical systems: a review
28 days ago by cshalizi
"The topic of statistical inference for dynamical systems has been studied extensively across several fields. In this survey we focus on the problem of parameter estimation for non-linear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research."
to:NB
to_read
statistical_inference_for_stochastic_processes
dynamical_systems
statistics
time_series
state-space_models
state-space_reconstruction
pillai.natesh
via:ded-maxim
28 days ago by cshalizi
Attractive Models - Kieran Healy
29 days ago by cshalizi
Have I really not bookmarked this before?
p-values
statistics
political_science
social_science_methodology
bad_data_analysis
to_teach:undergrad-ADA
to_teach:data-mining
re:neutral_model_of_inquiry
healy.kieran
29 days ago by cshalizi
[1006.1015] Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees
4 weeks ago by cshalizi
"Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960's. In bioinformatics, psychometrics and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and 'generalizability' of these summaries. This paper provides an implementation of the geometric distance between trees developed by Billera, Holmes and Vogtmann (2001) [BHV] equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of the applications in statistical inference for which this distance can be useful. In particular, since BHV have shown that the space of trees is negatively curved (a CAT(0) space), a natural representation of a collection of trees is a tree. We compare this representation to the Euclidean approximations of treespace made available through Multidimensional Scaling of the matrix of distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence both of certain columns (positions, variables or genes) and of certain rows (whether species, observations or arrays)."
to:NB
clustering
hierarchical_structure
holmes.susan
data_mining
statistics
to_teach:data-mining
gene_expression_data_analysis
via:ryan_t
4 weeks ago by cshalizi
A Normal Law for the Plug-in Estimator of Entropy
4 weeks ago by cshalizi
"This paper establishes a sufficient condition for the asymptotic normality of the plug-in estimator of Shannon's entropy defined on a countable alphabet. The sufficient condition covers a range of cases with countably infinite alphabets, for which no normality results were previously known."
in_NB
entropy_estimation
statistics
information_theory
4 weeks ago by cshalizi
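For reference, the plug-in estimator in that title is just the Shannon entropy of the empirical distribution. A minimal sketch follows; the function name `plugin_entropy` and the choice of natural logarithms (nats) are assumptions for illustration, and nothing here touches the paper's asymptotic normality condition.

```python
import math
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (empirical) Shannon entropy in nats: substitute the observed
    relative frequencies for the unknown probabilities in -sum p log p."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Two equally frequent symbols give log 2 nats; a constant sequence gives 0.
print(plugin_entropy("aabb"))
print(plugin_entropy("aaaa"))
```

Symbols that never appear in the sample contribute nothing to the sum, so the estimator tends to undershoot the true entropy in small samples, particularly on large or countably infinite alphabets.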
[1204.5633] Noncentral Limit Theorem and the Bootstrap for Quantiles of Dependent Data
4 weeks ago by cshalizi
"We will show under minimal conditions on differentiability and dependence that the central limit theorem for quantiles holds and that the block bootstrap is weakly consistent. Under slightly stronger conditions, the bootstrap is strongly consistent. Without the differentiability condition, quantiles might have a non-normal asymptotic distribution and the bootstrap might fail."
to:NB
bootstrap
statistics
statistical_inference_for_stochastic_processes
4 weeks ago by cshalizi
[1201.5871] Null models for network data
5 weeks ago by cshalizi
"The analysis of datasets taking the form of simple, undirected graphs continues to gain in importance across a variety of disciplines. Two choices of null model, the logistic-linear model and the implicit log-linear model, have come into common use for analyzing such network data, in part because each accounts for the heterogeneity of network node degrees typically observed in practice. Here we show how these both may be viewed as instances of a broader class of null models, with the property that all members of this class give rise to essentially the same likelihood-based estimates of link probabilities in sparse graph regimes. This facilitates likelihood-based computation and inference, and enables practitioners to choose the most appropriate null model from this family based on application context. Comparative model fits for a variety of network datasets demonstrate the practical implications of our results."
in_NB
network_data_analysis
have_read
statistics
estimation
approximation
re:smoothing_adjacency_matrices
5 weeks ago by cshalizi
[1204.3915] Theory and Inference for a Class of Observation-driven Models with Application to Time Series of Counts
5 weeks ago by cshalizi
"This paper studies theory and inference related to a class of time series models that incorporates nonlinear dynamics. It is assumed that the observations follow a one-parameter exponential family of distributions given an accompanying process that evolves as a function of lagged observations. We employ an iterated random function approach and a special coupling technique to show that, under suitable conditions on the parameter space, the conditional mean process is a geometric moment contracting Markov chain and that the observation process is absolutely regular with geometrically decaying coefficients. Moreover the asymptotic theory of the maximum likelihood estimates of the parameters is established under some mild assumptions. These models are applied to two examples; the first is the number of transactions per minute of Ericsson stock and the second is related to return times of extreme events of Goldman Sachs Group stock."
--- Without reading beyond the abstract, I'm guessing chains with complete connections.
to:NB
time_series
markov_models
statistics
5 weeks ago by cshalizi
[1204.3941] A Log-Linear Graphical Model for Inferring Genetic Networks from High-Throughput Sequencing Data
5 weeks ago by cshalizi
"We develop a novel method for estimating high-dimensional Poisson graphical models, the Log-Linear Graphical Model, allowing us to infer networks based on high-throughput sequencing data. Our model assumes that conditional on all other genes, each gene is Poisson, jointly defining a pair-wise Poisson Markov random field. We estimate our genetic networks via neighborhood selection by fitting $\ell_1$-norm penalized log-linear models, an approach we call the Poisson Graphical Lasso. Additionally, we develop a fast parallel algorithm, permitting us to fit our graphical models to high-dimensional genomic data sets."
in_NB
graphical_models
gene_expression_data_analysis
lasso
network_data_analysis
statistics
regression
5 weeks ago by cshalizi
Xu, McLeod: Further asymptotic properties of the generalized information criterion
5 weeks ago by cshalizi
"Asymptotic properties of the generalized information criterion for model selection are examined and new conditions under which this criterion is overfitting, consistent, or underfitting are derived."
in_NB
model_selection
information_criteria
statistics
5 weeks ago by cshalizi
Ockham's Razor: Foundations - Carnegie Mellon Center for Formal Epistemology
5 weeks ago by cshalizi
Despite my presence on the program, this should actually be really good.
"Scientific theory choice is guided by judgments of simplicity, a bias frequently referred to as "Ockham's Razor". But what is simplicity and how, if at all, does it help science find the truth? Should we view simple theories as means for obtaining accurate predictions, as classical statisticians recommend? Or should we believe the theories themselves, as Bayesian methods seem to justify? The aim of this workshop is to re-examine the foundations of Ockham's razor, with a firm focus on the connections, if any, between simplicity and truth."
self-promotion
occams_razor
philosophy_of_science
epistemology
kelly.kevin_t.
kith_and_kin
mayo.deborah
vapnik.v.n.
sober.elliott
leeb.hannes
wasserman.larry
model_selection
statistics
complexity
machine_learning
learning_theory
grunwald.peter
5 weeks ago by cshalizi
Xiao, Wu: Covariance matrix estimation for stationary time series
6 weeks ago by cshalizi
"We obtain a sharp convergence rate for banded covariance matrix estimates of stationary processes. A precise order of magnitude is derived for spectral radius of sample covariance matrices. We also consider a thresholded covariance matrix estimator that can better characterize sparsity if the true covariance matrix is sparse. As our main tool, we implement Toeplitz [Math. Ann. 70 (1911) 351–376] idea and relate eigenvalues of covariance matrices to the spectral densities or Fourier transforms of the covariances. We develop a large deviation result for quadratic forms of stationary processes using m-dependence approximation, under the framework of causal representation and physical dependence measures."
to:NB
time_series
statistics
estimation
variance_estimation
6 weeks ago by cshalizi
Arias-Castro, Bubeck, Lugosi: Detection of correlations
6 weeks ago by cshalizi
"We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases."
to:NB
statistics
factor_analysis
6 weeks ago by cshalizi
Bai , Li : Statistical analysis of factor models of high dimension
6 weeks ago by cshalizi
"This paper considers the maximum likelihood estimation of factor models of high dimension, where the number of variables (N) is comparable with or even greater than the number of observations (T). An inferential theory is developed. We establish not only consistency but also the rate of convergence and the limiting distributions. Five different sets of identification conditions are considered. We show that the distributions of the MLE estimators depend on the identification restrictions. Unlike the principal components approach, the maximum likelihood estimator explicitly allows heteroskedasticities, which are jointly estimated with other parameters. Efficiency of MLE relative to the principal components method is also considered."
to:NB
to_read
factor_analysis
statistics
high-dimensional_statistics
6 weeks ago by cshalizi
Space–time modelling of coupled spatiotemporal environmental variables - Ippoliti - 2012 - Journal of the Royal Statistical Society: Series C (Applied Statistics) - Wiley Online Library
6 weeks ago by cshalizi
"dynamic factor model for spatiotemporal coupled environmental variables. The model is proposed in a state space formulation which, through Kalman recursions, allows a unified approach to prediction and estimation. Full probabilistic inference for the model parameters is facilitated by adapting standard Markov chain Monte Carlo algorithms for dynamic linear models to our model formulation. The predictive ability of the model is discussed for two different data sets with variables measured at two different scales. Some possibilities for further research are also outlined."
to:NB
spatial_statistics
state-space_models
statistics
6 weeks ago by cshalizi
Local polynomial regression for symmetric positive definite matrices - Yuan - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
6 weeks ago by cshalizi
"Local polynomial regression has received extensive attention for the non-parametric estimation of regression functions when both the response and the covariate are in Euclidean space. However, little has been done when the response is in a Riemannian manifold. We develop an intrinsic local polynomial regression estimate for the analysis of symmetric positive definite matrices as responses that lie in a Riemannian manifold with covariate in Euclidean space. The primary motivation and application of the methodology proposed is in computer vision and medical imaging. We examine two commonly used metrics, including the trace metric and the log-Euclidean metric on the space of symmetric positive definite matrices. For each metric, we develop a cross-validation bandwidth selection method, derive the asymptotic bias, variance and normality of the intrinsic local constant and local linear estimators, and compare their asymptotic mean-square errors. Simulation studies are further used to compare the estimators under the two metrics and to examine their finite sample performance. We use our method to detect diagnostic differences between diffusion tensors along fibre tracts in a study of human immunodeficiency virus."
to:NB
variance_estimation
statistics
regression
nonparametrics
kernel_estimators
6 weeks ago by cshalizi
Quantifying the weight of evidence from a forensic fingerprint comparison: a new paradigm - Neumann - 2012 - Journal of the Royal Statistical Society: Series A (Statistics in Society) - Wiley Online Library
6 weeks ago by cshalizi
"The fingerprint has, with considerable justification, come to be regarded as the acme of forensic identification. Over the last century, millions of cases have been resolved world wide because of marks left at crime scenes. The comparison methodology has not evolved greatly during its history and it is universal practice to present fingerprint evidence to a court as a categoric opinion of identification or exclusion, or to classify the evidence as inconclusive and not to report it. There has been a growing movement to supplement the fingerprint examination process by one that has a statistical model, supported by appropriate databases for calculating numerical measures of weight of evidence. The movement calls for the establishment of a logical framework for informing conclusions, based on explicit assumptions and data and open to revision and improvement. The aim is to enable the numerical evaluation of evidence that would currently be reported as a categorical identification and also of evidence that would currently be classified as inconclusive. The paper presents the results of a project carried out by the Forensic Science Service that aims to attain this goal. After a historical review, we describe a formal model for assigning numerical values to configurations of minutiae in fingerprints. We describe how the parameters of the model have been optimized to take account of interoperator variability and distortion of the finger pad, and we present the results of a substantial validation experiment that was based on searches that have been carried out on the US national fingerprint database of approximately 600 million fingerprints."
to:NB
fingerprints
statistics
6 weeks ago by cshalizi
[1204.2043] Unbiased Cultural Transmission in Time-Averaged Archaeological Assemblages
6 weeks ago by cshalizi
"Unbiased models are foundational in the archaeological study of cultural transmission. Applications have assumed that archaeological data represent synchronic samples, despite the accretional nature of the archaeological record. I document the circumstances under which time-averaging alters the distribution of model predictions. Richness is inflated in long-duration assemblages, and evenness is "flattened" compared to unaveraged samples. Tests of neutrality, employed to differentiate biased and unbiased models, suffer serious problems with Type I error under time-averaging. Finally, the time-scale over which time-averaging alters predictions is determined by the mean trait lifetime, providing a way to evaluate the impact of these effects upon archaeological samples."
to:NB
archaeology
cultural_evolution
statistics
neutral_models
6 weeks ago by cshalizi
[1204.2296] Co-clustering for Directed Graphs; the Stochastic Co-Blockmodel and a Spectral Algorithm
6 weeks ago by cshalizi
"This paper extends the spectral clustering algorithm to directed networks in a way that co-clusters or bi-clusters the rows and columns of a graph Laplacian. Co-clustering leverages the increased complexity of asymmetric relationships to gain new insight into the structure of the directed network. To understand this algorithm and to study its asymptotic properties in a canonical setting, we propose the Stochastic Co-Blockmodel to encode co-clustering structure. This is the first statistical model of co-clustering and it is derived using the concept of stochastic equivalence that motivated the original Stochastic Blockmodel. Although directed spectral clustering is not derived from the Stochastic Co-Blockmodel, we show that, asymptotically, the algorithm can estimate the blocks in a high dimensional asymptotic setting in which the number of blocks grows with the number of nodes. The algorithm, model, and asymptotic results can all be extended to bipartite graphs."
in_NB
relational_learning
network_data_analysis
statistics
clustering
community_discovery
spectral_clustering
yu.bin
6 weeks ago by cshalizi
[1204.2477] A Simple Explanation of A Spectral Algorithm for Learning Hidden Markov Models
6 weeks ago by cshalizi
"A simple linear algebraic explanation of the algorithm in "A Spectral Algorithm for Learning Hidden Markov Models" (COLT 2009). Most of the content is in Figure 2; the text just makes everything precise in four nearly-trivial claims."
to:NB
to_read
statistics
markov_models
re:AoS_project
spectral_methods
6 weeks ago by cshalizi
[1204.2763] A Cramér-Rao inequality for non differentiable models
6 weeks ago by cshalizi
"We compute a variance lower bound for unbiased estimators in specified statistical models. The construction of the bound is related to the original Cramér-Rao bound, although it does not require the differentiability of the model. Moreover, we show our efficiency bound to be always greater than the Cramér-Rao bound in smooth models, thus providing a sharper result."
to:NB
cramer-rao
statistics
estimation
information_geometry
6 weeks ago by cshalizi
[1204.2762] On the Uniform Asymptotic Validity of Subsampling and the Bootstrap
6 weeks ago by cshalizi
"This paper provides conditions under which subsampling and the bootstrap can be used to construct estimators of the quantiles of the distribution of a root that behave well uniformly over a large class of distributions $\mathbf{P}$. These results are then applied (i) to construct confidence regions that behave well uniformly over $\mathbf{P}$ in the sense that the coverage probability tends to at least the nominal level uniformly over $\mathbf{P}$ and (ii) to construct tests that behave well uniformly over $\mathbf{P}$ in the sense that the size tends to no greater than the nominal level uniformly over $\mathbf{P}$. Without these stronger notions of convergence, the asymptotic approximations to the coverage probability or size may be poor even in very large samples. Specific applications include the multivariate mean, testing moment inequalities, multiple testing, the empirical process, and $U$-statistics."
in_NB
bootstrap
statistics
6 weeks ago by cshalizi
Assessing gross domestic product and inflation probability forecasts derived from Bank of England fan charts - Galbraith - 2011 - Journal of the Royal Statistical Society: Series A (Statistics in Society) - Wiley Online Library
6 weeks ago by cshalizi
"Density forecasts, including the pioneering Bank of England ‘fan charts’, are often used to produce forecast probabilities of a particular event. We use the Bank of England's forecast densities to calculate the forecast probability that annual rates of inflation and output growth exceed given thresholds. We subject these implicit probability forecasts to graphical and numerical diagnostic checks. We measure both their calibration and their resolution, providing both statistical and graphical interpretations of the results. The results reinforce earlier evidence on limitations of these forecasts and provide new evidence on their information content and on the relative performance of inflation and gross domestic product growth forecasts. In particular, gross domestic product forecasts show little or no ability to predict periods of low growth beyond the current quarter, in part because of the important role of data revisions."
to:NB
prediction
statistics
calibration
macroeconomics
to_teach:undergrad-ADA
6 weeks ago by cshalizi
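A small sketch of how calibration and resolution can be measured for probability forecasts of a binary event, as discussed in the fan-chart abstract above. It uses the standard Murphy decomposition of the Brier score, grouping by distinct forecast values; this is a swapped-in textbook technique, not necessarily the diagnostics used in the paper, and the function name is hypothetical.

```python
from collections import defaultdict

def brier_decomposition(forecasts, outcomes):
    """Return (brier, reliability, resolution, uncertainty) for 0/1 outcomes.

    Forecasts are grouped by their exact value, so that
    brier == reliability - resolution + uncertainty holds exactly.
    """
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        groups[p].append(y)
    # Reliability (calibration): gap between stated probability and observed frequency.
    reliability = sum(len(ys) * (p - sum(ys) / len(ys)) ** 2 for p, ys in groups.items()) / n
    # Resolution: how far the observed frequencies move away from the base rate.
    resolution = sum(len(ys) * (sum(ys) / len(ys) - base_rate) ** 2 for ys in groups.values()) / n
    uncertainty = base_rate * (1 - base_rate)
    brier = sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / n
    return brier, reliability, resolution, uncertainty
```

A forecaster who always states the base rate is perfectly calibrated (reliability 0) but has zero resolution, which is roughly the failure mode the abstract reports for GDP forecasts beyond the current quarter.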
[0802.4192] Maxisets for Model Selection
6 weeks ago by cshalizi
"We address the statistical issue of determining the maximal spaces (maxisets) where model selection procedures attain a given rate of convergence. By considering first general dictionaries, then orthonormal bases, we characterize these maxisets in terms of approximation spaces. These results are illustrated by classical choices of wavelet model collections. For each of them, the maxisets are described in terms of functional spaces. We take a special care of the issue of calculability and measure the induced loss of performance in terms of maxisets."
in_NB
statistics
model_selection
approximation
6 weeks ago by cshalizi
[0802.4363] Estimating the entropy of binary time series: Methodology, some theory and a simulation study
6 weeks ago by cshalizi
"Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.
"** Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. ** Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. ** Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency."
in_NB
to_read
entropy_estimation
information_theory
time_series
statistics
kontoyiannis.ioannis
re:stacs
6 weeks ago by cshalizi
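For orientation, a sketch of the simplest estimator named in the entropy-estimation abstract above: a plug-in estimate of the entropy rate of a binary series, taken here to be the empirical entropy of overlapping length-k words divided by k. The function name and the use of overlapping words are assumptions for illustration; the paper's own variant and its choice of word length may differ.

```python
from collections import Counter
from math import log2

def plugin_entropy_rate(bits, k):
    """Plug-in entropy-rate estimate in bits per symbol from overlapping k-blocks."""
    words = [tuple(bits[i:i + k]) for i in range(len(bits) - k + 1)]
    counts = Counter(words)
    total = len(words)
    # Empirical Shannon entropy of the k-block distribution.
    block_entropy = -sum(c / total * log2(c / total) for c in counts.values())
    return block_entropy / k
```

The number of possible words grows as 2^k, so for fixed data the estimate becomes badly biased downward as k increases, which is consistent with the abstract's remark that bias is the main source of error.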
[math/0612776] Uniform error bounds for smoothing splines
6 weeks ago by cshalizi
"Almost sure bounds are established on the uniform error of smoothing spline estimators in nonparametric regression with random designs. Some results of Einmahl and Mason (2005) are used to derive uniform error bounds for the approximation of the spline smoother by an 'equivalent' reproducing kernel regression estimator, as well as for proving uniform error bounds on the reproducing kernel regression estimator itself, uniformly in the smoothing parameter over a wide range. This admits data-driven choices of the smoothing parameter."
to:NB
splines
regression
nonparametrics
statistics
learning_theory
6 weeks ago by cshalizi
[math/0603130] Nonparametric methods for inference in the presence of instrumental variables
6 weeks ago by cshalizi
"We suggest two nonparametric approaches, based on kernel methods and orthogonal series to estimating regression functions in the presence of instrumental variables. For the first time in this class of problems, we derive optimal convergence rates, and show that they are attained by particular estimators. In the presence of instrumental variables the relation that identifies the regression function also defines an ill-posed inverse problem, the 'difficulty' of which depends on eigenvalues of a certain integral operator which is determined by the joint density of endogenous and instrumental variables. We delineate the role played by problem difficulty in determining both the optimal convergence rate and the appropriate choice of smoothing parameter."
to:NB
to_read
regression
statistics
instrumental_variables
nonparametrics
to_teach:undergrad-ADA
6 weeks ago by cshalizi
[1204.2581] Modeling Relational Data via Latent Factor Blockmodel
6 weeks ago by cshalizi
"In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems and bioinformatics. Previous studies either consider latent feature based models but disregarding local structure in the network, or focus exclusively on capturing local structure of objects based on latent blockmodels without coupling with latent characteristics of objects. To combine the benefits of the previous work, we propose a novel model that can simultaneously incorporate the effect of latent features and covariates if any, as well as the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both latent feature factors and latent cluster memberships of objects to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network to improve prediction performance. We also develop an optimization transfer algorithm based on the generalized EM-style strategy to learn the latent factors. We prove the efficacy of our proposed model through the link prediction task and cluster analysis task, and extensive experiments on the synthetic data and several real world datasets suggest that our proposed LFBM model outperforms the other state of the art approaches in the evaluated tasks."
in_NB
network_data_analysis
community_discovery
statistics
inference_to_latent_objects
factor_analysis
relational_learning
6 weeks ago by cshalizi
[1204.1563] Generalized Error Exponents for Sparse Sample Goodness of Fit Tests
6 weeks ago by cshalizi
"We investigate the sparse sample goodness-of-fit problem, where the number of samples $n$ is smaller than the size of the alphabet $m$. The goal of this work is to find an appropriate criterion to analyze statistical tests in this setting. A suitable model for analysis is the high-dimensional model in which both $n$ and $m$ tend to infinity, and $n=o(m)$. We propose a new performance criterion based on large deviation analysis, which generalizes the classical error exponent applicable for large sample problems (in which $m=O(n)$). This new criterion provides insights that are not available from asymptotic consistency or CLT analysis. The main results are:
(i) The best achievable probability of error $P_e$ decays as $-\log(P_e)=(n^2/m)(1+o(1))J$ for some $J>0$.
(ii) A well-known coincidence-based test attains the optimal generalized error exponent.
(iii) The widely used Pearson's chi-square test has $J=0$.
(iv) The contributions (i)-(iii) are established under the assumption that the distribution under the null hypothesis is uniform. For the non-uniform case, a new test is proposed, with a non-zero generalized error exponent."
to:NB
hypothesis_testing
re:LICORS
statistics
large_deviations
goodness-of-fit
6 weeks ago by cshalizi
Colombo , Maathuis , Kalisch , Richardson : Learning high-dimensional directed acyclic graphs with latent and selection variables
7 weeks ago by cshalizi
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
--- Too complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB
have_read
statistics
graphical_models
causal_inference
sparsity
to_teach:undergrad-ADA
7 weeks ago by cshalizi
Cook , Forzani , Rothman : Estimating sufficient reductions of the predictors in abundant high-dimensional regressions
7 weeks ago by cshalizi
"We study the asymptotic behavior of a class of methods for sufficient dimension reduction in high-dimension regressions, as the sample size and number of predictors grow in various alignments. It is demonstrated that these methods are consistent in a variety of settings, particularly in abundant regressions where most predictors contribute some information on the response, and oracle rates are possible. Simulation results are presented to support the theoretical conclusion."
to:NB
regression
dimension_reduction
sufficiency
statistics
7 weeks ago by cshalizi
[1203.3083] Clustering in networks with the collapsed Stochastic Block Model
7 weeks ago by cshalizi
"We present an efficient MCMC algorithm to cluster the nodes of a network such that nodes with similar role in the network are clustered together. This is known as block-modelling or block-clustering. We extend the stochastic blockmodel (SBM) of Nowicki & Snijders (2001), by exploiting parameter collapsing to integrate out block parameters. The resulting model defines a posterior over the number of clusters and cluster memberships. Sampling from this model is simpler than from the original SBM as transdimensional MCMC can be avoided. Moreover, our extensions allow the number of clusters to be directly estimated, rather than given as an input parameter. The algorithm is based on the allocation sampler of Nobile & Fearnside (2007). We use synthetic and real data to test the speed and accuracy of our model and algorithm, including the ability to estimate the number of clusters. The algorithm can scale to networks with up to ten thousand nodes."
in_NB
network_data_analysis
community_discovery
statistics
7 weeks ago by cshalizi
[1203.0683] A Method of Moments for Mixture Models and Hidden Markov Models
7 weeks ago by cshalizi
"Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm) which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity which typically scale exponentially with the number of mixture components. This work develops an efficient method of moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method leads to rigorous unsupervised learning results for mixture models that were not achieved by previous works; and, because of its simplicity, it also constitutes a viable alternative to EM for practical deployment."
Clever: some mixture models can be characterized by expectations, covariances, and third-order mixed moments, so you just need to estimate tensors up to third order, and not very high moments of vectors (which are very noisy) and do some linear algebra. I should probably re-read because I couldn't reproduce this at the board.
in_NB
statistics
estimation
mixture_models
markov_models
state-space_models
have_read
7 weeks ago by cshalizi
[1203.1515] Multiple Change-Point Estimation in Stationary Ergodic Time-Series
7 weeks ago by cshalizi
"The multiple change-point problem is considered in the most general setting, where the only assumption made on the time-series distributions generating the data is that they are stationary ergodic. No modeling, independence or parametric assumptions are made. While the need for such a general setting is dictated by real applications, the problem of change-point estimation becomes a difficult unsupervised learning problem. In this work a novel algorithm for solving this problem is proposed, and it is shown to be asymptotically consistent under the general assumptions considered."
to:NB
change-point_problem
time_series
ergodic_theory
statistics
statistical_inference_for_stochastic_processes
ryabko.daniil
7 weeks ago by cshalizi
[1203.4354] Asymptotic Confidence Sets for General Nonparametric Regression and Classification by Regularized Kernel Methods
7 weeks ago by cshalizi
"Regularized kernel methods such as, e.g., support vector machines and least-squares support vector regression constitute an important class of standard learning algorithms in machine learning. Theoretical investigations concerning asymptotic properties have mainly focused on rates of convergence during the last years but there are only very few and limited (asymptotic) results on statistical inference so far. As this is a serious limitation for their use in mathematical statistics, the goal of the article is to fill this gap. Based on asymptotic normality of many of these methods, the article derives a strongly consistent estimator for the unknown covariance matrix of the limiting normal distribution. In this way, we obtain asymptotically correct confidence sets for $\psi(f_{P,\lambda_0})$ where $f_{P,\lambda_0}$ denotes the minimizer of the regularized risk in the reproducing kernel Hilbert space $H$ and $\psi:H\rightarrow\mathds{R}^m$ is any Hadamard-differentiable functional. Applications include (multivariate) pointwise confidence sets for values of $f_{P,\lambda_0}$ and confidence sets for gradients, integrals, and norms."
to:NB
confidence_sets
kernel_methods
statistics
nonparametrics
regression
classifiers
7 weeks ago by cshalizi
[1204.0033] Transforming Graph Representations for Statistical Relational Learning
7 weeks ago by cshalizi
"Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed."
in_NB
relational_learning
statistics
machine_learning
neville.jennifer
change_of_representation
7 weeks ago by cshalizi
[no title]
7 weeks ago by cshalizi
"Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involve a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively."
(From a quick scan, this looks too heavy to actually teach in ADAfaEPoV, but it's so tagged to remind me to include a reference.)
to:NB
causal_inference
partial_identification
statistics
instrumental_variables
to_teach:undergrad-ADA
7 weeks ago by cshalizi
Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
7 weeks ago by cshalizi
"We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples."
--- I wonder whether this hasn't some application to the PC algorithm?
to:NB
graphical_models
lasso
sparsity
statistics
heard_the_talk
7 weeks ago by cshalizi
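A minimal sketch of the screening step described in the graphical lasso abstract above: threshold the off-diagonal entries of the sample covariance matrix at λ and read off the connected components, each of which could then be handed to a separate graphical lasso solver. The function name `threshold_components`, the strict inequality `|S[i][j]| > lam`, and the toy matrix are assumptions made for illustration, not the paper's code.

```python
from collections import deque


def threshold_components(S, lam):
    """Connected components of the graph with an edge i~j when |S[i][j]| > lam.

    S is a symmetric covariance matrix given as a list of lists (assumed input format).
    Returns a list of sorted vertex lists, ordered by smallest vertex.
    """
    p = len(S)
    seen = [False] * p
    components = []
    for start in range(p):
        if seen[start]:
            continue
        # Breadth-first search from each vertex not yet assigned to a component.
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:
            i = queue.popleft()
            comp.append(i)
            for j in range(p):
                if j != i and not seen[j] and abs(S[i][j]) > lam:
                    seen[j] = True
                    queue.append(j)
        components.append(sorted(comp))
    return components


# Toy 4-variable covariance matrix (made up for illustration).
S = [
    [1.00, 0.60, 0.05, 0.00],
    [0.60, 1.00, 0.10, 0.02],
    [0.05, 0.10, 1.00, 0.40],
    [0.00, 0.02, 0.40, 1.00],
]
blocks = threshold_components(S, lam=0.2)
print(blocks)
```

With this toy matrix, a small λ leaves one block and a large λ isolates every variable; only the intermediate values split the problem into independent sub-problems.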
A Kernel Two-Sample Test
7 weeks ago by cshalizi
"We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests."
in_NB
to_read
hilbert_space
kernel_methods
goodness-of-fit
statistics
concentration_of_measure
probability
two-sample_tests
re:network_differences
7 weeks ago by cshalizi
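A rough sketch of the quadratic-time statistic mentioned in the kernel two-sample abstract above: the biased estimate of the squared MMD is the mean kernel value within each sample minus twice the mean kernel value across samples. The Gaussian kernel, the fixed bandwidth, the restriction to one-dimensional data, and the function names are all assumptions for illustration; the thresholds needed to turn this number into a test are not shown.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on real numbers; the bandwidth is an assumed default."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased quadratic-time estimate of the squared maximum mean discrepancy."""
    m, n = len(xs), len(ys)
    # Average kernel value within each sample and across the two samples.
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / (m * m)
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / (n * n)
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy


same = mmd2_biased([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
shifted = mmd2_biased([0.0, 1.0, 2.0], [5.0, 6.0, 7.0])
print(same, shifted)
```

Identical samples give a value of zero up to rounding, while well-separated samples give a clearly positive value.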
Structured Sparsity and Generalization
7 weeks ago by cshalizi
"We present a data dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite dimensional setting such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels."
to:NB
learning_theory
regression
sparsity
statistics
lasso
7 weeks ago by cshalizi
[1203.6898] Long-term stability of sequential Monte Carlo methods under verifiable conditions
8 weeks ago by cshalizi
"This paper discusses particle filtering in general hidden Markov models (HMMs) and presents novel theoretical results on the long-term stability of bootstrap-type particle filters. More specifically, we establish that the asymptotic variance of the Monte Carlo estimates produced by the bootstrap filter is uniformly bounded in time. In contrast to most previous results of this type, which in general presuppose that the state space of the hidden state process is compact (an assumption that is rarely satisfied in practice), our very mild assumptions are satisfied for a large class of HMMs with possibly non-compact state space. In addition, we derive a similar time uniform bound on the asymptotic Lp error. Importantly, our results hold for misspecified models, i.e. we do not at all assume that the data entering into the particle filter originate from the model governing the dynamics of the particles or even from an HMM."
to:NB
particle_filters
stochastic_processes
time_series
state_estimation
state-space_models
markov_models
statistics
8 weeks ago by cshalizi
[0804.0991] Quadratic distances on probabilities: A unified foundation
8 weeks ago by cshalizi
"This work builds a unified framework for the study of quadratic form distance measures as they are used in assessing the goodness of fit of models. Many important procedures have this structure, but the theory for these methods is dispersed and incomplete. Central to the statistical analysis of these distances is the spectral decomposition of the kernel that generates the distance. We show how this determines the limiting distribution of natural goodness-of-fit tests. Additionally, we develop a new notion, the spectral degrees of freedom of the test, based on this decomposition. The degrees of freedom are easy to compute and estimate, and can be used as a guide in the construction of useful procedures in this class."
to:NB
statistics
goodness-of-fit
8 weeks ago by cshalizi
[0803.0402] A note on sensitivity of principal component subspaces and the efficient detection of influential observations in high dimensions
8 weeks ago by cshalizi
"In this paper we introduce an influence measure based on second order expansion of the RV and GCD measures for the comparison between unperturbed and perturbed eigenvectors of a symmetric matrix estimator. Example estimators are considered to highlight how this measure complements recent influence analysis. Importantly, we also show how a sample based version of this measure can be used to accurately and efficiently detect influential observations in practice."
to:NB
principal_components
statistics
to_teach:undergrad-ADA
8 weeks ago by cshalizi
[0803.0835] Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations
8 weeks ago by cshalizi
"New goodness-of-fit tests for Markovian models in time series analysis are developed which are based on the difference between a fully nonparametric estimate of the one-step transition distribution function of the observed process and that of the model class postulated under the null hypothesis. The model specification under the null allows for Markovian models, the transition mechanisms of which depend on an unknown vector of parameters and an unspecified distribution of i.i.d. innovations. Asymptotic properties of the test statistic are derived and the critical values of the test are found using appropriate bootstrap schemes. General properties of the bootstrap for Markovian processes are derived. A new central limit theorem for triangular arrays of weakly dependent random variables is obtained. For the proof of stochastic equicontinuity of multidimensional empirical processes, we use a simple approach based on an anisotropic tiling of the space. The finite-sample behavior of the proposed test is illustrated by some numerical examples and a real-data application is given."
in_NB
statistics
statistical_inference_for_stochastic_processes
bootstrap
markov_models
goodness-of-fit
8 weeks ago by cshalizi
[0804.0678] Consistency of spectral clustering
8 weeks ago by cshalizi
"Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering."
to:NB
statistics
machine_learning
clustering
spectral_clustering
8 weeks ago by cshalizi
[math/0609514] Sequential Monte Carlo smoothing with application to parameter estimation in non-linear state space models
8 weeks ago by cshalizi
"This paper concerns the use of sequential Monte Carlo methods (SMC) for smoothing in general state space models. A well-known problem when applying the standard SMC technique in the smoothing mode is that the resampling mechanism introduces degeneracy of the approximation in the path space. However, when performing maximum likelihood estimation via the EM algorithm, all functionals involved are of additive form for a large subclass of models. To cope with the problem in this case, a modification of the standard method (based on a technique proposed by Kitagawa and Sato) is suggested. Our algorithm relies on forgetting properties of the filtering dynamics and the quality of the estimates produced is investigated, both theoretically and via simulations."
to:NB
statistics
time_series
state_estimation
state-space_models
particle_filters
8 weeks ago by cshalizi
[1203.6360] You had me at hello: How phrasing affects memorability
8 weeks ago by cshalizi
"Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information, in which we are able to control for both the speaker and the setting of the quotes. We find significant differences between memorable and non-memorable quotes in several key dimensions. One is lexical distinctiveness: in aggregate, memorable quotes use less common word choices, but at the same time are built upon a scaffolding of common syntactic patterns; another is that memorable quotes tend to be more general in ways that make them easy to apply in new contexts. We also show how the concept of "memorable language" can be extended across domains."
to:NB
linguistics
statistics
cultural_evolution
8 weeks ago by cshalizi
[1203.5673] Effect of Nonstationarity on Models Inferred from Neural Data
8 weeks ago by cshalizi
"Neurons subject to a common non-stationary input may exhibit a correlated firing behavior. Correlations in the statistics of neural spike trains also arise as the effect of interaction between neurons. Here we show that these two situations can be distinguished, with machine learning techniques, provided the data are rich enough. In order to do this, we study the problem of inferring a kinetic Ising model, stationary or nonstationary, from the available data. We apply the inference procedure to two data sets: one from salamander retinal ganglion cells and the other from a realistic computational cortical network model. We show that many aspects of the concerted activity of the salamander retinal neurons can be traced simply to the external input. A model of non-interacting neurons subject to a non-stationary external field outperforms a model with stationary input with couplings between neurons, even accounting for the differences in the number of model parameters. When couplings are added to the non-stationary model, for the retinal data, little is gained: the inferred couplings are generally not significant. Likewise, the distribution of the sizes of sets of neurons that spike simultaneously and the frequency of spike patterns as function of their rank (Zipf plots) are well-explained by an independent-neuron model with time-dependent external input, and adding connections to such a model does not offer significant improvement. For the cortical model data, robust couplings, well correlated with the real connections, can be inferred using the non-stationary model. Adding connections to this model slightly improves the agreement with the data for the probability of synchronous spikes but hardly affects the Zipf plot."
to:NB
neural_data_analysis
statistics
time_series
8 weeks ago by cshalizi
[1203.5950] Capturing the time-varying drivers of an epidemic using stochastic dynamical systems
8 weeks ago by cshalizi
"Epidemics are often modelled using state-space models based on dynamical systems, observed through partial and noisy data. In this paper we develop stochastic extensions to the popular SEIR model with parameters evolving in time, in order to capture unknown influences of changing behaviors, public interventions, seasonal effects etc. Our models assign diffusion processes for the time-varying parameters, and our inferential procedure is based on the particle Markov Chain Monte Carlo algorithm, suitably adjusted to accommodate the features of this challenging nonlinear stochastic model. The performance of the proposed computational methods is validated on simulated data and the adopted model is applied to the 2009 A/H1N1 pandemic in England. In addition to estimating the trajectories of the effective contact rate, the methodology is applied in real time to provide evidence in related public health decisions."
to:NB
time_series
epidemic_models
state-space_models
statistics
8 weeks ago by cshalizi
[1203.5471] The Bayesian Analysis of Complex, High-Dimensional Models: Can it be CODA?
8 weeks ago by cshalizi
"We consider the Bayesian analysis of a few complex, high-dimensional models and show that intuitive priors, which are not tailored to the fine details of the data model and the estimated parameters, are going to fail in situations in which simple good frequentist estimators exist. The models we consider are a partially observed sample, the partial linear model, estimating linear and quadratic functionals of a white noise model, and estimating with stopping times. We argue that these findings do not contradict a strong version of Doob's consistency theorem which claims that the existence of a uniformly $\sqrt{n}$ consistent estimator ensures that the Bayes posterior is $\sqrt{n}$ consistent for values of the parameter with prior probability 1."
to:NB
statistics
bayesian_consistency
8 weeks ago by cshalizi
[1203.5829] Ensemble estimators for multivariate entropy estimation
8 weeks ago by cshalizi
"The problem of estimation of density functionals like entropy and mutual information has received much attention in the statistics and information theory communities. A large class of estimators of functionals of the probability density suffer from the curse of dimensionality, wherein the exponent in the MSE rate of convergence decays increasingly slowly as the dimension $d$ of the samples increases. In particular, the rate is often glacially slow of order $O(T^{-\gamma/d})$, where $T$ is the number of samples, and $\gamma>0$ is a rate parameter. Examples of such estimators include kernel density estimators, $k$-NN density estimators, $k$-NN entropy estimators, intrinsic dimension estimators and other examples. In this paper, we propose a weighted convex combination of an ensemble of such estimators, where optimal weights can be chosen such that the weighted estimator converges at a much faster dimension invariant rate of $O(T^{-1})$. Furthermore, we show that these optimal weights can be determined by solving a convex optimization problem which can be performed offline and does not require training data. We illustrate the superior performance of our weighted estimator for two important applications: (i) estimating the Panter-Dite distortion-rate factor and (ii) estimating the Shannon entropy for testing the probability distribution of a random sample."
in_NB
ensemble_methods
entropy_estimation
statistics
8 weeks ago by cshalizi
[1203.5974] The Concentration and Stability of the Community Detecting Functions on Random Networks
8 weeks ago by cshalizi
"We propose a general form of community detecting functions for finding the communities or the optimal partition of a random network, and examine the concentration and stability of the function values using the bounded difference martingale method. We derive LDP inequalities for both the general case and several specific community detecting functions: modularity, graph bipartitioning and q-Potts community structure. We also discuss the concentration and stability of community detecting functions on different types of random networks: the sparse and non-sparse networks and some examples such as ER and CL networks."
in_NB
to_read
community_discovery
network_data_analysis
statistics
8 weeks ago by cshalizi
Taylor & Francis Online :: Bayesian Nonparametric Modeling for Causal Inference - Journal of Computational and Graphical Statistics - Volume 20, Issue 1
8 weeks ago by cshalizi
"Researchers have long struggled to identify causal effects in nonexperimental settings. Many recently proposed strategies assume ignorability of the treatment assignment mechanism and require fitting two models—one for the assignment mechanism and one for the response surface. This article proposes a strategy that instead focuses on very flexibly modeling just the response surface using a Bayesian nonparametric modeling procedure, Bayesian Additive Regression Trees (BART). BART has several advantages: it is far simpler to use than many recent competitors, requires less guesswork in model fitting, handles a large number of predictors, yields coherent uncertainty intervals, and fluidly handles continuous treatment variables and missing data for the outcome variable. BART also naturally identifies heterogeneous treatment effects. BART produces more accurate estimates of average treatment effects compared to propensity score matching, propensity-weighted estimators, and regression adjustment in the nonlinear simulation situations examined. Further, it is highly competitive in linear settings with the “correct” model, linear regression. Supplemental materials including code and data to replicate simulations and examples from the article as well as methods for population inference are available online."
to:NB
regression
causal_inference
nonparametrics
statistics
hill.jennifer
8 weeks ago by cshalizi
Taylor & Francis Online :: Nonparametric Regression on a Graph - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
8 weeks ago by cshalizi
"The ‘Signal plus Noise’ model for nonparametric regression can be extended to the case of observations taken at the vertices of a graph. This model includes many familiar regression problems. This article discusses the use of the edges of a graph to measure roughness in penalized regression. Distance between estimate and observation is measured at every vertex in the L2 norm, and roughness is penalized on every edge in the L1 norm. Thus the ideas of total variation penalization can be extended to a graph. The resulting minimization problem presents special computational challenges, so we describe a new and fast algorithm and demonstrate its use with examples.
The examples include image analysis, a simulation applicable to discrete spatial variation, and classification. In our examples, penalized regression improves upon kernel smoothing in terms of identifying local extreme values on planar graphs. In all examples we use fully automatic procedures for setting the smoothing parameters."
to:NB
statistics
network_data_analysis
smoothing
regression
8 weeks ago by cshalizi
Taylor & Francis Online :: Statistical Inference on Random Graphs: Comparative Power Analyses via Monte Carlo - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
8 weeks ago by cshalizi
"We present a comparative power analysis, via Monte Carlo, of various graph invariants used as statistics for testing graph homogeneity versus a “chatter” alternative—the existence of a local region of excessive activity. Our results indicate that statistical inference on random graphs, even in a relatively simple setting, can be decidedly nontrivial. We find that none of the graph invariants considered is uniformly most powerful throughout our space of alternatives. Code for reproducing all the simulation results presented in this article is available online."
to:NB
re:network_differences
statistics
hypothesis_testing
network_data_analysis
8 weeks ago by cshalizi
Taylor & Francis Online :: Graphical Diagnostics for Markov Models for Categorical Data - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
8 weeks ago by cshalizi
"Markov models are widely used as a method for describing categorical data that exhibit stationary and nonstationary autocorrelation. However, diagnostic methods are a largely overlooked topic for Markov models. We introduce two types of residuals for this purpose: one for assessing the length of runs between state changes, and the other for assessing the frequency with which the process moves from any given state to the other states. Methods for calculating the sampling distribution of both types of residuals are presented, enabling objective interpretation through graphical summaries. The graphical summaries are formed using a modification of the probability integral transformation that is applicable for discrete data. Residuals from simulated datasets are presented to demonstrate when the model is, and is not, adequate for the data. The two types of residuals are used to highlight inadequacies of a model posed for real data on seabed fauna from the marine environment."
to:NB
visual_display_of_quantitative_information
statistics
markov_models
to_teach:undergrad-ADA
8 weeks ago by cshalizi
[1203.6130] Spectral dimensionality reduction for HMMs
8 weeks ago by cshalizi
"Hidden Markov Models (HMMs) can be accurately approximated using co-occurrence frequencies of pairs and triples of observations by using a fast spectral method in contrast to the usual slow methods like EM or Gibbs sampling. We provide a new spectral method which significantly reduces the number of model parameters that need to be estimated, and generates a sample complexity that does not depend on the size of the observation vocabulary. We present an elementary proof giving bounds on the relative accuracy of probability estimates from our model. (Corollaries show our bounds can be weakened to provide either L1 bounds or KL bounds which provide easier direct comparisons to previous work.) Our theorem uses conditions that are checkable from the data, instead of putting conditions on the unobservable Markov transition matrix."
to:NB
to_read
markov_models
statistics
machine_learning
dimension_reduction
re:AoS_project
spectral_clustering
8 weeks ago by cshalizi
Bickel , Kleijn : The semiparametric Bernstein–von Mises theorem
8 weeks ago by cshalizi
"In a smooth semiparametric estimation problem, the marginal posterior for the parameter of interest is expected to be asymptotically normal and satisfy frequentist criteria of optimality if the model is endowed with a suitable prior. It is shown that, under certain straightforward and interpretable conditions, the assertion of Le Cam’s acclaimed, but strictly parametric, Bernstein–von Mises theorem [Univ. California Publ. Statist. 1 (1953) 277–329] holds in the semiparametric situation as well. As a consequence, Bayesian point-estimators achieve efficiency, for example, in the sense of Hájek’s convolution theorem [Z. Wahrsch. Verw. Gebiete 14 (1970) 323–330]. The model is required to satisfy differentiability and metric entropy conditions, while the nuisance prior must assign nonzero mass to certain Kullback–Leibler neighborhoods [Ghosal, Ghosh and van der Vaart Ann. Statist. 28 (2000) 500–531]. In addition, the marginal posterior is required to converge at parametric rate, which appears to be the most stringent condition in examples. The results are applied to estimation of the linear coefficient in partial linear regression, with a Gaussian prior on a smoothness class for the nuisance."
to:NB
statistics
bayesian_consistency
nonparametrics
bickel.peter
bernstein-von_mises
8 weeks ago by cshalizi
[1203.6502] Quantifying causal influences
8 weeks ago by cshalizi
"Common methods of causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between n variables. Given the joint distribution of all these variables, the DAG contains all information about how intervening on one variable would change the distribution of the other n-1 variables. It remains, however, a non-trivial question how to quantify the causal influence of one variable on another one.
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
to:NB
to_read
causality
graphical_models
information_theory
statistics
via:ded-maxim
8 weeks ago by cshalizi
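A small sketch of the "relative entropy between the old and the new distribution" idea from the causal-influence abstract above, restricted to the simplest DAG X → Y with discrete variables. It assumes that cutting the edge means feeding Y an independent copy of X drawn from its marginal, so the post-cut joint is the product of the marginals and the relative entropy reduces to the mutual information of X and Y. That reading of the intervention, the function name, and the toy tables are assumptions, not taken from the paper's text.

```python
import math


def causal_strength_two_node(joint):
    """KL divergence (in nats) between a joint table P(x, y) and P(x) * P(y).

    joint[i][j] is P(X = i, Y = j); the entries are assumed to sum to 1.
    """
    px = [sum(row) for row in joint]
    py = [sum(joint[i][j] for i in range(len(joint))) for j in range(len(joint[0]))]
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:  # zero cells contribute nothing to the sum
                total += pxy * math.log(pxy / (px[i] * py[j]))
    return total


dependent = causal_strength_two_node([[0.4, 0.1], [0.1, 0.4]])
independent = causal_strength_two_node([[0.25, 0.25], [0.25, 0.25]])
print(dependent, independent)
```

When Y ignores X the strength is zero, and it grows as the conditional distribution of Y varies more with X.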
propensity_scores ⊕ psychology ⊕ psychometrics ⊕ public_health ⊕ r ⊕ racine.jeffrey ⊕ racism ⊕ racist_idiocy ⊕ raginsky.maxim ⊕ randomization ⊕ random_fields ⊕ random_forests ⊕ random_matrices ⊕ random_matrix_theory ⊕ random_time_changes ⊕ rare_events ⊕ rational_expectations ⊕ rauchway.eric ⊕ ravikumar.pradeep ⊕ re:almost_none ⊕ re:AoS_project ⊕ re:bayes_as_evol ⊕ re:functional_communities ⊕ re:growing_ensemble_project ⊕ re:g_paper ⊕ re:homophily_and_confounding ⊕ re:knightian_uncertainty ⊕ re:LICORS ⊕ re:LoB_project ⊕ re:model_selection_for_networks ⊕ re:naive-semi-supervised ⊕ re:network_differences ⊕ re:network_model_selection ⊕ re:neutral_model_of_inquiry ⊕ re:phil-of-bayes_paper ⊕ re:smoothing_adjacency_matrices ⊕ re:social-networks-as-sensor-networks ⊕ re:stacs ⊕ re:what_is_the_right_null_model_for_linear_regression ⊕ re:XV_for_mixing ⊕ re:XV_for_networks ⊕ re:your_favorite_dsge_sucks ⊕ re:your_favorite_ergm_sucks ⊕ red_state_blue_state ⊕ regression ⊕ reinforcement_learning ⊕ relational_learning ⊕ renyi_entropy ⊕ replicator_dynamics ⊕ resampling ⊕ review_papers ⊕ rhetoric ⊕ richard.jean-francois ⊕ richards.joey ⊕ riedewald.mirek ⊕ rigollet.philippe ⊕ rinaldo.alessandro ⊕ ripley.brian ⊕ risk_assessment ⊕ risk_vs_uncertainty ⊕ robins.james ⊕ robustness ⊕ robust_statistics ⊕ roeder.kathryn ⊕ rosenblatt.murray ⊕ rosvall.martin ⊕ rubin.jonathan ⊕ running_dogs_of_reaction ⊕ ryabko.daniil ⊕ saddle-point_approximation ⊕ salakhutdinov.ruslan ⊕ salmon ⊕ sampling ⊕ sandler.mark ⊕ sardinia ⊕ sarkar.purnamrita ⊕ sarwate.anand ⊕ savage.leonard_j. 
⊕ schafer.chad ⊕ schofield.lynne ⊕ science_journalism ⊕ scooped ⊕ scoring_rules ⊕ search_engines ⊕ selection_bias ⊕ self-centered ⊕ self-promotion ⊕ self-similarity ⊕ semantics ⊕ semi-supervised_learning ⊕ sensitive_dependence_on_initial_conditions ⊕ sequential_monte_carlo ⊕ series_of_footnotes ⊕ sethna.james ⊕ shanteau.james ⊕ sheu.chyong-hwa ⊕ shot_after_a_fair_trial ⊕ shrinkage ⊕ signal_processing ⊕ silver.nathan ⊕ simulation ⊕ simulation-based_inference ⊕ singular_value_decomposition ⊕ sleep ⊕ smoothing ⊕ snijders.tom ⊕ sober.elliott ⊕ social_life_of_the_mind ⊕ social_media ⊕ social_networks ⊕ social_neuroscience ⊕ social_science_methodology ⊕ sociology ⊕ sociology_of_science ⊕ software ⊕ sornette.didier ⊕ sorokina.daria ⊕ spanos.aris ⊕ sparsity ⊕ spatial_statistics ⊕ spectral_clustering ⊕ spectral_estimation ⊕ spectral_methods ⊕ speed.terry ⊕ splines ⊕ sports ⊕ stability_of_learning ⊕ standardized_testing ⊕ stanley.h._eugene ⊕ stark.philip ⊕ state-space_models ⊕ state-space_reconstruction ⊕ state_estimation ⊕ stationarity ⊕ stationary_features ⊕ statistical_inference_for_stochastic_processes ⊕ statistical_interaction ⊕ statistical_mechanics ⊕ statistics ⊖ stein.charles ⊕ steins_method ⊕ stepping_stone_model ⊕ stigler.stephen ⊕ stochastic_approximation ⊕ stochastic_differential_equations ⊕ stochastic_models ⊕ stochastic_processes ⊕ stochastic_volatility ⊕ structural_equations ⊕ structural_risk_minimization ⊕ studentization ⊕ sufficiency ⊕ sugiyama.masashi ⊕ summer_schools ⊕ superefficiency ⊕ supervenience ⊕ support_vector_machines ⊕ survival_analysis ⊕ systems_identification ⊕ taskar.ben ⊕ teaching ⊕ television ⊕ tetrad ⊕ text_mining ⊕ theory_of_the_novel ⊕ the_american_dilemma ⊕ thomas.andrew ⊕ tibshirani.robert ⊕ tibshirani.ryan ⊕ time_rescaling ⊕ time_series ⊕ to:blog ⊕ to:NB ⊕ topic_models ⊕ to_be_shot_after_a_fair_trial ⊕ to_read ⊕ to_teach ⊕ to_teach:advanced-stochastic-processes ⊕ to_teach:complexity-and-inference ⊕ to_teach:data-mining ⊕ 
to_teach:statcomp ⊕ to_teach:undergrad-ADA ⊕ to_teach:undergrad-research ⊕ track_down_references ⊕ turbulence ⊕ tutorials ⊕ two-sample_tests ⊕ unemployment ⊕ universal_prediction ⊕ us_politics ⊕ utter_stupidity ⊕ value_of_information ⊕ van_der_maas.h.l.j. ⊕ van_der_vaart.aad ⊕ van_de_geer.sara ⊕ van_handel.ramon ⊕ van_roy.benjamin ⊕ vapnik.v.n. ⊕ variable_selection ⊕ variance_components ⊕ variance_estimation ⊕ variational_inference ⊕ variational_methods ⊕ vc-dimension ⊕ ventura.valerie ⊕ verdinelli.isa ⊕ verzani.john ⊕ via:? ⊕ via:aaronsw ⊕ via:aaron_clauset ⊕ via:abbas-raza ⊕ via:ale ⊕ via:ariddell ⊕ via:arsyed ⊕ via:arthegall ⊕ via:chl ⊕ via:crooked_timber ⊕ via:deaneckles ⊕ via:ded-maxim ⊕ via:dpfeldman ⊕ via:erindanielson ⊕ via:flaxman ⊕ via:gelman ⊕ via:guslacerda ⊕ via:idiolect ⊕ via:jhofman ⊕ via:john-burke ⊕ via:judea_pearl ⊕ via:justin ⊕ via:kass ⊕ via:kathryn ⊕ via:klk ⊕ via:larry ⊕ via:logista ⊕ via:martens ⊕ via:matthew_berryman ⊕ via:mejn ⊕ via:moritz-heene ⊕ via:mreid ⊕ via:nielsen ⊕ via:nikete ⊕ via:rocha ⊕ via:ryan_t ⊕ via:scotte ⊕ via:shivak ⊕ via:slaniel ⊕ via:stodden ⊕ via:students ⊕ via:the_author ⊕ via:unfogged ⊕ via:vqv ⊕ via:wikipedia ⊕ violence ⊕ visual_display_of_quantitative_information ⊕ von_mises.richard ⊕ vovk.vladimir_g. ⊕ vu.vincent ⊕ wahba.grace ⊕ wainright.martin ⊕ wainwright.martin ⊕ wald.abraham ⊕ war ⊕ wasserman.larry ⊕ wavelets ⊕ weak_dependence ⊕ weaver.rhiannon ⊕ weiss.benjamin ⊕ wermuth.nanny ⊕ whats_gone_wrong_with_america ⊕ wheels:reinvention_of ⊕ why_oh_why_cant_we_have_a_better_press_corps ⊕ wiener-khinchin ⊕ wiener.norbert ⊕ wilks.s._s. ⊕ willett.rebecca ⊕ williamson.robert ⊕ wolfowitz.j. ⊕ xing.eric ⊕ yajima.masano ⊕ yu.bin ⊕ zenker.sven ⊕ zhang.tong ⊕ zhu.ji ⊕ ziliak.stephen ⊕ zilsel.edgar ⊕Copy this bookmark: