cshalizi + statistics   1020

Wang, Phillips: A specification test for nonlinear nonstationary models
"We provide a limit theory for a general class of kernel smoothed U-statistics that may be used for specification testing in time series regression with nonstationary data. The test framework allows for linear and nonlinear models with endogenous regressors that have autoregressive unit roots or near unit roots. The limit theory for the specification test depends on the self-intersection local time of a Gaussian process. A new weak convergence result is developed for certain partial sums of functions involving nonstationary time series that converges to the intersection local time process. This result is of independent interest and is useful in other applications. Simulations examine the finite sample performance of the test."
to:NB  time_series  non-stationarity  model-checking  statistics  misspecification 
10 days ago by cshalizi
Rigollet: Kullback–Leibler aggregation and misspecified generalized linear models
"In a regression setup with deterministic design, we study the pure aggregation problem and introduce a natural extension from the Gaussian distribution to distributions in the exponential family. While this extension bears strong connections with generalized linear models, it does not require identifiability of the parameter or even that the model on the systematic component is true. It is shown that this problem can be solved by constrained and/or penalized likelihood maximization and we derive sharp oracle inequalities that hold both in expectation and with high probability. Finally all the bounds are proved to be optimal in a minimax sense."
to:NB  regression  ensemble_methods  statistics 
10 days ago by cshalizi
[1205.3703] Generic chaining and the l1-penalty
"We address the choice of the tuning parameter $\lambda$ in $\ell_1$-penalized M-estimation. Our main concern is models which are highly nonlinear, such as the Gaussian mixture model. The number of parameters $p$ is moreover large, possibly larger than the number of observations $n$. The generic chaining technique of Talagrand [2005] is tailored for this problem. It leads to the choice $\lambda \asymp \sqrt{\log p / n}$, as in the standard Lasso procedure (which concerns the linear model and least squares loss)."
to:NB  to_read  statistics  empirical_processes  high-dimensional_statistics  van_de_geer.sara 
11 days ago by cshalizi
Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation - Fearnhead - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference."
to:NB  indirect_inference  estimation  statistics  approximate_bayesian_computation  computational_statistics  to_teach:complexity-and-inference  re:stacs 
13 days ago by cshalizi
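(A minimal sketch of the basic rejection form of ABC that the abstract above describes: simulate data at parameter values drawn from the prior, and keep the draws whose summary statistic lands close to the observed one. This is not the semi-automatic construction from the paper. The toy model (normal data with unknown mean), the uniform prior, the tolerance, and every function name here are assumptions made for illustration.)

```python
import random
import statistics


def abc_rejection(observed, simulate, summary, prior_draw,
                  n_draws=5000, tol=0.1, seed=0):
    """Plain rejection ABC with a scalar summary statistic.

    Keeps each prior draw whose simulated summary falls within `tol`
    of the observed summary.
    """
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                   # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))     # simulate data, then summarize
        if abs(s_sim - s_obs) <= tol:
            accepted.append(theta)
    return accepted


# Toy problem (an assumption): N(theta, 1) data, uniform prior on [-5, 5].
def prior_draw(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng, n=50):
    return [rng.gauss(theta, 1.0) for _ in range(n)]


summary = statistics.fmean  # sample mean as the summary statistic

data_rng = random.Random(1)
observed = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
posterior_sample = abc_rejection(observed, simulate, summary, prior_draw)
```

(In this toy case the sample mean is a sensible summary because it is close to the posterior mean of theta, which is the kind of summary the abstract argues for; in realistic models that posterior mean has to be estimated by an extra round of simulation.)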
[1205.2609] Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?
"Recent theory work has found that a special type of spatial partition tree - called a random projection tree - is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search."
to:NB  decision_trees  prediction  regression  statistics  dimension_reduction  machine_learning 
13 days ago by cshalizi
Likelihood inference for discriminating between long-memory and change-point models - Yau - 2012 - Journal of Time Series Analysis - Wiley Online Library
"We develop a likelihood ratio (LR) test procedure for discriminating between a short-memory time series with a change-point (CP) and a long-memory (LM) time series. Under the null hypothesis, the time series consists of two segments of short-memory time series with different means and possibly different covariance functions. The location of the shift in the mean is unknown. Under the alternative, the time series has no shift in mean but rather is LM. The LR statistic is defined as the normalized log-ratio of the Whittle likelihood between the CP model and the LM model, which is asymptotically normally distributed under the null. The LR test provides a parametric alternative to the CUSUM test proposed by Berkes et al. (2006). Moreover, the LR test is more general than the CUSUM test in the sense that it is applicable to changes in other marginal or dependence features other than a change-in-mean. We show its good performance in simulations and apply it to two data examples."
to:NB  time_series  change-point_problem  long-range_dependence  statistics  to_teach:undergrad-ADA  hypothesis_testing 
13 days ago by cshalizi
[1205.1828] The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use
"The natural gradient allows for more efficient gradient descent by removing dependencies and biases inherent in a function's parameterization. Several papers present the topic thoroughly and precisely. It remains a very difficult idea to get your head around however. The intent of this note is to provide simple intuition for the natural gradient and its use. We review how an ill conditioned parameter space can undermine learning, introduce the natural gradient by analogy to the more widely understood concept of signal whitening, and present tricks and specific prescriptions for applying the natural gradient to learning problems."

Does this ever mention the phrase "Fisher information"?
to:NB  optimization  statistics  estimation  fisher_information  information_geometry 
18 days ago by cshalizi
[1203.3504] On Measurement Bias in Causal Inference
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particular, the paper discusses the control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB  causal_inference  inference_to_latent_objects  pearl.judea  to_teach:undergrad-ADA  statistics  error_in_variables  via:arthegall 
18 days ago by cshalizi
Lai, Huang, Lee: Fixed and random effects selection in nonparametric additive mixed models
"This paper considers the problem of model selection in a nonparametric additive mixed modeling framework. The fixed effects are modeled nonparametrically using truncated series expansions with B-spline basis. Estimation and selection of such nonparametric fixed effects are simultaneously achieved by using the adaptive group lasso methodology, while the random effects are selected by a traditional backward selection mechanism. To facilitate the automatic selection of model dimension, computable expressions for the degrees of freedom for both the fixed and random effects components are derived, and the Bayesian Information criterion (BIC) is used to select the final model choice. Theoretically it is shown that this BIC model selection method is consistent, while computationally a practical algorithm is developed for solving the optimization problem involved. Simulation results show that the proposed methodology is often capable of selecting the correct significant fixed and random effects components, especially when the sample size and/or signal to noise ratio are not too small. The new method is also applied to two real data sets."
to:NB  regression  additive_models  statistics 
19 days ago by cshalizi
[1205.1406] Graph Prediction in a Low-Rank and Autoregressive Setting
"We study the problem of prediction for evolving graph data. We formulate the problem as the minimization of a convex objective encouraging sparsity and low-rank of the solution, that reflect natural graph properties. The convex formulation allows to obtain oracle inequalities and efficient solvers. We provide empirical results for our algorithm and comparison with competing methods, and point out two open questions related to compressed sensing and algebra of low-rank and sparse matrices."
to:NB  network_data_analysis  prediction  statistics  low-rank_approximation 
19 days ago by cshalizi
Accurately estimating neuronal correlation requires a new spike-sorting paradigm
"Neurophysiology is increasingly focused on identifying coincident activity among neurons. Strong inferences about neural computation are made from the results of such studies, so it is important that these results be accurate. However, the preliminary step in the analysis of such data, the assignment of spike waveforms to individual neurons (“spike-sorting”), makes a critical assumption which undermines the analysis: that spikes, and hence neurons, are independent. We show that this assumption guarantees that coincident spiking estimates such as correlation coefficients are biased. We also show how to eliminate this bias. Our solution involves sorting spikes jointly, which contrasts with the current practice of sorting spikes independently of other spikes. This new “ensemble sorting” yields unbiased estimates of coincident spiking, and permits more data to be analyzed with confidence, improving the quality and quantity of neurophysiological inferences. These results should be of interest outside the context of neuronal correlations studies. Indeed, simultaneous recording of many neurons has become the rule rather than the exception in experiments, so it is essential to spike sort correctly if we are to make valid inferences about any properties of, and relationships between, neurons."
to:NB  heard_the_talk  neuroscience  neural_data_analysis  ventura.valerie  kith_and_kin  statistics  inference_to_latent_objects 
20 days ago by cshalizi
Clarke, Clarke: Prediction in several conventional contexts
"We review predictive techniques from several traditional branches of statistics. Starting with prediction based on the normal model and on the empirical distribution function, we proceed to techniques for various forms of regression and classification. Then, we turn to time series, longitudinal data, and survival analysis. Our focus throughout is on the mechanics of prediction more than on the properties of predictors."

(to_teach tags are tentative.)
to:NB  prediction  statistics  classifiers  regression  to_teach:undergrad-ADA  to_teach:data-mining 
20 days ago by cshalizi
Ehm, Gneiting: Local proper scoring rules of order two
"Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if it encourages truthful reporting. It is local of order k if the score depends on the predictive density only through its value and the values of its derivatives of order up to k at the realizing event. Complementing fundamental recent work by Parry, Dawid and Lauritzen, we characterize the local proper scoring rules of order 2 relative to a broad class of Lebesgue densities on the real line, using a different approach. In a data example, we use local and nonlocal proper scoring rules to assess statistically postprocessed ensemble weather forecasts."
to:NB  prediction  scoring_rules  statistics  gneiting.tilmann 
21 days ago by cshalizi
Dawid, Lauritzen, Parry: Proper local scoring rules on discrete sample spaces
"A scoring rule is a loss function measuring the quality of a quoted probability distribution Q for a random variable X, in the light of the realized outcome x of X; it is proper if the expected score, under any distribution P for X, is minimized by quoting Q = P. Using the fact that any differentiable proper scoring rule on a finite sample space is the gradient of a concave homogeneous function, we consider when such a rule can be local in the sense of depending only on the probabilities quoted for points in a nominated neighborhood of x. Under mild conditions, we characterize such a proper local scoring rule in terms of a collection of homogeneous functions on the cliques of an undirected graph on the space. A useful property of such rules is that the quoted distribution Q need only be known up to a scale factor. Examples of the use of such scoring rules include Besag’s pseudo-likelihood and Hyvärinen’s method of ratio matching."
to:NB  prediction  scoring_rules  statistics  lauritzen.steffen  dawid.philip 
21 days ago by cshalizi
Parry, Dawid, Lauritzen: Proper local scoring rules
"We investigate proper scoring rules for continuous distributions on the real line. It is known that the log score is the only such rule that depends on the quoted density only through its value at the outcome that materializes. Here we allow further dependence on a finite number m of derivatives of the density at the outcome, and describe a large class of such m-local proper scoring rules: these exist for all even m but no odd m. We further show that for m ≥ 2 all such m-local rules can be computed without knowledge of the normalizing constant of the distribution."
to:NB  prediction  scoring_rules  lauritzen.steffen  dawid.philip  statistics 
21 days ago by cshalizi
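(A small sketch of the last claim in the abstract above, using the Hyvärinen score as the standard example of a proper scoring rule that is local of order 2. In one common convention, as a loss, it is 2 (log q)''(x) + ((log q)'(x))^2. Because it depends on q only through derivatives of log q, adding a constant to log q, i.e. changing the normalizing constant, leaves it unchanged. The finite-difference derivatives, the Gaussian example, and the function names are assumptions for illustration.)

```python
import math


def hyvarinen_score(log_q, x, h=1e-4):
    """Hyvarinen score 2*(log q)''(x) + ((log q)'(x))**2 via central differences."""
    d1 = (log_q(x + h) - log_q(x - h)) / (2.0 * h)
    d2 = (log_q(x + h) - 2.0 * log_q(x) + log_q(x - h)) / (h * h)
    return 2.0 * d2 + d1 * d1


def log_gauss_unnormalized(x, mu=1.0, s2=4.0):
    # Log density up to an additive constant: the normalizer is dropped.
    return -((x - mu) ** 2) / (2.0 * s2)


def log_gauss_normalized(x, mu=1.0, s2=4.0):
    return log_gauss_unnormalized(x, mu, s2) - 0.5 * math.log(2.0 * math.pi * s2)


x0 = 0.3
score_unnormalized = hyvarinen_score(log_gauss_unnormalized, x0)
score_normalized = hyvarinen_score(log_gauss_normalized, x0)

# Closed form for a Gaussian: -2/s2 + (x - mu)^2 / s2^2
score_exact = -2.0 / 4.0 + (x0 - 1.0) ** 2 / 4.0 ** 2
```

(The two numerical scores agree with each other and with the closed form up to finite-difference error, whereas the log score would differ between the two versions by exactly the log normalizing constant.)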
[0805.1404] Adaptive estimation of a distribution function and its density in sup-norm loss by wavelet and spline projections
"Given an i.i.d. sample from a distribution $F$ on $\mathbb{R}$ with uniformly continuous density $p_0$, purely data-driven estimators are constructed that efficiently estimate $F$ in sup-norm loss and simultaneously estimate $p_0$ at the best possible rate of convergence over Hölder balls, also in sup-norm loss. The estimators are obtained by applying a model selection procedure close to Lepski's method with random thresholds to projections of the empirical measure onto spaces spanned by wavelets or $B$-splines. The random thresholds are based on suprema of Rademacher processes indexed by wavelet or spline projection kernels. This requires Bernstein-type analogs of the inequalities in Koltchinskii [Ann. Statist. 34 (2006) 2593-2656] for the deviation of suprema of empirical processes from their Rademacher symmetrizations."
to:NB  density_estimation  wavelets  splines  statistics  empirical_processes 
22 days ago by cshalizi
Testing parametric conditional distributions using the nonparametric smoothing method
"This paper proposes a new goodness-of-fit test for parametric conditional probability distributions using the nonparametric smoothing methodology. An asymptotic normal distribution is established for the test statistic under the null hypothesis of correct specification of the parametric distribution. The test is shown to have power against local alternatives converging to the null at certain rates. The test can be applied to testing for possible misspecifications in a wide variety of parametric models. A bootstrap procedure is provided for obtaining more accurate critical values for the test. Monte Carlo simulations show that the test has good power against some common alternatives."
to:NB  misspecification  density_estimation  smoothing  statistics  to_teach:undergrad-ADA 
22 days ago by cshalizi
Consistent Model Selection Criteria on High Dimensions
"Asymptotic properties of model selection criteria for high-dimensional regression models are studied where the dimension of covariates is much larger than the sample size. Several sufficient conditions for model selection consistency are provided. Non-Gaussian error distributions are considered and it is shown that the maximal number of covariates for model selection consistency depends on the tail behavior of the error distribution. Also, sufficient conditions for model selection consistency are given when the variance of the noise is neither known nor estimated consistently. Results of simulation studies as well as real data analysis are given to illustrate that finite sample performances of consistent model selection criteria can be quite different."
to:NB  model_selection  statistics  high-dimensional_probability 
25 days ago by cshalizi
"The huge Package for High-dimensional Undirected Graph Estimation in R"
"We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency."
to:NB  to_teach:undergrad-ADA  graphical_models  statistics  kith_and_kin  wasserman.larry  roeder.kathryn  liu.han 
25 days ago by cshalizi
[1204.6703] Two SVDs Suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation
"Topic models can be seen as a generalization of the clustering problem, in that they posit that observations are generated due to multiple latent factors (e.g. the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden.
"We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e. third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on k by k matrices, where k is the number of latent factors (e.g. the number of topics), rather than in the d-dimensional observed space (typically d >> k)."

That's a really remarkable claim, and I'd tag it to_be_shot_after_a_fair_trial if it weren't being made by genuinely serious people.
in_NB  to_read  latent_variables  topic_models  text_mining  mixture_models  statistics  machine_learning  cool_if_true  spectral_clustering 
27 days ago by cshalizi
[1204.6265] Statistical inference for dynamical systems: a review
"The topic of statistical inference for dynamical systems has been studied extensively across several fields. In this survey we focus on the problem of parameter estimation for non-linear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research."
to:NB  to_read  statistical_inference_for_stochastic_processes  dynamical_systems  statistics  time_series  state-space_models  state-space_reconstruction  pillai.natesh  via:ded-maxim 
28 days ago by cshalizi
[1006.1015] Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees
"Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960's. In bioinformatics, psychometrics and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and 'generalizability' of these summaries. This paper provides an implementation of the geometric distance between trees developed by Billera, Holmes and Vogtmann (2001) [BHV] equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of the applications in statistical inference for which this distance can be useful. In particular, since BHV have shown that the space of trees is negatively curved (a CAT(0) space), a natural representation of a collection of trees is a tree. We compare this representation to the Euclidean approximations of treespace made available through Multidimensional Scaling of the matrix of distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence both of certain columns (positions, variables or genes) and of certain rows (whether species, observations or arrays)."
to:NB  clustering  hierarchical_structure  holmes.susan  data_mining  statistics  to_teach:data-mining  gene_expression_data_analysis  via:ryan_t 
4 weeks ago by cshalizi
A Normal Law for the Plug-in Estimator of Entropy
"This paper establishes a sufficient condition for the asymptotic normality of the plug-in estimator of Shannon's entropy defined on a countable alphabet. The sufficient condition covers a range of cases with countably infinite alphabets, for which no normality results were previously known."
in_NB  entropy_estimation  statistics  information_theory 
4 weeks ago by cshalizi
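(For reference, the plug-in estimator named in the abstract above is just the Shannon entropy of the empirical distribution: replace each unknown probability with its observed relative frequency. A minimal sketch; the function name and the choice of natural logarithms are assumptions.)

```python
import math
from collections import Counter


def plugin_entropy(sample):
    """Plug-in (empirical) Shannon entropy of a sample, in nats."""
    n = len(sample)
    counts = Counter(sample)
    # Symbols that never appear contribute nothing, so only observed ones are summed.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


h_two_symbols = plugin_entropy("aabb")   # two equally frequent symbols
h_one_symbol = plugin_entropy("aaaa")    # a constant sample
```

(On a countably infinite alphabet the sample only ever shows finitely many symbols, which is the source of the estimator's bias and why asymptotic normality needs conditions on the tail of the distribution.)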
[1204.5633] Noncentral Limit Theorem and the Bootstrap for Quantiles of Dependent Data
"We will show under minimal conditions on differentiability and dependence that the central limit theorem for quantiles holds and that the block bootstrap is weakly consistent. Under slightly stronger conditions, the bootstrap is strongly consistent. Without the differentiability condition, quantiles might have a non-normal asymptotic distribution and the bootstrap might fail."
to:NB  bootstrap  statistics  statistical_inference_for_stochastic_processes 
4 weeks ago by cshalizi
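(A rough sketch of a moving block bootstrap for a quantile of a dependent series, the general kind of procedure the abstract above studies. This is not the authors' exact scheme or conditions. The AR(1) toy series, the block length, the number of replicates, and the function names are all assumptions.)

```python
import math
import random
import statistics


def empirical_quantile(xs, q):
    """Left-continuous inverse of the empirical CDF at level q."""
    s = sorted(xs)
    k = max(0, math.ceil(q * len(s)) - 1)
    return s[k]


def moving_block_resample(series, block_len, rng):
    """Concatenate randomly chosen overlapping blocks, trimmed to the original length."""
    n = len(series)
    n_blocks = math.ceil(n / block_len)
    out = []
    for _ in range(n_blocks):
        start = rng.randrange(n - block_len + 1)   # any block that fits in the series
        out.extend(series[start:start + block_len])
    return out[:n]


def block_bootstrap_quantile_se(series, q=0.5, block_len=20, n_boot=500, seed=0):
    """Bootstrap standard error of the q-th sample quantile."""
    rng = random.Random(seed)
    reps = [
        empirical_quantile(moving_block_resample(series, block_len, rng), q)
        for _ in range(n_boot)
    ]
    return statistics.stdev(reps)


# Toy dependent data (an assumption): a Gaussian AR(1) series.
data_rng = random.Random(42)
x, series = 0.0, []
for _ in range(400):
    x = 0.6 * x + data_rng.gauss(0.0, 1.0)
    series.append(x)

se_median = block_bootstrap_quantile_se(series)
```

(Resampling whole blocks rather than single observations is what preserves the short-range dependence within each block; the block length is a tuning choice that this sketch does not try to optimize.)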
[1201.5871] Null models for network data
"The analysis of datasets taking the form of simple, undirected graphs continues to gain in importance across a variety of disciplines. Two choices of null model, the logistic-linear model and the implicit log-linear model, have come into common use for analyzing such network data, in part because each accounts for the heterogeneity of network node degrees typically observed in practice. Here we show how these both may be viewed as instances of a broader class of null models, with the property that all members of this class give rise to essentially the same likelihood-based estimates of link probabilities in sparse graph regimes. This facilitates likelihood-based computation and inference, and enables practitioners to choose the most appropriate null model from this family based on application context. Comparative model fits for a variety of network datasets demonstrate the practical implications of our results."
in_NB  network_data_analysis  have_read  statistics  estimation  approximation  re:smoothing_adjacency_matrices 
5 weeks ago by cshalizi
[1204.3915] Theory and Inference for a Class of Observation-driven Models with Application to Time Series of Counts
"This paper studies theory and inference related to a class of time series models that incorporates nonlinear dynamics. It is assumed that the observations follow a one-parameter exponential family of distributions given an accompanying process that evolves as a function of lagged observations. We employ an iterated random function approach and a special coupling technique to show that, under suitable conditions on the parameter space, the conditional mean process is a geometric moment contracting Markov chain and that the observation process is absolutely regular with geometrically decaying coefficients. Moreover the asymptotic theory of the maximum likelihood estimates of the parameters is established under some mild assumptions. These models are applied to two examples; the first is the number of transactions per minute of Ericsson stock and the second is related to return times of extreme events of Goldman Sachs Group stock."

--- Without reading beyond the abstract, I'm guessing chains with complete connections.
to:NB  time_series  markov_models  statistics 
5 weeks ago by cshalizi
[1204.3941] A Log-Linear Graphical Model for Inferring Genetic Networks from High-Throughput Sequencing Data
"We develop a novel method for estimating high-dimensional Poisson graphical models, the Log-Linear Graphical Model, allowing us to infer networks based on high-throughput sequencing data. Our model assumes that conditional on all other genes, each gene is Poisson, jointly defining a pair-wise Poisson Markov random field. We estimate our genetic networks via neighborhood selection by fitting $\ell_1$-norm penalized log-linear models, an approach we call the Poisson Graphical Lasso. Additionally, we develop a fast parallel algorithm, permitting us to fit our graphical models to high-dimensional genomic data sets."
in_NB  graphical_models  gene_expression_data_analysis  lasso  network_data_analysis  statistics  regression 
5 weeks ago by cshalizi
Xu, McLeod: Further asymptotic properties of the generalized information criterion
"Asymptotic properties of the generalized information criterion for model selection are examined and new conditions under which this criterion is overfitting, consistent, or underfitting are derived."
in_NB  model_selection  information_criteria  statistics 
5 weeks ago by cshalizi
Ockham's Razor: Foundations - Carnegie Mellon Center for Formal Epistemology
Despite my presence on the program, this should actually be really good.

"Scientific theory choice is guided by judgments of simplicity, a bias frequently referred to as "Ockham's Razor". But what is simplicity and how, if at all, does it help science find the truth?  Should we view simple theories as means for obtaining accurate predictions, as classical statisticians recommend?  Or should we believe the theories themselves, as Bayesian methods seem to justify?  The aim of this workshop is to re-examine the foundations of Ockham's razor, with a firm focus on the connections, if any, between simplicity and truth. "
self-promotion  occams_razor  philosophy_of_science  epistemology  kelly.kevin_t.  kith_and_kin  mayo.deborah  vapnik.v.n.  sober.elliott  leeb.hannes  wasserman.larry  model_selection  statistics  complexity  machine_learning  learning_theory  grunwald.peter 
5 weeks ago by cshalizi
Xiao, Wu: Covariance matrix estimation for stationary time series
"We obtain a sharp convergence rate for banded covariance matrix estimates of stationary processes. A precise order of magnitude is derived for spectral radius of sample covariance matrices. We also consider a thresholded covariance matrix estimator that can better characterize sparsity if the true covariance matrix is sparse. As our main tool, we implement Toeplitz's [Math. Ann. 70 (1911) 351–376] idea and relate eigenvalues of covariance matrices to the spectral densities or Fourier transforms of the covariances. We develop a large deviation result for quadratic forms of stationary processes using m-dependence approximation, under the framework of causal representation and physical dependence measures."
to:NB  time_series  statistics  estimation  variance_estimation 
6 weeks ago by cshalizi
Arias-Castro, Bubeck, Lugosi: Detection of correlations
"We consider the hypothesis testing problem of deciding whether an observed high-dimensional vector has independent normal components or, alternatively, if it has a small subset of correlated components. The correlated components may have a certain combinatorial structure known to the statistician. We establish upper and lower bounds for the worst-case (minimax) risk in terms of the size of the correlated subset, the level of correlation, and the structure of the class of possibly correlated sets. We show that some simple tests have near-optimal performance in many cases, while the generalized likelihood ratio test is suboptimal in some important cases."
to:NB  statistics  factor_analysis 
6 weeks ago by cshalizi
Bai, Li: Statistical analysis of factor models of high dimension
"This paper considers the maximum likelihood estimation of factor models of high dimension, where the number of variables (N) is comparable with or even greater than the number of observations (T). An inferential theory is developed. We establish not only consistency but also the rate of convergence and the limiting distributions. Five different sets of identification conditions are considered. We show that the distributions of the MLE estimators depend on the identification restrictions. Unlike the principal components approach, the maximum likelihood estimator explicitly allows heteroskedasticities, which are jointly estimated with other parameters. Efficiency of MLE relative to the principal components method is also considered."
to:NB  to_read  factor_analysis  statistics  high-dimensional_statistics 
6 weeks ago by cshalizi
Space–time modelling of coupled spatiotemporal environmental variables - Ippoliti - 2012 - Journal of the Royal Statistical Society: Series C (Applied Statistics) - Wiley Online Library
"dynamic factor model for spatiotemporal coupled environmental variables. The model is proposed in a state space formulation which, through Kalman recursions, allows a unified approach to prediction and estimation. Full probabilistic inference for the model parameters is facilitated by adapting standard Markov chain Monte Carlo algorithms for dynamic linear models to our model formulation. The predictive ability of the model is discussed for two different data sets with variables measured at two different scales. Some possibilities for further research are also outlined."
to:NB  spatial_statistics  state-space_models  statistics 
6 weeks ago by cshalizi
Local polynomial regression for symmetric positive definite matrices - Yuan - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Local polynomial regression has received extensive attention for the non-parametric estimation of regression functions when both the response and the covariate are in Euclidean space. However, little has been done when the response is in a Riemannian manifold. We develop an intrinsic local polynomial regression estimate for the analysis of symmetric positive definite matrices as responses that lie in a Riemannian manifold with covariate in Euclidean space. The primary motivation and application of the methodology proposed is in computer vision and medical imaging. We examine two commonly used metrics, including the trace metric and the log-Euclidean metric on the space of symmetric positive definite matrices. For each metric, we develop a cross-validation bandwidth selection method, derive the asymptotic bias, variance and normality of the intrinsic local constant and local linear estimators, and compare their asymptotic mean-square errors. Simulation studies are further used to compare the estimators under the two metrics and to examine their finite sample performance. We use our method to detect diagnostic differences between diffusion tensors along fibre tracts in a study of human immunodeficiency virus."
to:NB  variance_estimation  statistics  regression  nonparametrics  kernel_estimators 
6 weeks ago by cshalizi
Quantifying the weight of evidence from a forensic fingerprint comparison: a new paradigm - Neumann - 2012 - Journal of the Royal Statistical Society: Series A (Statistics in Society) - Wiley Online Library
"The fingerprint has, with considerable justification, come to be regarded as the acme of forensic identification. Over the last century, millions of cases have been resolved world wide because of marks left at crime scenes. The comparison methodology has not evolved greatly during its history and it is universal practice to present fingerprint evidence to a court as a categoric opinion of identification or exclusion, or to classify the evidence as inconclusive and not to report it. There has been a growing movement to supplement the fingerprint examination process by one that has a statistical model, supported by appropriate databases for calculating numerical measures of weight of evidence. The movement calls for the establishment of a logical framework for informing conclusions, based on explicit assumptions and data and open to revision and improvement. The aim is to enable the numerical evaluation of evidence that would currently be reported as a categorical identification and also of evidence that would currently be classified as inconclusive. The paper presents the results of a project carried out by the Forensic Science Service that aims to attain this goal. After a historical review, we describe a formal model for assigning numerical values to configurations of minutiae in fingerprints. We describe how the parameters of the model have been optimized to take account of interoperator variability and distortion of the finger pad, and we present the results of a substantial validation experiment that was based on searches that have been carried out on the US national fingerprint database of approximately 600 million fingerprints."
to:NB  fingerprints  statistics 
6 weeks ago by cshalizi
[1204.2043] Unbiased Cultural Transmission in Time-Averaged Archaeological Assemblages
"Unbiased models are foundational in the archaeological study of cultural transmission. Applications have assumed that archaeological data represent synchronic samples, despite the accretional nature of the archaeological record. I document the circumstances under which time-averaging alters the distribution of model predictions. Richness is inflated in long-duration assemblages, and evenness is "flattened" compared to unaveraged samples. Tests of neutrality, employed to differentiate biased and unbiased models, suffer serious problems with Type I error under time-averaging. Finally, the time-scale over which time-averaging alters predictions is determined by the mean trait lifetime, providing a way to evaluate the impact of these effects upon archaeological samples."
to:NB  archaeology  cultural_evolution  statistics  neutral_models 
6 weeks ago by cshalizi
[1204.2296] Co-clustering for Directed Graphs; the Stochastic Co-Blockmodel and a Spectral Algorithm
"This paper extends the spectral clustering algorithm to directed networks in a way that co-clusters or bi-clusters the rows and columns of a graph Laplacian. Co-clustering leverages the increased complexity of asymmetric relationships to gain new insight into the structure of the directed network. To understand this algorithm and to study its asymptotic properties in a canonical setting, we propose the Stochastic Co-Blockmodel to encode co-clustering structure. This is the first statistical model of co-clustering and it is derived using the concept of stochastic equivalence that motivated the original Stochastic Blockmodel. Although directed spectral clustering is not derived from the Stochastic Co-Blockmodel, we show that, asymptotically, the algorithm can estimate the blocks in a high dimensional asymptotic setting in which the number of blocks grows with the number of nodes. The algorithm, model, and asymptotic results can all be extended to bipartite graphs."
in_NB  relational_learning  network_data_analysis  statistics  clustering  community_discovery  spectral_clustering  yu.bin 
6 weeks ago by cshalizi
[1204.2477] A Simple Explanation of A Spectral Algorithm for Learning Hidden Markov Models
"A simple linear algebraic explanation of the algorithm in "A Spectral Algorithm for Learning Hidden Markov Models" (COLT 2009). Most of the content is in Figure 2; the text just makes everything precise in four nearly-trivial claims."
to:NB  to_read  statistics  markov_models  re:AoS_project  spectral_methods 
6 weeks ago by cshalizi
[1204.2763] A Cramér-Rao inequality for non differentiable models
"We compute a variance lower bound for unbiased estimators in specified statistical models. The construction of the bound is related to the original Cramér-Rao bound, although it does not require the differentiability of the model. Moreover, we show our efficiency bound to be always greater than the Cramér-Rao bound in smooth models, thus providing a sharper result."
to:NB  cramer-rao  statistics  estimation  information_geometry 
6 weeks ago by cshalizi
[1204.2762] On the Uniform Asymptotic Validity of Subsampling and the Bootstrap
"This paper provides conditions under which subsampling and the bootstrap can be used to construct estimators of the quantiles of the distribution of a root that behave well uniformly over a large class of distributions $\mathbf{P}$. These results are then applied (i) to construct confidence regions that behave well uniformly over $\mathbf{P}$ in the sense that the coverage probability tends to at least the nominal level uniformly over $\mathbf{P}$ and (ii) to construct tests that behave well uniformly over $\mathbf{P}$ in the sense that the size tends to no greater than the nominal level uniformly over $\mathbf{P}$. Without these stronger notions of convergence, the asymptotic approximations to the coverage probability or size may be poor even in very large samples. Specific applications include the multivariate mean, testing moment inequalities, multiple testing, the empirical process, and $U$-statistics."
in_NB  bootstrap  statistics 
6 weeks ago by cshalizi
Assessing gross domestic product and inflation probability forecasts derived from Bank of England fan charts - Galbraith - 2011 - Journal of the Royal Statistical Society: Series A (Statistics in Society) - Wiley Online Library
"Density forecasts, including the pioneering Bank of England ‘fan charts’, are often used to produce forecast probabilities of a particular event. We use the Bank of England's forecast densities to calculate the forecast probability that annual rates of inflation and output growth exceed given thresholds. We subject these implicit probability forecasts to graphical and numerical diagnostic checks. We measure both their calibration and their resolution, providing both statistical and graphical interpretations of the results. The results reinforce earlier evidence on limitations of these forecasts and provide new evidence on their information content and on the relative performance of inflation and gross domestic product growth forecasts. In particular, gross domestic product forecasts show little or no ability to predict periods of low growth beyond the current quarter, in part because of the important role of data revisions."
to:NB  prediction  statistics  calibration  macroeconomics  to_teach:undergrad-ADA 
6 weeks ago by cshalizi
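A minimal sketch of what "calibration" and "resolution" can mean numerically for probability forecasts of a binary event (say, inflation exceeding a threshold). It uses the standard Murphy decomposition of the Brier score, which is an assumption here: the abstract does not say which diagnostics the paper itself uses. The function name and the toy forecasts are hypothetical.

```python
from collections import defaultdict


def brier_decomposition(probs, outcomes):
    """Return (brier, reliability, resolution, uncertainty).

    probs    : forecast probabilities of the event, one per period
    outcomes : 1 if the event occurred, 0 otherwise
    Forecasts are grouped by their exact value, so the identity
    brier = reliability - resolution + uncertainty holds exactly.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n

    # Collect the observed outcomes for each distinct forecast value.
    groups = defaultdict(list)
    for p, y in zip(probs, outcomes):
        groups[p].append(y)

    # Reliability (calibration): gap between each forecast value and the
    # observed frequency when it was issued. Smaller is better.
    reliability = sum(
        len(ys) * (p - sum(ys) / len(ys)) ** 2 for p, ys in groups.items()
    ) / n
    # Resolution: how far the conditional frequencies move away from the
    # overall base rate. Larger is better.
    resolution = sum(
        len(ys) * (sum(ys) / len(ys) - base_rate) ** 2 for ys in groups.values()
    ) / n
    uncertainty = base_rate * (1.0 - base_rate)
    brier = sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / n
    return brier, reliability, resolution, uncertainty


# Toy (made-up) forecasts and outcomes, for illustration only.
brier, rel, res, unc = brier_decomposition(
    [0.1, 0.1, 0.8, 0.8, 0.8], [0, 0, 1, 1, 0]
)
```

Forecasts with essentially no ability to predict an event would show up here as a resolution term close to zero.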
[0802.4192] Maxisets for Model Selection
"We address the statistical issue of determining the maximal spaces (maxisets) where model selection procedures attain a given rate of convergence. By considering first general dictionaries, then orthonormal bases, we characterize these maxisets in terms of approximation spaces. These results are illustrated by classical choices of wavelet model collections. For each of them, the maxisets are described in terms of functional spaces. We take a special care of the issue of calculability and measure the induced loss of performance in terms of maxisets."
in_NB  statistics  model_selection  approximation 
6 weeks ago by cshalizi
[0802.4363] Estimating the entropy of binary time series: Methodology, some theory and a simulation study
"Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.
** Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. ** Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. ** Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency."
in_NB  to_read  entropy_estimation  information_theory  time_series  statistics  kontoyiannis.ioannis  re:stacs 
6 weeks ago by cshalizi
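For orientation, a rough sketch of the plug-in method the abstract uses as a baseline: estimate the distribution of length-w blocks by their empirical frequencies, compute the Shannon entropy of that empirical distribution, and divide by w. This is the textbook version and may differ in detail from the paper's; the function name and the use of overlapping blocks are assumptions.

```python
from collections import Counter
from math import log2


def plugin_entropy_rate(bits, block_len):
    """Plug-in estimate of the entropy rate, in bits per symbol.

    bits      : sequence of symbols (e.g. 0/1 values)
    block_len : word length w; overlapping blocks of this length are counted
    """
    n_blocks = len(bits) - block_len + 1
    if n_blocks <= 0:
        raise ValueError("sequence is shorter than the block length")

    # Empirical frequencies of all overlapping blocks of length block_len.
    counts = Counter(tuple(bits[i:i + block_len]) for i in range(n_blocks))

    # Shannon entropy of the empirical block distribution.
    block_entropy = -sum(
        (c / n_blocks) * log2(c / n_blocks) for c in counts.values()
    )
    return block_entropy / block_len


# A balanced alternating sequence looks maximally random at w = 1 ...
h1 = plugin_entropy_rate([0, 1] * 4, 1)
# ... but longer blocks reveal the structure and pull the estimate down.
h3 = plugin_entropy_rate([0, 1] * 4, 3)
```

The number of possible blocks grows as 2^w for binary data, so the block length that can be estimated reliably is limited by the sample size, which is one reason bias dominates the error.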
[math/0612776] Uniform error bounds for smoothing splines
"Almost sure bounds are established on the uniform error of smoothing spline estimators in nonparametric regression with random designs. Some results of Einmahl and Mason (2005) are used to derive uniform error bounds for the approximation of the spline smoother by an "equivalent" reproducing kernel regression estimator, as well as for proving uniform error bounds on the reproducing kernel regression estimator itself, uniformly in the smoothing parameter over a wide range. This admits data-driven choices of the smoothing parameter."
to:NB  splines  regression  nonparametrics  statistics  learning_theory 
6 weeks ago by cshalizi
[math/0603130] Nonparametric methods for inference in the presence of instrumental variables
"We suggest two nonparametric approaches, based on kernel methods and orthogonal series to estimating regression functions in the presence of instrumental variables. For the first time in this class of problems, we derive optimal convergence rates, and show that they are attained by particular estimators. In the presence of instrumental variables the relation that identifies the regression function also defines an ill-posed inverse problem, the "difficulty" of which depends on eigenvalues of a certain integral operator which is determined by the joint density of endogenous and instrumental variables. We delineate the role played by problem difficulty in determining both the optimal convergence rate and the appropriate choice of smoothing parameter."
to:NB  to_read  regression  statistics  instrumental_variables  nonparametrics  to_teach:undergrad-ADA 
6 weeks ago by cshalizi
[1204.2581] Modeling Relational Data via Latent Factor Blockmodel
"In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems and bioinformatics. Previous studies either consider latent feature based models but disregarding local structure in the network, or focus exclusively on capturing local structure of objects based on latent blockmodels without coupling with latent characteristics of objects. To combine the benefits of the previous work, we propose a novel model that can simultaneously incorporate the effect of latent features and covariates if any, as well as the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both latent feature factors and latent cluster memberships of objects to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network to improve prediction performance. We also develop an optimization transfer algorithm based on the generalized EM-style strategy to learn the latent factors. We prove the efficacy of our proposed model through the link prediction task and cluster analysis task, and extensive experiments on the synthetic data and several real world datasets suggest that our proposed LFBM model outperforms the other state of the art approaches in the evaluated tasks."
in_NB  network_data_analysis  community_discovery  statistics  inference_to_latent_objects  factor_analysis  relational_learning 
6 weeks ago by cshalizi
[1204.1563] Generalized Error Exponents for Sparse Sample Goodness of Fit Tests
"We investigate the sparse sample goodness-of-fit problem, where the number of samples $n$ is smaller than the size of the alphabet $m$. The goal of this work is to find an appropriate criterion to analyze statistical tests in this setting. A suitable model for analysis is the high-dimensional model in which both $n$ and $m$ tend to infinity, and $n=o(m)$. We propose a new performance criterion based on large deviation analysis, which generalizes the classical error exponent applicable for large sample problems (in which $m=O(n)$). This new criterion provides insights that are not available from asymptotic consistency or CLT analysis. The main results are:
(i) The best achievable probability of error $P_e$ decays as $-\log(P_e)=(n^2/m)(1+o(1))J$ for some $J>0$.
(ii) A well-known coincidence-based test attains the optimal generalized error exponent.
(iii) The widely used Pearson's chi-square test has $J=0$.
(iv) The contributions (i)-(iii) are established under the assumption that the distribution under the null hypothesis is uniform. For the non-uniform case, a new test is proposed, with a non-zero generalized error exponent."
to:NB  hypothesis_testing  re:LICORS  statistics  large_deviations  goodness-of-fit 
6 weeks ago by cshalizi
Colombo , Maathuis , Kalisch , Richardson : Learning high-dimensional directed acyclic graphs with latent and selection variables
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."

--- Too complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB  have_read  statistics  graphical_models  causal_inference  sparsity  to_teach:undergrad-ADA 
7 weeks ago by cshalizi
Cook , Forzani , Rothman : Estimating sufficient reductions of the predictors in abundant high-dimensional regressions
"We study the asymptotic behavior of a class of methods for sufficient dimension reduction in high-dimension regressions, as the sample size and number of predictors grow in various alignments. It is demonstrated that these methods are consistent in a variety of settings, particularly in abundant regressions where most predictors contribute some information on the response, and oracle rates are possible. Simulation results are presented to support the theoretical conclusion."
to:NB  regression  dimension_reduction  sufficiency  statistics 
7 weeks ago by cshalizi
[1203.3083] Clustering in networks with the collapsed Stochastic Block Model
"We present an efficient MCMC algorithm to cluster the nodes of a network such that nodes with similar role in the network are clustered together. This is known as block-modelling or block-clustering. We extend the stochastic blockmodel (SBM) of Nowicki & Snijders (2001), by exploiting parameter collapsing to integrate out block parameters. The resulting model defines a posterior over the number of clusters and cluster memberships. Sampling from this model is simpler than from the original SBM as transdimensional MCMC can be avoided. Moreover, our extensions allow the number of clusters to be directly estimated, rather than given as an input parameter. The algorithm is based on the allocation sampler of Nobile & Fearnside (2007). We use synthetic and real data to test the speed and accuracy of our model and algorithm, including the ability to estimate the number of clusters. The algorithm can scale to networks with up to ten thousand nodes."
in_NB  network_data_analysis  community_discovery  statistics 
7 weeks ago by cshalizi
[1203.0683] A Method of Moments for Mixture Models and Hidden Markov Models
"Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm) which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity which typically scale exponentially with the number of mixture components. This work develops an efficient method of moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method leads to rigorous unsupervised learning results for mixture models that were not achieved by previous works; and, because of its simplicity, it also constitutes a viable alternative to EM for practical deployment."

Clever: some mixture models can be characterized by expectations, covariances, and third-order mixed moments, so you just need to estimate tensors up to third order, and not very high moments of vectors (which are very noisy) and do some linear algebra. I should probably re-read because I couldn't reproduce this at the board.
in_NB  statistics  estimation  mixture_models  markov_models  state-space_models  have_read 
7 weeks ago by cshalizi
[1203.1515] Multiple Change-Point Estimation in Stationary Ergodic Time-Series
"The multiple change-point problem is considered in the most general setting, where the only assumption made on the time-series distributions generating the data is that they are stationary ergodic. No modeling, independence or parametric assumptions are made. While the need for such a general setting is dictated by real applications, the problem of change-point estimation becomes a difficult unsupervised learning problem. In this work a novel algorithm for solving this problem is proposed, and it is shown to be asymptotically consistent under the general assumptions considered."
to:NB  change-point_problem  time_series  ergodic_theory  statistics  statistical_inference_for_stochastic_processes  ryabko.daniil 
7 weeks ago by cshalizi
[1203.4354] Asymptotic Confidence Sets for General Nonparametric Regression and Classification by Regularized Kernel Methods
"Regularized kernel methods such as, e.g., support vector machines and least-squares support vector regression constitute an important class of standard learning algorithms in machine learning. Theoretical investigations concerning asymptotic properties have mainly focused on rates of convergence during the last years but there are only very few and limited (asymptotic) results on statistical inference so far. As this is a serious limitation for their use in mathematical statistics, the goal of the article is to fill this gap. Based on asymptotic normality of many of these methods, the article derives a strongly consistent estimator for the unknown covariance matrix of the limiting normal distribution. In this way, we obtain asymptotically correct confidence sets for $\psi(f_{P,\lambda_0})$ where $f_{P,\lambda_0}$ denotes the minimizer of the regularized risk in the reproducing kernel Hilbert space $H$ and $\psi: H \rightarrow \mathbb{R}^m$ is any Hadamard-differentiable functional. Applications include (multivariate) pointwise confidence sets for values of $f_{P,\lambda_0}$ and confidence sets for gradients, integrals, and norms."
to:NB  confidence_sets  kernel_methods  statistics  nonparametrics  regression  classifiers 
7 weeks ago by cshalizi
[1204.0033] Transforming Graph Representations for Statistical Relational Learning
"Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed."
in_NB  relational_learning  statistics  machine_learning  neville.jennifer  change_of_representation 
7 weeks ago by cshalizi
[no title]
"Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involve a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively."

(From a quick scan, this looks too heavy to actually teach in ADAfaEPoV, but it's so tagged to remind me to include a reference.)
to:NB  causal_inference  partial_identification  statistics  instrumental_variables  to_teach:undergrad-ADA 
7 weeks ago by cshalizi
Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
"We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples."
--- I wonder whether this hasn't some application to the PC algorithm?
to:NB  graphical_models  lasso  sparsity  statistics  heard_the_talk 
7 weeks ago by cshalizi
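The screening rule in the abstract is simple enough to sketch: threshold the off-diagonal entries of the sample covariance matrix at λ in absolute value, then read off the connected components of the resulting graph. Each component can then be handed to a graphical lasso solver on its own (that step is not shown). The function name, the plain list-of-lists input and the union-find bookkeeping are illustrative choices, not the authors' code.

```python
def threshold_components(cov, lam):
    """Vertex partition induced by thresholding a covariance matrix.

    cov : square symmetric matrix as a list of lists
    lam : regularization parameter; i and j are joined when |cov[i][j]| > lam
    Returns the connected components as a sorted list of index lists.
    """
    p = len(cov)
    parent = list(range(p))  # union-find forest, one tree per component

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge for every off-diagonal entry exceeding lam in magnitude.
    for i in range(p):
        for j in range(i + 1, p):
            if abs(cov[i][j]) > lam:
                parent[find(i)] = find(j)

    # Group vertices by the root of their tree.
    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up 4x4 sample covariance with two weakly linked pairs of variables.
S = [
    [1.00, 0.80, 0.05, 0.00],
    [0.80, 1.00, 0.02, 0.00],
    [0.05, 0.02, 1.00, 0.60],
    [0.00, 0.00, 0.60, 1.00],
]
blocks = threshold_components(S, 0.1)
```

With λ = 0.1 the toy matrix splits into the blocks {0, 1} and {2, 3}; raising λ can only break components further apart, and lowering it can only merge them.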
A Kernel Two-Sample Test
"We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests."
in_NB  to_read  hilbert_space  kernel_methods  goodness-of-fit  statistics  concentration_of_measure  probability  two-sample_tests  re:network_differences 
7 weeks ago by cshalizi
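A small sketch of the quadratic-time MMD statistic, in its biased (V-statistic) form: the mean kernel value within each sample, minus twice the mean kernel value across samples. The Gaussian kernel and the bandwidth are assumptions made for illustration; turning the statistic into a test needs a threshold (from the large deviation bounds or the asymptotic distribution), which is not attempted here.

```python
from math import exp


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_biased(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys.

    Costs O((m + n)^2) kernel evaluations for sample sizes m and n.
    """
    def mean_kernel(sample_a, sample_b):
        # Average kernel value over all pairs, including i == j.
        total = sum(
            gaussian_kernel(a, b, bandwidth) for a in sample_a for b in sample_b
        )
        return total / (len(sample_a) * len(sample_b))

    return (
        mean_kernel(xs, xs)
        + mean_kernel(ys, ys)
        - 2.0 * mean_kernel(xs, ys)
    )


# Made-up one-dimensional samples: well separated, so the statistic is large.
xs = [(0.0,), (1.0,)]
ys = [(5.0,), (6.0,)]
stat = mmd2_biased(xs, ys)
```

The statistic is zero when both arguments are the same sample and grows as the two samples separate relative to the bandwidth.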
Structured Sparsity and Generalization
"We present a data dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite dimensional setting such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels."
to:NB  learning_theory  regression  sparsity  statistics  lasso 
7 weeks ago by cshalizi
[1203.6898] Long-term stability of sequential Monte Carlo methods under verifiable conditions
"This paper discusses particle filtering in general hidden Markov models (HMMs) and presents novel theoretical results on the long-term stability of bootstrap-type particle filters. More specifically, we establish that the asymptotic variance of the Monte Carlo estimates produced by the bootstrap filter is uniformly bounded in time. On the contrary to most previous results of this type, which in general presuppose that the state space of the hidden state process is compact (an assumption that is rarely satisfied in practice), our very mild assumptions are satisfied for a large class of HMMs with possibly non-compact state space. In addition, we derive a similar time uniform bound on the asymptotic Lp error. Importantly, our results hold for misspecified models, i.e. we do not at all assume that the data entering into the particle filter originate from the model governing the dynamics of the particles or not even from an HMM."
to:NB  particle_filters  stochastic_processes  time_series  state_estimation  state-space_models  markov_models  statistics 
8 weeks ago by cshalizi
[0804.0991] Quadratic distances on probabilities: A unified foundation
"This work builds a unified framework for the study of quadratic form distance measures as they are used in assessing the goodness of fit of models. Many important procedures have this structure, but the theory for these methods is dispersed and incomplete. Central to the statistical analysis of these distances is the spectral decomposition of the kernel that generates the distance. We show how this determines the limiting distribution of natural goodness-of-fit tests. Additionally, we develop a new notion, the spectral degrees of freedom of the test, based on this decomposition. The degrees of freedom are easy to compute and estimate, and can be used as a guide in the construction of useful procedures in this class."
to:NB  statistics  goodness-of-fit 
8 weeks ago by cshalizi
[0803.0402] A note on sensitivity of principal component subspaces and the efficient detection of influential observations in high dimensions
"In this paper we introduce an influence measure based on second order expansion of the RV and GCD measures for the comparison between unperturbed and perturbed eigenvectors of a symmetric matrix estimator. Example estimators are considered to highlight how this measure complements recent influence analysis. Importantly, we also show how a sample based version of this measure can be used to accurately and efficiently detect influential observations in practice."
to:NB  principal_components  statistics  to_teach:undergrad-ADA 
8 weeks ago by cshalizi
[0803.0835] Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations
"New goodness-of-fit tests for Markovian models in time series analysis are developed which are based on the difference between a fully nonparametric estimate of the one-step transition distribution function of the observed process and that of the model class postulated under the null hypothesis. The model specification under the null allows for Markovian models, the transition mechanisms of which depend on an unknown vector of parameters and an unspecified distribution of i.i.d. innovations. Asymptotic properties of the test statistic are derived and the critical values of the test are found using appropriate bootstrap schemes. General properties of the bootstrap for Markovian processes are derived. A new central limit theorem for triangular arrays of weakly dependent random variables is obtained. For the proof of stochastic equicontinuity of multidimensional empirical processes, we use a simple approach based on an anisotropic tiling of the space. The finite-sample behavior of the proposed test is illustrated by some numerical examples and a real-data application is given."
in_NB  statistics  statistical_inference_for_stochastic_processes  bootstrap  markov_models  goodness-of-fit 
8 weeks ago by cshalizi
[0804.0678] Consistency of spectral clustering
"Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering."
to:NB  statistics  machine_learning  clustering  spectral_clustering 
8 weeks ago by cshalizi
[math/0609514] Sequential Monte Carlo smoothing with application to parameter estimation in non-linear state space models
"This paper concerns the use of sequential Monte Carlo methods (SMC) for smoothing in general state space models. A well-known problem when applying the standard SMC technique in the smoothing mode is that the resampling mechanism introduces degeneracy of the approximation in the path space. However, when performing maximum likelihood estimation via the EM algorithm, all functionals involved are of additive form for a large subclass of models. To cope with the problem in this case, a modification of the standard method (based on a technique proposed by Kitagawa and Sato) is suggested. Our algorithm relies on forgetting properties of the filtering dynamics and the quality of the estimates produced is investigated, both theoretically and via simulations."
to:NB  statistics  time_series  state_estimation  state-space_models  particle_filters 
8 weeks ago by cshalizi
[1203.6360] You had me at hello: How phrasing affects memorability
"Understanding the ways in which information achieves widespread public awareness is a research question of significant interest. We consider whether, and how, the way in which the information is phrased --- the choice of words and sentence structure --- can affect this process. To this end, we develop an analysis framework and build a corpus of movie quotes, annotated with memorability information, in which we are able to control for both the speaker and the setting of the quotes. We find significant differences between memorable and non-memorable quotes in several key dimensions. One is lexical distinctiveness: in aggregate, memorable quotes use less common word choices, but at the same time are built upon a scaffolding of common syntactic patterns; another is that memorable quotes tend to be more general in ways that make them easy to apply in new contexts. We also show how the concept of "memorable language" can be extended across domains."
to:NB  linguistics  statistics  cultural_evolution 
8 weeks ago by cshalizi
[1203.5673] Effect of Nonstationarity on Models Inferred from Neural Data
"Neurons subject to a common non-stationary input may exhibit a correlated firing behavior. Correlations in the statistics of neural spike trains also arise as the effect of interaction between neurons. Here we show that these two situations can be distinguished, with machine learning techniques, provided the data are rich enough. In order to do this, we study the problem of inferring a kinetic Ising model, stationary or nonstationary, from the available data. We apply the inference procedure to two data sets: one from salamander retinal ganglion cells and the other from a realistic computational cortical network model. We show that many aspects of the concerted activity of the salamander retinal neurons can be traced simply to the external input. A model of non-interacting neurons subject to a non-stationary external field outperforms a model with stationary input with couplings between neurons, even accounting for the differences in the number of model parameters. When couplings are added to the non-stationary model, for the retinal data, little is gained: the inferred couplings are generally not significant. Likewise, the distribution of the sizes of sets of neurons that spike simultaneously and the frequency of spike patterns as function of their rank (Zipf plots) are well-explained by an independent-neuron model with time-dependent external input, and adding connections to such a model does not offer significant improvement. For the cortical model data, robust couplings, well correlated with the real connections, can be inferred using the non-stationary model. Adding connections to this model slightly improves the agreement with the data for the probability of synchronous spikes but hardly affects the Zipf plot."
to:NB  neural_data_analysis  statistics  time_series 
8 weeks ago by cshalizi
[1203.5950] Capturing the time-varying drivers of an epidemic using stochastic dynamical systems
"Epidemics are often modelled using state-space models based on dynamical systems, observed through partial and noisy data. In this paper we develop stochastic extensions to the popular SEIR model with parameters evolving in time, in order to capture unknown influences of changing behaviors, public interventions, seasonal effects etc. Our models assign diffusion processes for the time-varying parameters, and our inferential procedure is based on the particle Markov Chain Monte Carlo algorithm, suitably adjusted to accommodate the features of this challenging nonlinear stochastic model. The performance of the proposed computational methods is validated on simulated data and the adopted model is applied to the 2009 A/H1N1 pandemic in England. In addition to estimating the trajectories of the effective contact rate, the methodology is applied in real time to provide evidence in related public health decisions."
to:NB  time_series  epidemic_models  state-space_models  statistics 
8 weeks ago by cshalizi
[1203.5471] The Bayesian Analysis of Complex, High-Dimensional Models: Can it be CODA?
"We consider the Bayesian analysis of a few complex, high-dimensional models and show that intuitive priors, which are not tailored to the fine details of the data model and the estimated parameters, are going to fail in situations in which simple good frequentist estimators exist. The models we consider are: partially observed sample, the partial linear model, estimating linear and quadratic functionals of a white noise model, and estimating with stopping times. We argue that these findings do not contradict a strong version of Doob's consistency theorem which claims that the existence of a uniformly $\sqrt{n}$ consistent estimator ensures that the Bayes posterior is $\sqrt{n}$ consistent for values of the parameter with prior probability 1."
to:NB  statistics  bayesian_consistency 
8 weeks ago by cshalizi
[1203.5829] Ensemble estimators for multivariate entropy estimation
"The problem of estimation of density functionals like entropy and mutual information has received much attention in the statistics and information theory communities. A large class of estimators of functionals of the probability density suffer from the curse of dimensionality, wherein the exponent in the MSE rate of convergence decays increasingly slowly as the dimension $d$ of the samples increases. In particular, the rate is often glacially slow of order $O(T^{-\gamma/d})$, where $T$ is the number of samples, and $\gamma>0$ is a rate parameter. Examples of such estimators include kernel density estimators, $k$-NN density estimators, $k$-NN entropy estimators, intrinsic dimension estimators and other examples. In this paper, we propose a weighted convex combination of an ensemble of such estimators, where optimal weights can be chosen such that the weighted estimator converges at a much faster dimension invariant rate of $O(T^{-1})$. Furthermore, we show that these optimal weights can be determined by solving a convex optimization problem which can be performed offline and does not require training data. We illustrate the superior performance of our weighted estimator for two important applications: (i) estimating the Panter-Dite distortion-rate factor and (ii) estimating the Shannon entropy for testing the probability distribution of a random sample."
in_NB  ensemble_methods  entropy_estimation  statistics 
8 weeks ago by cshalizi
[1203.5974] The Concentration and Stability of the Community Detecting Functions on Random Networks
"We propose a general form of community detecting functions for finding the communities or the optimal partition of a random network, and examine the concentration and stability of the function values using the bounded difference martingale method. We derive LDP inequalities for both the general case and several specific community detecting functions: modularity, graph bipartitioning and q-Potts community structure. We also discuss the concentration and stability of community detecting functions on different types of random networks: the sparse and non-sparse networks and some examples such as ER and CL networks."
in_NB  to_read  community_discovery  network_data_analysis  statistics 
8 weeks ago by cshalizi
Taylor & Francis Online :: Bayesian Nonparametric Modeling for Causal Inference - Journal of Computational and Graphical Statistics - Volume 20, Issue 1
"Researchers have long struggled to identify causal effects in nonexperimental settings. Many recently proposed strategies assume ignorability of the treatment assignment mechanism and require fitting two models—one for the assignment mechanism and one for the response surface. This article proposes a strategy that instead focuses on very flexibly modeling just the response surface using a Bayesian nonparametric modeling procedure, Bayesian Additive Regression Trees (BART). BART has several advantages: it is far simpler to use than many recent competitors, requires less guesswork in model fitting, handles a large number of predictors, yields coherent uncertainty intervals, and fluidly handles continuous treatment variables and missing data for the outcome variable. BART also naturally identifies heterogeneous treatment effects. BART produces more accurate estimates of average treatment effects compared to propensity score matching, propensity-weighted estimators, and regression adjustment in the nonlinear simulation situations examined. Further, it is highly competitive in linear settings with the “correct” model, linear regression. Supplemental materials including code and data to replicate simulations and examples from the article as well as methods for population inference are available online."
to:NB  regression  causal_inference  nonparametrics  statistics  hill.jennifer 
8 weeks ago by cshalizi
Taylor & Francis Online :: Nonparametric Regression on a Graph - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
"The ‘Signal plus Noise’ model for nonparametric regression can be extended to the case of observations taken at the vertices of a graph. This model includes many familiar regression problems. This article discusses the use of the edges of a graph to measure roughness in penalized regression. Distance between estimate and observation is measured at every vertex in the L2 norm, and roughness is penalized on every edge in the L1 norm. Thus the ideas of total variation penalization can be extended to a graph. The resulting minimization problem presents special computational challenges, so we describe a new and fast algorithm and demonstrate its use with examples.

The examples include image analysis, a simulation applicable to discrete spatial variation, and classification. In our examples, penalized regression improves upon kernel smoothing in terms of identifying local extreme values on planar graphs. In all examples we use fully automatic procedures for setting the smoothing parameters."
to:NB  statistics  network_data_analysis  smoothing  regression 
8 weeks ago by cshalizi
Taylor & Francis Online :: Statistical Inference on Random Graphs: Comparative Power Analyses via Monte Carlo - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
"We present a comparative power analysis, via Monte Carlo, of various graph invariants used as statistics for testing graph homogeneity versus a “chatter” alternative—the existence of a local region of excessive activity. Our results indicate that statistical inference on random graphs, even in a relatively simple setting, can be decidedly nontrivial. We find that none of the graph invariants considered is uniformly most powerful throughout our space of alternatives. Code for reproducing all the simulation results presented in this article is available online."
to:NB  re:network_differences  statistics  hypothesis_testing  network_data_analysis 
8 weeks ago by cshalizi
Taylor & Francis Online :: Graphical Diagnostics for Markov Models for Categorical Data - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
"Markov models are widely used as a method for describing categorical data that exhibit stationary and nonstationary autocorrelation. However, diagnostic methods are a largely overlooked topic for Markov models. We introduce two types of residuals for this purpose: one for assessing the length of runs between state changes, and the other for assessing the frequency with which the process moves from any given state to the other states. Methods for calculating the sampling distribution of both types of residuals are presented, enabling objective interpretation through graphical summaries. The graphical summaries are formed using a modification of the probability integral transformation that is applicable for discrete data. Residuals from simulated datasets are presented to demonstrate when the model is, and is not, adequate for the data. The two types of residuals are used to highlight inadequacies of a model posed for real data on seabed fauna from the marine environment."
to:NB  visual_display_of_quantitative_information  statistics  markov_models  to_teach:undergrad-ADA 
8 weeks ago by cshalizi
[1203.6130] Spectral dimensionality reduction for HMMs
"Hidden Markov Models (HMMs) can be accurately approximated using co-occurrence frequencies of pairs and triples of observations by using a fast spectral method in contrast to the usual slow methods like EM or Gibbs sampling. We provide a new spectral method which significantly reduces the number of model parameters that need to be estimated, and generates a sample complexity that does not depend on the size of the observation vocabulary. We present an elementary proof giving bounds on the relative accuracy of probability estimates from our model. (Corollaries show our bounds can be weakened to provide either L1 bounds or KL bounds which provide easier direct comparisons to previous work.) Our theorem uses conditions that are checkable from the data, instead of putting conditions on the unobservable Markov transition matrix."
to:NB  to_read  markov_models  statistics  machine_learning  dimension_reduction  re:AoS_project  spectral_clustering 
8 weeks ago by cshalizi
Bickel , Kleijn : The semiparametric Bernstein–von Mises theorem
"In a smooth semiparametric estimation problem, the marginal posterior for the parameter of interest is expected to be asymptotically normal and satisfy frequentist criteria of optimality if the model is endowed with a suitable prior. It is shown that, under certain straightforward and interpretable conditions, the assertion of Le Cam’s acclaimed, but strictly parametric, Bernstein–von Mises theorem [Univ. California Publ. Statist. 1 (1953) 277–329] holds in the semiparametric situation as well. As a consequence, Bayesian point-estimators achieve efficiency, for example, in the sense of Hájek’s convolution theorem [Z. Wahrsch. Verw. Gebiete 14 (1970) 323–330]. The model is required to satisfy differentiability and metric entropy conditions, while the nuisance prior must assign nonzero mass to certain Kullback–Leibler neighborhoods [Ghosal, Ghosh and van der Vaart Ann. Statist. 28 (2000) 500–531]. In addition, the marginal posterior is required to converge at parametric rate, which appears to be the most stringent condition in examples. The results are applied to estimation of the linear coefficient in partial linear regression, with a Gaussian prior on a smoothness class for the nuisance."
to:NB  statistics  bayesian_consistency  nonparametrics  bickel.peter  bernstein-von_mises 
8 weeks ago by cshalizi
[1203.6502] Quantifying causal influences
"Common methods of causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between n variables. Given the joint distribution of all these variables, the DAG contains all information about how intervening on one variable would change the distribution of the other n-1 variables. It remains, however, a non-trivial question how to quantify the causal influence of one variable on another one.
Here we propose a measure for causal strength that refers to direct effects and measures the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
to:NB  to_read  causality  graphical_models  information_theory  statistics  via:ded-maxim 
8 weeks ago by cshalizi