cshalizi + estimation   121

Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation - Fearnhead - 2012 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Many modern statistical applications involve inference for complex stochastic models, where it is easy to simulate from the models, but impossible to calculate likelihoods. Approximate Bayesian computation (ABC) is a method of inference for such models. It replaces calculation of the likelihood by a step which involves simulating artificial data for different parameter values, and comparing summary statistics of the simulated data with summary statistics of the observed data. Here we show how to construct appropriate summary statistics for ABC in a semi-automatic manner. We aim for summary statistics which will enable inference about certain parameters of interest to be as accurate as possible. Theoretical results show that optimal summary statistics are the posterior means of the parameters. Although these cannot be calculated analytically, we use an extra stage of simulation to estimate how the posterior means vary as a function of the data; and we then use these estimates of our summary statistics within ABC. Empirical results show that our approach is a robust method for choosing summary statistics that can result in substantially more accurate ABC analyses than the ad hoc choices of summary statistics that have been proposed in the literature. We also demonstrate advantages over two alternative methods of simulation-based inference."
to:NB  indirect_inference  estimation  statistics  approximate_bayesian_computation  computational_statistics  to_teach:complexity-and-inference  re:stacs 
13 days ago by cshalizi
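A minimal sketch of the plain ABC rejection step that the Fearnhead abstract starts from, assuming a toy Gaussian-mean model, a uniform prior, and the sample mean as the summary statistic. All function names and settings here are hypothetical, and the paper's extra simulation stage for estimating posterior-mean summaries is not reproduced.

```python
import random
import statistics

def abc_rejection(observed, prior_sample, simulate, summary, eps, n_draws, rng):
    """Keep prior draws whose simulated summary lands within eps of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)                    # draw a candidate parameter
        fake = simulate(theta, len(observed), rng)   # simulate artificial data
        if abs(summary(fake) - s_obs) <= eps:        # compare summaries, not likelihoods
            accepted.append(theta)
    return accepted

rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]  # toy data, true mean 2 (assumption)

posterior_draws = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    eps=0.1,
    n_draws=5000,
    rng=rng,
)
abc_posterior_mean = statistics.fmean(posterior_draws)
```

The accepted draws approximate the posterior given the summary, so their average should sit near the observed sample mean in this toy case.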
[1205.1828] The Natural Gradient by Analogy to Signal Whitening, and Recipes and Tricks for its Use
"The natural gradient allows for more efficient gradient descent by removing dependencies and biases inherent in a function's parameterization. Several papers present the topic thoroughly and precisely. It remains a very difficult idea to get your head around however. The intent of this note is to provide simple intuition for the natural gradient and its use. We review how an ill conditioned parameter space can undermine learning, introduce the natural gradient by analogy to the more widely understood concept of signal whitening, and present tricks and specific prescriptions for applying the natural gradient to learning problems."

Does this ever mention the phrase "Fisher information"?
to:NB  optimization  statistics  estimation  fisher_information  information_geometry 
18 days ago by cshalizi
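A small sketch of the idea in the natural-gradient abstract above, assuming the usual definition in which the gradient is rescaled by the inverse Fisher information. The model is a toy Gaussian with known diagonal covariance and unknown mean; the function names and numbers are illustrative assumptions, not taken from the note.

```python
def grad_avg_loglik(mu, xbar, variances):
    # Gradient of the average Gaussian log-likelihood with respect to the mean.
    return [(xb - m) / v for m, xb, v in zip(mu, xbar, variances)]

def fisher_diag(variances):
    # Per-observation Fisher information for the mean (diagonal case).
    return [1.0 / v for v in variances]

def plain_step(mu, xbar, variances, lr):
    g = grad_avg_loglik(mu, xbar, variances)
    return [m + lr * gi for m, gi in zip(mu, g)]

def natural_step(mu, xbar, variances, lr):
    # Divide each gradient coordinate by its Fisher information before stepping.
    g = grad_avg_loglik(mu, xbar, variances)
    f = fisher_diag(variances)
    return [m + lr * gi / fi for m, gi, fi in zip(mu, g, f)]

variances = [100.0, 0.01]   # badly conditioned parameterization (assumption)
xbar = [3.0, -1.0]          # sample mean, which is the MLE here

mu_plain = [0.0, 0.0]
for _ in range(100):
    mu_plain = plain_step(mu_plain, xbar, variances, lr=0.01)

mu_natural = natural_step([0.0, 0.0], xbar, variances, lr=1.0)
```

After 100 plain steps the first coordinate has covered only about 1% of the distance to the MLE, while a single natural-gradient step lands on it; this is the same effect as running plain gradient ascent in whitened coordinates.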
[1201.5871] Null models for network data
"The analysis of datasets taking the form of simple, undirected graphs continues to gain in importance across a variety of disciplines. Two choices of null model, the logistic-linear model and the implicit log-linear model, have come into common use for analyzing such network data, in part because each accounts for the heterogeneity of network node degrees typically observed in practice. Here we show how these both may be viewed as instances of a broader class of null models, with the property that all members of this class give rise to essentially the same likelihood-based estimates of link probabilities in sparse graph regimes. This facilitates likelihood-based computation and inference, and enables practitioners to choose the most appropriate null model from this family based on application context. Comparative model fits for a variety of network datasets demonstrate the practical implications of our results."
in_NB  network_data_analysis  have_read  statistics  estimation  approximation  re:smoothing_adjacency_matrices 
5 weeks ago by cshalizi
Xiao, Wu: Covariance matrix estimation for stationary time series
"We obtain a sharp convergence rate for banded covariance matrix estimates of stationary processes. A precise order of magnitude is derived for spectral radius of sample covariance matrices. We also consider a thresholded covariance matrix estimator that can better characterize sparsity if the true covariance matrix is sparse. As our main tool, we implement Toeplitz [Math. Ann. 70 (1911) 351–376] idea and relate eigenvalues of covariance matrices to the spectral densities or Fourier transforms of the covariances. We develop a large deviation result for quadratic forms of stationary processes using m-dependence approximation, under the framework of causal representation and physical dependence measures."
to:NB  time_series  statistics  estimation  variance_estimation 
6 weeks ago by cshalizi
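A rough sketch of a banded covariance estimate for a stationary series, as in the Xiao and Wu abstract above: estimate autocovariances, zero out lags beyond a chosen band, and fill a Toeplitz matrix. The 1/n normalisation, the AR(1) test series, and the function names are assumptions made for illustration.

```python
import random

def sample_autocov(x, lag):
    # Sample autocovariance at the given lag, normalised by n.
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n

def banded_covariance(x, dim, band):
    # Toeplitz matrix of autocovariances, with lags beyond `band` set to zero.
    gammas = [sample_autocov(x, k) if k <= band else 0.0 for k in range(dim)]
    return [[gammas[abs(i - j)] for j in range(dim)] for i in range(dim)]

rng = random.Random(1)
x = [0.0]
for _ in range(499):
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))   # toy AR(1) series (assumption)

cov = banded_covariance(x, dim=6, band=2)
```

Choosing the band is the hard part; the sketch takes it as given.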
[1204.2763] A Cramér-Rao inequality for non differentiable models
"We compute a variance lower bound for unbiased estimators in specified statistical models. The construction of the bound is related to the original Cramér-Rao bound, although it does not require the differentiability of the model. Moreover, we show our efficiency bound to be always greater than the Cramér-Rao bound in smooth models, thus providing a sharper result."
to:NB  cramer-rao  statistics  estimation  information_geometry 
6 weeks ago by cshalizi
[1203.0683] A Method of Moments for Mixture Models and Hidden Markov Models
"Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm) which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity which typically scale exponentially with the number of mixture components. This work develops an efficient method of moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method leads to rigorous unsupervised learning results for mixture models that were not achieved by previous works; and, because of its simplicity, it also constitutes a viable alternative to EM for practical deployment."

Clever: some mixture models can be characterized by expectations, covariances, and third-order mixed moments, so you just need to estimate tensors up to third order, and not very high moments of vectors (which are very noisy) and do some linear algebra. I should probably re-read because I couldn't reproduce this at the board.
in_NB  statistics  estimation  mixture_models  markov_models  state-space_models  have_read 
7 weeks ago by cshalizi
[1203.5181] $k$-MLE: A fast algorithm for learning statistical mixture models
"We describe $k$-MLE, a fast and efficient local search algorithm for learning finite statistical mixtures of exponential families such as Gaussian mixture models. Mixture models are traditionally learned using the expectation-maximization (EM) soft clustering technique that monotonically increases the incomplete (expected complete) likelihood. Given prescribed mixture weights, the hard clustering $k$-MLE algorithm iteratively assigns data to the most likely weighted component and update the component models using Maximum Likelihood Estimators (MLEs). Using the duality between exponential families and Bregman divergences, we prove that the local convergence of the complete likelihood of $k$-MLE follows directly from the convergence of a dual additively weighted Bregman hard clustering. The inner loop of $k$-MLE can be implemented using any $k$-means heuristic like the celebrated Lloyd's batched or Hartigan's greedy swap updates. We then show how to update the mixture weights by minimizing a cross-entropy criterion that implies to update weights by taking the relative proportion of cluster points, and reiterate the mixture parameter update and mixture weight update processes until convergence. Hard EM is interpreted as a special case of $k$-MLE when both the component update and the weight update are performed successively in the inner loop. To initialize $k$-MLE, we propose $k$-MLE++, a careful initialization of $k$-MLE guaranteeing probabilistically a global bound on the best possible complete likelihood."
in_NB  em_algorithm  mixture_models  statistics  machine_learning  clustering  estimation 
9 weeks ago by cshalizi
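A simplified sketch of the hard-assignment loop described in the $k$-MLE abstract above, for a one-dimensional Gaussian mixture. It alternates assignment, component MLE updates, and weight updates in a single loop, which the abstract identifies as hard EM, a special case of $k$-MLE. The Bregman-divergence machinery and the $k$-MLE++ initialization are not reproduced, and the names, data, and variance floor are assumptions.

```python
import math

def log_weighted_density(x, weight, mean, var):
    # log of weight * Normal(x; mean, var)
    return math.log(weight) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def hard_em(data, init_means, n_iter=20, min_var=1e-6):
    means = list(init_means)
    k = len(means)
    weights = [1.0 / k] * k
    variances = [1.0] * k
    for _ in range(n_iter):
        # Hard assignment: each point goes to its most likely weighted component.
        clusters = [[] for _ in range(k)]
        for x in data:
            best = max(range(k), key=lambda j: log_weighted_density(
                x, weights[j], means[j], variances[j]))
            clusters[best].append(x)
        # Per-cluster MLE updates; weights become cluster proportions.
        for j, pts in enumerate(clusters):
            if not pts:
                continue  # leave an empty component untouched (simplification)
            means[j] = sum(pts) / len(pts)
            variances[j] = max(sum((p - means[j]) ** 2 for p in pts) / len(pts), min_var)
            weights[j] = len(pts) / len(data)
    return weights, means, variances

data = [-2.1, -1.9, -2.0, -1.8, -2.2, 3.0, 3.2, 2.8, 3.1, 2.9, 3.0]
weights, means, variances = hard_em(data, init_means=[-1.0, 1.0])
```

On this well-separated toy data the loop settles after one pass, with means near -2 and 3 and weights 5/11 and 6/11.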
Cai: Minimax and Adaptive Inference in Nonparametric Function Estimation
"Since Stein’s 1956 seminal paper, shrinkage has played a fundamental role in both parametric and nonparametric inference. This article discusses minimaxity and adaptive minimaxity in nonparametric function estimation. Three interrelated problems, function estimation under global integrated squared error, estimation under pointwise squared error, and nonparametric confidence intervals, are considered. Shrinkage is pivotal in the development of both the minimax theory and the adaptation theory.
"While the three problems are closely connected and the minimax theories bear some similarities, the adaptation theories are strikingly different. For example, in a sharp contrast to adaptive point estimation, in many common settings there do not exist nonparametric confidence intervals that adapt to the unknown smoothness of the underlying function. A concise account of these theories is given. The connections as well as differences among these problems are discussed and illustrated through examples."
in_NB  statistics  estimation  regression  confidence_sets  nonparametrics  shrinkage  cai.t._tony  minimax 
10 weeks ago by cshalizi
Richards, Lee, Schafer, Freeman: Prototype selection for parameter estimation in complex models
"Parameter estimation in astrophysics often requires the use of complex physical models. In this paper we study the problem of estimating the parameters that describe star formation history (SFH) in galaxies. Here, high-dimensional spectral data from galaxies are appropriately modeled as linear combinations of physical components, called simple stellar populations (SSPs), plus some nonlinear distortions. Theoretical data for each SSP is produced for a fixed parameter vector via computer modeling. Though the parameters that define each SSP are continuous, optimizing the signal model over a large set of SSPs on a fine parameter grid is computationally infeasible and inefficient. The goal of this study is to estimate the set of parameters that describes the SFH of each galaxy. These target parameters, such as the average ages and chemical compositions of the galaxy’s stellar populations, are derived from the SSP parameters and the component weights in the signal model. Here, we introduce a principled approach of choosing a small basis of SSP prototypes for SFH parameter estimation. The basic idea is to quantize the vector space and effective support of the model components. In addition to greater computational efficiency, we achieve better estimates of the SFH target parameters. In simulations, our proposed quantization method obtains a substantial improvement in estimating the target parameters over the common method of employing a parameter grid. Sparse coding techniques are not appropriate for this problem without proper constraints, while constrained sparse coding methods perform poorly for parameter estimation because their objective is signal reconstruction, not estimation of the target parameters."
to:NB  to_read  statistics  estimation  astronomy  kith_and_kin  lee.ann_b.  schafer.chad  richards.joey  freeman.peter 
11 weeks ago by cshalizi
Fan, Liao, Mincheva: High-dimensional covariance matrix estimation in approximate factor models
"The variance–covariance matrix plays a central role in the inferential theories of high-dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu [J. Amer. Statist. Assoc. 106 (2011) 672–684], taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied."
to:NB  statistics  estimation  factor_analysis  sparsity 
12 weeks ago by cshalizi
[1202.5183] Asymptotic normality and valid inference for Gaussian variational approximation
"We derive the precise asymptotic distributional behavior of Gaussian variational approximate estimators of the parameters in a single-predictor Poisson mixed model. These results are the deepest yet obtained concerning the statistical properties of a variational approximation method. Moreover, they give rise to asymptotically valid statistical inference. A simulation study demonstrates that Gaussian variational approximate confidence intervals possess good to excellent coverage properties, and have a similar precision to their exact likelihood counterparts."
to:NB  statistics  estimation  variational_inference 
12 weeks ago by cshalizi
Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics
"We consider the task of estimating, from observed data, a probabilistic model that is parameterized by a finite number of parameters. In particular, we are considering the situation where the model probability density function is unnormalized. That is, the model is only specified up to the partition function. The partition function normalizes a model so that it integrates to one for any choice of the parameters. However, it is often impossible to obtain it in closed form. Gibbs distributions, Markov and multi-layer networks are examples of models where analytical normalization is often impossible. Maximum likelihood estimation can then not be used without resorting to numerical approximations which are often computationally expensive. We propose here a new objective function for the estimation of both normalized and unnormalized models. The basic idea is to perform nonlinear logistic regression to discriminate between the observed data and some artificially generated noise. With this approach, the normalizing partition function can be estimated like any other parameter. We prove that the new estimation method leads to a consistent (convergent) estimator of the parameters. For large noise sample sizes, the new estimator is furthermore shown to behave like the maximum likelihood estimator. In the estimation of unnormalized models, there is a trade-off between statistical and computational performance. We show that the new method strikes a competitive trade-off in comparison to other estimation methods for unnormalized models. As an application to real data, we estimate novel two-layer models of natural image statistics with spline nonlinearities."
to:NB  statistics  estimation 
12 weeks ago by cshalizi
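A toy sketch of the noise-contrastive idea in the abstract above: fit an unnormalized model by logistic regression between data and artificial noise, treating the log normalizer as one more parameter. The one-dimensional Gaussian model, the noise distribution, equal sample sizes, the step size, and all names are assumptions chosen to keep the example short.

```python
import math
import random

NOISE_SD = 2.0  # noise distribution N(0, NOISE_SD^2), an arbitrary choice

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def log_sigmoid(z):
    # Numerically stable log(sigmoid(z)).
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def log_noise(x):
    return -0.5 * math.log(2 * math.pi * NOISE_SD ** 2) - x * x / (2 * NOISE_SD ** 2)

def log_model(x, a, c):
    # Unnormalized Gaussian with precision a; c plays the role of minus the log partition function.
    return -0.5 * a * x * x + c

def nce_objective(data, noise, a, c):
    j = sum(log_sigmoid(log_model(x, a, c) - log_noise(x)) for x in data) / len(data)
    j += sum(log_sigmoid(log_noise(y) - log_model(y, a, c)) for y in noise) / len(noise)
    return j

def nce_gradient(data, noise, a, c):
    ga = gc = 0.0
    for x in data:
        r = 1.0 - sigmoid(log_model(x, a, c) - log_noise(x))
        ga += r * (-0.5 * x * x) / len(data)
        gc += r / len(data)
    for y in noise:
        s = sigmoid(log_model(y, a, c) - log_noise(y))
        ga -= s * (-0.5 * y * y) / len(noise)
        gc -= s / len(noise)
    return ga, gc

rng = random.Random(3)
data = [rng.gauss(0.0, 1.0) for _ in range(400)]
noise = [rng.gauss(0.0, NOISE_SD) for _ in range(400)]

a, c = 0.5, 0.0
j_start = nce_objective(data, noise, a, c)
for _ in range(300):                      # plain full-batch gradient ascent
    ga, gc = nce_gradient(data, noise, a, c)
    a, c = a + 0.05 * ga, c + 0.05 * gc
j_end = nce_objective(data, noise, a, c)
```

With enough iterations, `a` should drift toward the true precision 1 and `c` toward minus the log of the Gaussian normalizing constant, though this sketch only checks that the objective improves.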
Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
"Sparse additive models are families of $d$-variate functions with the additive decomposition $f^* = \sum_{j \in S} f^*_j$, where $S$ is an unknown subset of cardinality $s \ll d$. In this paper, we consider the case where each univariate component function $f^*_j$ lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function $f^*$ based on kernels combined with $\ell_1$-type convex regularization. Working within a high-dimensional framework that allows both the dimension $d$ and sparsity $s$ to increase with $n$, we derive convergence rates in the $L^2(P)$ and $L^2(P_n)$ norms over the class $F_{d,s,H}$ of sparse additive models with each univariate function $f^*_j$ in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the $L^2(P)$ error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the $d$-variate function class is assumed to be globally bounded, then much faster estimation rates are possible for any sparsity $s = \Omega(\sqrt{n})$, showing that global boundedness is a significant restriction in the high-dimensional setting."
to:NB  statistics  regression  estimation  minimax  additive_models  sparsity  wainright.martin  yu.bin 
12 weeks ago by cshalizi
[0805.3040] Higher order influence functions and minimax estimation of nonlinear functionals
"We present a theory of point and interval estimation for nonlinear functionals in parametric, semi-, and non-parametric models based on higher order influence functions (Robins (2004), Section 9; Li et al. (2004), Tchetgen et al. (2006), Robins et al. (2007)). Higher order influence functions are higher order U-statistics. Our theory extends the first order semiparametric theory of Bickel et al. (1993) and van der Vaart (1991) by incorporating the theory of higher order scores considered by Pfanzagl (1990), Small and McLeish (1994) and Lindsay and Waterman (1996). The theory reproduces many previous results, produces new non-$\sqrt{n}$ results, and opens up the ability to perform optimal non-$\sqrt{n}$ inference in complex high dimensional models. We present novel rate-optimal point and interval estimators for various functionals of central importance to biostatistics in settings in which estimation at the expected $\sqrt{n}$ rate is not possible, owing to the curse of dimensionality. We also show that our higher order influence functions have a multi-robustness property that extends the double robustness property of first order influence functions described by Robins and Rotnitzky (2001) and van der Laan and Robins (2003)."
to:NB  statistics  robins.james  van_der_vaart.aad  estimation 
12 weeks ago by cshalizi
Okabayashi, Geyer: Long range search for maximum likelihood in exponential families
"Exponential families are often used to model data sets with complex dependence. Maximum likelihood estimators (MLE) can be difficult to estimate when the likelihood is expensive to compute. Markov chain Monte Carlo (MCMC) methods based on the MCMC-MLE algorithm in [17] are guaranteed to converge in theory under certain conditions when starting from any value, but in practice such an algorithm may labor to converge when given a poor starting value. We present a simple line search algorithm to find the MLE of a regular exponential family when the MLE exists and is unique. The algorithm can be started from any initial value and avoids the trial and error experimentation associated with calibrating algorithms like stochastic approximation. Unlike many optimization algorithms, this approach utilizes first derivative information only, evaluating neither the likelihood function itself nor derivatives of higher order than first. We show convergence of the algorithm for the case where the gradient can be calculated exactly. When it cannot, it has a particularly convenient form that is easily estimable with MCMC, making the algorithm still useful to a practitioner."
to:NB  statistics  exponential_families  exponential_family_random_graphs  network_data_analysis  estimation  monte_carlo  optimization  geyer.charles 
february 2012 by cshalizi
[1201.1980] Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter
"Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with strong and parametric assumptions about the random effects distribution. There is marked disagreement in the literature as to whether such parametric assumptions are important or innocuous. In the context of generalized linear mixed models used to analyze clustered or longitudinal data, we examine the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects and estimation of random effects variances. We describe examples, theoretical calculations and simulations to elucidate situations in which the specification is and is not important. A key conclusion is the large degree of robustness of maximum likelihood for a wide variety of commonly encountered situations."
to:NB  regression  statistics  estimation  hierarchical_models 
january 2012 by cshalizi
[0805.4136] Inference for the dark energy equation of state using Type IA supernova data
"The surprising discovery of an accelerating universe led cosmologists to posit the existence of "dark energy"--a mysterious energy field that permeates the universe. Understanding dark energy has become the central problem of modern cosmology. After describing the scientific background in depth, we formulate the task as a nonlinear inverse problem that expresses the comoving distance function in terms of the dark-energy equation of state. We present two classes of methods for making sharp statistical inferences about the equation of state from observations of Type Ia Supernovae (SNe). First, we derive a technique for testing hypotheses about the equation of state that requires no assumptions about its form and can distinguish among competing theories. Second, we present a framework for computing parametric and nonparametric estimators of the equation of state, with an associated assessment of uncertainty. Using our approach, we evaluate the strength of statistical evidence for various competing models of dark energy. Consistent with current studies, we find that with the available Type Ia SNe data, it is not possible to distinguish statistically among popular dark-energy models, and that, in particular, there is no support in the data for rejecting a cosmological constant. With much more supernova data likely to be available in coming years (e.g., from the DOE/NASA Joint Dark Energy Mission), we address the more interesting question of whether future data sets will have sufficient resolution to distinguish among competing theories."

--- I am biased, because Chris G. and Larry are friends, but this seems to me a model of the modern applied statistics paper: use interesting statistical tools to say something helpful about an important scientific problem on its own terms, rather than distorting the problem until it "looks like a nail".
in_NB  kith_and_kin  cosmology  astronomy  inverse_problems  nonparametrics  estimation  hypothesis_testing  statistics  bootstrap  genovese.christopher  wasserman.larry  have_read 
january 2012 by cshalizi
[1112.3914] Robust empirical mean Estimators
"We study robust estimators of the mean of a probability measure $P$, called robust empirical mean estimators. This elementary construction is then used to revisit a problem of aggregation and a problem of estimator selection, extending these methods to not necessarily bounded collections of previous estimators.
We consider then the problem of robust $M$-estimation. We propose a slightly more complicated construction to handle this problem and, as examples of applications, we apply our general approach to least-squares density estimation, to density estimation with Kullback loss and to a non-Gaussian, unbounded, random design and heteroscedastic regression problem.
Finally, we show that our strategy can be used when the data are only assumed to be mixing."
in_NB  to_read  statistics  estimation  statistical_inference_for_stochastic_processes 
december 2011 by cshalizi
[1112.0840] On the question of effective sample size in network modeling
"We raise the issue of effective sample size in network graph modeling and inference and illustrate, using simple models and arguments, how this issue can quickly become nontrivial."
in_NB  network_data_analysis  have_read  estimation  statistics  fisher_information  exponential_family_random_graphs  kolaczyk.eric  krivitsky.pavel 
december 2011 by cshalizi
Prediction-based regularization using data augmented regression - Statistics and Computing, Volume 22, Number 1
"The role of regularization is to control fitted model complexity and variance by penalizing (or constraining) models to be in an area of model space that is deemed reasonable, thus facilitating good predictive performance. This is typically achieved by penalizing a parametric or non-parametric representation of the model. In this paper we advocate instead the use of prior knowledge or expectations about the predictions of models for regularization. This has the twofold advantage of allowing a more intuitive interpretation of penalties and priors and explicitly controlling model extrapolation into relevant regions of the feature space. This second point is especially critical in high-dimensional modeling situations, where the curse of dimensionality implies that new prediction points usually require extrapolation. We demonstrate that prediction-based regularization can, in many cases, be stochastically implemented by simply augmenting the dataset with Monte Carlo pseudo-data. We investigate the range of applicability of this implementation. An asymptotic analysis of the performance of Data Augmented Regression (DAR) in parametric and non-parametric linear regression, and in nearest neighbor regression, clarifies the regularizing behavior of DAR. We apply DAR to simulated and real data, and show that it is able to control the variance of extrapolation, while maintaining, and often improving, predictive accuracy."
in_NB  to_read  statistics  prediction  estimation  hooker.giles  regression  to_teach:undergrad-ADA  to_teach:data-mining  curse_of_dimensionality 
december 2011 by cshalizi
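A toy sketch of regularizing by data augmentation, in the spirit of the data augmented regression abstract above: append Monte Carlo pseudo-observations whose responses encode a prior expectation about predictions, then refit as usual. The through-the-origin model, the prior prediction of zero, the pseudo-data range, and the names are all assumptions for illustration; the paper's weighting scheme is not reproduced.

```python
import random

def slope_through_origin(xs, ys):
    # Least-squares slope for the model y = beta * x.
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

rng = random.Random(2)
xs = [rng.uniform(0.0, 1.0) for _ in range(20)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

# Pseudo-data: covariates drawn over a wider range than the sample,
# with responses set to the prior expected prediction (0 here).
pseudo_xs = [rng.uniform(0.0, 3.0) for _ in range(10)]
pseudo_ys = [0.0 for _ in pseudo_xs]

beta_ols = slope_through_origin(xs, ys)
beta_dar = slope_through_origin(xs + pseudo_xs, ys + pseudo_ys)
```

Because the pseudo-responses are zero, the augmented slope equals the cross-product sum divided by the combined sum of squared covariates, which is a ridge-like shrinkage of the ordinary least-squares slope.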
Phys. Rev. E 84, 056214 (2011): State and parameter estimation using unconstrained optimization
"We present an efficient method for estimating variables and parameters of a given system of ordinary differential equations by adapting the model output to an observed time series from the (physical) process described by the model. The proposed method is based on (unconstrained) nonlinear optimization exploiting the particular structure of the relevant cost function. To illustrate the features and performance of the method, simulations are presented using chaotic time series generated by the Colpitts oscillator, the three-dimensional Hindmarsh-Rose neuron model, and a nine-dimensional extended Rössler system." --- Sounds like Hooker & Ramsay.
to:NB  dynamical_systems  statistics  time_series  estimation  statistical_inference_for_stochastic_processes 
november 2011 by cshalizi
Le Cam Made Simple: No-N Asymptotics
"If the log likelihood is approximately quadratic with constant Hessian, then the maximum likelihood estimator (MLE) is approximately normally distributed. No other assumptions are required. We do not need independent and identically distributed data. We do not need the law of large numbers (LLN) or the central limit theorem (CLT). We do not need sample size going to infinity or anything going to infinity.

The theory presented here is a combination of Le Cam style involving local asymptotic normality (LAN) and local asymptotic mixed normality (LAMN) and Cramér style involving derivatives and Fisher information. The main tool is convergence in law of the log likelihood function and its derivatives considered as random elements of a Polish space of continuous functions with the metric of uniform convergence on compact sets. We obtain results for both one-step-Newton estimators and Newton-iterated-to-convergence estimators."
in_NB  have_read  statistics  estimation  geyer.charles  via:ale 
november 2011 by cshalizi
[1111.3029] Parametric estimation. Finite sample theory
"The paper aims at reconsidering the famous Le Cam LAN theory. The main features of the approach which make it different from the classical one are: (1) the study is non-asymptotic, that is, the sample size is fixed and does not tend to infinity; (2) the parametric assumption is possibly misspecified and the underlying data distribution can lie beyond the given parametric family.
The main results include a large deviation bounds for the (quasi) maximum likelihood and the local quadratic majorization of the log-likelihood process. The latter yields a number of important corollaries for statistical inference: concentration, confidence and risk bounds, expansion of the maximum likelihood estimate, etc. All these corollaries are stated in a non-classical way admitting a model misspecification and finite samples. However, the classical asymptotic results including the efficiency bounds can be easily derived as corollaries of the obtained non-asymptotic statements. The general results are illustrated for the i.i.d. set-up as well as for generalized linear and median estimation. The results apply for any dimension of the parameter space and provide a quantitative lower bound on the sample size yielding the root-n accuracy."
in_NB  statistics  estimation 
november 2011 by cshalizi
[1111.3054] Consistency under Sampling of Exponential Random Graph Models
"The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focussing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general ones about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses."
in_NB  self-promotion  exponential_family_random_graphs  exponential_families  statistical_inference_for_stochastic_processes  statistics  network_data_analysis  re:your_favorite_ergm_sucks  estimation  large_deviations 
november 2011 by cshalizi
Bickel, Chen, Levina: The method of moments and degree distributions for network models
"Probability models on graphs are becoming increasingly important in many applications, but statistical tools for fitting such models are not yet well developed. Here we propose a general method of moments approach that can be used to fit a large class of probability models through empirical counts of certain patterns in a graph. We establish some general asymptotic properties of empirical graph moments and prove consistency of the estimates as the graph size grows for all ranges of the average degree including Omega(1). Additional results are obtained for the important special case of degree distributions."

After reading this, I note that they do not go through even one example of actually estimating anything. I think this is because the inversion from moments to graphons, while mathematically well-defined, is hellish to calculate (and probably very numerically unstable).
network_data_analysis  statistics  estimation  bickel.peter  levina.elizaveta  re:smoothing_adjacency_matrices  in_NB  have_read 
november 2011 by cshalizi
[1111.1120] Parametric inference for stochastic differential equations: a smooth and match approach
"We study the problem of parameter estimation for a univariate discretely observed ergodic diffusion process given as a solution to a stochastic differential equation. The estimation procedure we propose consists of two steps. In the first step, which is referred to as a smoothing step, we smooth the data and construct a nonparametric estimator of the invariant density of the process. In the second step, which is referred to as a matching step, we exploit a characterisation of the invariant density as a solution of a certain ordinary differential equation, replace the invariant density in this equation by its nonparametric estimator from the smoothing step in order to arrive at an intuitively appealing criterion function, and next define our estimator of the parameter of interest as a minimiser of this criterion function. In many interesting examples such an estimator will be computationally less intense than the more conventional estimators obtained through approximation of the likelihood function associated with the observations. Our main result shows that our estimator is $\sqrt{n}$-consistent under suitable conditions. We also discuss a way of improving its asymptotic performance through a one-step Newton-Raphson type procedure."
to:NB  statistical_inference_for_stochastic_processes  stochastic_differential_equations  ergodic_theory  nonparametrics  statistics  estimation 
november 2011 by cshalizi
Fraser: Is Bayes Posterior just Quick and Dirty Confidence?
Shorter Fraser: Yes. Yes it is.
Longer Fraser: "Bayes introduced the observed likelihood function to statistical inference and provided a weight function to calibrate the parameter; he also introduced a confidence distribution on the parameter space but did not provide present justifications. Of course the names likelihood and confidence did not appear until much later: Fisher for likelihood and Neyman for confidence. Lindley showed that the Bayes and the confidence results were different when the model was not location. This paper examines the occurrence of true statements from the Bayes approach and from the confidence approach, and shows that the proportion of true statements in the Bayes case depends critically on the presence of linearity in the model; and with departure from this linearity the Bayes approach can be a poor approximation and be seriously misleading. Bayesian integration of weighted likelihood thus provides a first-order linear approximation to confidence, but without linearity can give substantially incorrect results."
The responses are worth reading, especially, of course, Larry's.
in_NB  statistics  estimation  confidence_sets  bayesianism  fraser.d.a.s.  have_read 
october 2011 by cshalizi
Goerg : Lambert W random variables—a new family of generalized skewed distributions with applications to risk estimation
"Originating from a system theory and an input/output point of view, I introduce a new class of generalized distributions. A parametric nonlinear transformation converts a random variable X into a so-called Lambert W random variable Y, which allows a very flexible approach to model skewed data. Its shape depends on the shape of X and a skewness parameter γ. In particular, for symmetric X and nonzero γ the output Y is skewed. Its distribution and density function are particular variants of their input counterparts. Maximum likelihood and method of moments estimators are presented, and simulations show that in the symmetric case additional estimation of γ does not affect the quality of other parameter estimates. Applications in finance and biomedicine show the relevance of this class of distributions, which is particularly useful for slightly skewed data. A practical by-result of the Lambert W framework: data can be “unskewed.”

The R package LambertW developed by the author is publicly available (CRAN)."

I'm very proud.
to:NB  kith_and_kin  goerg.georg_m.  statistics  estimation  distributions  heavy_tails 
october 2011 by cshalizi
Variance estimation using refitted cross-validation in ultrahigh dimensional regression - Fan - 2011 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Variance estimation is a fundamental problem in statistical modelling. In ultrahigh dimensional linear regression where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regression make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the level of noise. We propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation, to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two-stage estimator and the plug-in one-stage estimators using the lasso and smoothly clipped absolute deviation are also studied and compared. Their performances can be improved by the refitted cross-validation method proposed."
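
A minimal sketch of the data-splitting idea in the abstract above: select variables on one half of the data, refit by ordinary least squares on the other half, swap the roles, and average the two residual-variance estimates. The function names (`screen`, `ols_variance`, `rcv_variance`) are made up for this illustration, and marginal-correlation screening is only a simple stand-in for whatever selection method is actually used in the paper.

```python
import numpy as np

def screen(X, y, k):
    """Indices of the k predictors with the largest absolute marginal correlation with y.
    (A stand-in selection rule, assumed here for illustration.)"""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / np.linalg.norm(Xc, axis=0)
    return np.argsort(score)[-k:]

def ols_variance(X, y):
    """Residual variance from an OLS fit with intercept, using n - (number of columns) d.o.f."""
    n = len(y)
    Z = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return resid @ resid / (n - Z.shape[1])

def rcv_variance(X, y, k, seed=0):
    """Select on one half, refit on the other, swap, and average the two estimates."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    a, b = idx[: len(y) // 2], idx[len(y) // 2:]
    selected_on_a = screen(X[a], y[a], k)
    selected_on_b = screen(X[b], y[b], k)
    v1 = ols_variance(X[b][:, selected_on_a], y[b])
    v2 = ols_variance(X[a][:, selected_on_b], y[a])
    return 0.5 * (v1 + v2)

# Toy data: 3 real predictors among 1000, true noise variance 1.
rng = np.random.default_rng(1)
n, p = 400, 1000
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, 2.0, 2.0]) + rng.normal(size=n)
sigma2_rcv = rcv_variance(X, y, k=10)
```

Because the irrelevant variables picked up in one half were chosen without looking at the other half, they have no special correlation with the noise in the half used for refitting, which is what keeps the variance estimate from being dragged downward.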
statistics  regression  variable_selection  cross-validation  estimation  to:NB  fan.jianqing 
october 2011 by cshalizi
Maximum Kernel Likelihood Estimation - Journal of Computational and Graphical Statistics - 17(4):976
"We introduce an estimator for the population mean based on maximizing likelihoods formed by parameterizing a kernel density estimate. Due to these origins, we have dubbed the estimator the maximum kernel likelihood estimate (MKLE). A speedy computational method to compute the MKLE based on binning is implemented in a simulation study which shows that the MKLE at an optimal bandwidth is decidedly superior in terms of efficiency to the sample mean and other measures of location for heavy tailed symmetric distributions. An empirical rule and a computational method to estimate this optimal bandwidth are developed and used to construct bootstrap confidence intervals for the population mean. We show that the intervals have approximately nominal coverage and have significantly smaller average width than the standard t and z intervals. Finally, we develop some mathematical properties for a very close approximation to the MKLE called the kernel mean. In particular, we demonstrate that the kernel mean is indeed unbiased for the population mean for symmetric distributions."
statistics  estimation  kernel_estimators  to:NB  heavy_tails 
october 2011 by cshalizi
Estimating a Function from Ergodic Samples with Additive Noise [Nobel and Adams]
"We study the problem of estimating an unknown function from ergodic samples corrupted by additive noise. It is shown that one can consistently recover an unknown measurable function in this setting, if the one-dimensional (1-D) distribution of the samples is comparable to a known reference distribution, and the noise is independent of the samples and has known mixing rates. The estimates are applied to deterministic sampling schemes, in which successive samples are obtained by repeatedly applying a fixed map to a given initial vector, and it is then shown how the estimates can be used to reconstruct an ergodic transformation from one of its trajectories."
statistics  estimation  regression  ergodic_theory  via:ded-maxim  to:NB  re:your_favorite_dsge_sucks  dynamical_systems  state-space_reconstruction 
september 2011 by cshalizi
[1109.2397] Structured sparsity through convex optimization
"Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the $\ell_1$-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to supervised learning in the context of non-linear variable selection, and to unsupervised learning, for structured sparse principal component analysis, and hierarchical dictionary learning."
sparsity  regression  statistics  estimation  convexity  machine_learning  to:NB 
september 2011 by cshalizi
Don Fraser’s rejoinder « Xi'an's Og
Do follow the links to the papers.  Shorter Fraser: except in very special and simple situations, Bayesian credible sets have demonstrably horrible coverage/confidence properties; that is, the probabilities attached to them do not tell you how often they really contain the true parameter values.  In fact, it scarcely seems to make sense to describe those numbers as "probabilities".  (I find Robert's response to Fraser's article extremely unconvincing, especially where it descends into pure aesthetics, e.g. saying that Bayes gives you an elegant and unified way of doing inference.  Well, so does referring all questions to the I Ching, but does it work?)
bayesianism  estimation  confidence_sets  statistics  in_NB 
august 2011 by cshalizi
Divergence in everything: Cramér-Rao from data processing « The Information Structuralist
Very nice.  Reminds me of Pitman's proof in _Basic Concepts_, but I need to go back and look at how that went.
fisher_information  statistics  information_theory  estimation  raginsky.maxim  to:blog 
july 2011 by cshalizi
[0812.0449] Locally adaptive estimation methods with application to univariate time series
"The paper offers a unified approach to the study of three locally adaptive estimation methods in the context of univariate time series from both theoretical and empirical points of view. A general procedure for the computation of critical values is given. The underlying model encompasses all distributions from the exponential family providing for great flexibility. The procedures are applied to simulated and real financial data distributed according to the Gaussian, volatility, Poisson, exponential and Bernoulli models. Numerical results exhibit a very reasonable performance of the methods."
time_series  statistics  estimation  exponential_families  non-stationarity  to:NB 
july 2011 by cshalizi
[1107.3811] Some thoughts on Le Cam's statistical decision theory
From 2000: "The paper contains some musings about the abstractions introduced by Lucien Le Cam into the asymptotic theory of statistical inference and decision theory. A short, self-contained proof of a key result (existence of randomizations via convergence in distribution of likelihood ratios), and an outline of a proof of a local asymptotic minimax theorem, are presented as an illustration of how Le Cam's approach leads to conceptual simplifications of asymptotic theory."
statistics  decision_theory  estimation  pollard.david  le_cam.lucien 
july 2011 by cshalizi
[1107.3806] Asymptotics for minimisers of convex processes
From 1993: "By means of two simple convexity arguments we are able to develop a general method for proving consistency and asymptotic normality of estimators that are defined by minimisation of convex criterion functions. This method is then applied to a fair range of different statistical estimation problems, including Cox regression, logistic and Poisson regression, least absolute deviation regression outside model conditions, and pseudo-likelihood estimation for Markov chains. Our paper has two aims. The first is to exposit the method itself, which in many cases, under reasonable regularity conditions, leads to new proofs that are simpler than the traditional proofs. Our second aim is to exploit the method to its limits for logistic regression and Cox regression, where we seek asymptotic results under as weak regularity conditions as possible. For Cox regression in particular we are able to weaken previously published regularity conditions substantially."
statistics  estimation  pollard.david  hjort.nils_lid  empirical_processes  have_read  in_NB 
july 2011 by cshalizi
[0711.3577] Transform martingale estimating functions
"An estimation method is proposed for a wide variety of discrete time stochastic processes that have an intractable likelihood function but are otherwise conveniently specified by an integral transform such as the characteristic function, the Laplace transform or the probability generating function..." Not sure how often I'll have such a specification, but OK...
estimation  stochastic_processes  statistical_inference_for_stochastic_processes  statistics  to:NB 
july 2011 by cshalizi
Seijo, Sen: A continuous mapping theorem for the smallest argmax functional
"This paper introduces a version of the argmax continuous mapping theorem that applies to M-estimation problems in which the objective functions converge to a limiting process with multiple maximizers. The concept of the smallest maximizer of a function in the d-dimensional Skorohod space is introduced and its main properties are studied. The resulting continuous mapping theorem is applied to three problems arising in change-point regression analysis. Some of the results proved in connection to the d-dimensional Skorohod space are also of independent interest."
statistics  estimation  empirical_processes 
may 2011 by cshalizi
A stable estimator of the information matrix under EM for dependent data
"This article develops a new and stable estimator for information matrix when the EM algorithm is used in maximum likelihood estimation. This estimator is constructed using the smoothed individual complete-data scores that are readily available from running the EM algorithm. The method works for dependent data sets and when the expectation step is an irregular function of the conditioning parameters."  (When I teach EM, I should say something about how to get uncertainty estimates...)
fisher_information  em_algorithm  estimation  statistics  to_teach:data-mining  to_teach:undergrad-ADA 
december 2010 by cshalizi
[1011.4328] Graphical Models Concepts in Compressed Sensing
"This paper surveys recent work in applying ideas from graphical models and message passing algorithms to solve large scale regularized regression problems. In particular, the focus is on compressed sensing reconstruction via $\ell_1$ penalized least-squares (known as LASSO or BPDN). We discuss how to derive fast approximate message passing algorithms to solve this problem. Surprisingly, the analysis of such algorithms allows to prove exact high-dimensional limit results for the LASSO risk. This paper will appear as a chapter in a book on ‘Compressed Sensing’ edited by Yonina Eldar and Gitta Kutyniok."
graphical_models  compressed_sensing  statistics  estimation 
november 2010 by cshalizi
Stochastic Composite Likelihood
"Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolve the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators, provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness thereby increasing statistical accuracy."
likelihood  estimation  statistics  lebanon.guy  to_read 
november 2010 by cshalizi
Approximate Bayesian Computation in Evolution and Ecology (Beaumont, 2010)
Heard the talk in Bristol. Naturally the idea makes a lot more sense when laid out by a proponent than by a hostile critic; it's actually very similar to indirect inference (only with a prior to add bias).
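
For concreteness, a minimal sketch of ABC in its simplest rejection-sampling form, which makes the kinship with indirect inference visible: both match summary statistics of simulated data to those of the observed data, and ABC additionally draws the parameter from a prior. Everything here (the function `abc_rejection`, the Gaussian toy model, the uniform prior, the tolerance) is an assumption chosen for illustration, not anything taken from Beaumont's paper.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_draw, summary, tol, n_draws, seed=0):
    """Keep prior draws whose simulated summary lands within tol of the observed summary."""
    rng = np.random.default_rng(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_draw(rng)                  # draw a parameter from the prior
        s_sim = summary(simulate(theta, rng))    # simulate data, reduce to a summary
        if abs(s_sim - s_obs) <= tol:            # accept if the summaries nearly match
            accepted.append(theta)
    return np.array(accepted)

# Toy problem: unknown mean of a unit-variance Gaussian, flat prior on [-5, 5].
data_rng = np.random.default_rng(42)
observed = data_rng.normal(2.0, 1.0, size=100)

posterior_draws = abc_rejection(
    observed,
    simulate=lambda theta, rng: rng.normal(theta, 1.0, size=100),
    prior_draw=lambda rng: rng.uniform(-5.0, 5.0),
    summary=np.mean,
    tol=0.1,
    n_draws=20000,
)
```

The accepted draws approximate the posterior given the summary statistic, not given the full data; the two coincide here only because the sample mean is sufficient in this toy model.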
heard_the_talk  to_read  indirect_inference  approximate_bayesian_computation  statistics  estimation 
november 2010 by cshalizi
[1011.0175] A Comparison of Methods for Computing Autocorrelation Time
"This paper describes four methods for estimating autocorrelation time and evaluates these methods with a test set of seven series. Fitting an autoregressive process appears to be the most accurate method of the four. An R package is provided for extending the comparison to more methods and test series."
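
A minimal sketch of the autoregressive approach in its simplest case, an AR(1) fit. For x_t - mu = phi (x_{t-1} - mu) + noise, the autocorrelation at lag k is phi^k, so the integrated autocorrelation time is 1 + 2 * sum_{k>=1} phi^k = (1 + phi) / (1 - phi). The function names are made up for this illustration, and restricting to order 1 is a simplifying assumption; the paper's own procedure and its R package are not reproduced here.

```python
import numpy as np

def ar1_autocorrelation_time(x):
    """Estimate the integrated autocorrelation time from a fitted AR(1) coefficient."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    # Least-squares estimate of phi from regressing x_t on x_{t-1}.
    phi = np.dot(centered[1:], centered[:-1]) / np.dot(centered[:-1], centered[:-1])
    return (1.0 + phi) / (1.0 - phi)

def simulate_ar1(phi, n, seed=0):
    """Simulate a stationary, zero-mean AR(1) series with unit innovation variance."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal() / np.sqrt(1.0 - phi**2)  # start in the stationary distribution
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

# With phi = 0.5 the true autocorrelation time is (1 + 0.5) / (1 - 0.5) = 3.
x = simulate_ar1(0.5, 50000)
tau_hat = ar1_autocorrelation_time(x)
```

The estimate is only as good as the AR(1) assumption; a series with longer memory would need a higher-order fit.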
time_series  correlation_time  estimation  to_teach:complexity-and-inference 
november 2010 by cshalizi
[1010.2731] A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
"...models in which the number of parameters $p$ is comparable to or larger than the sample size $n$. ... line of recent work has studied models with various types of low-dimensional structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions) ... a general approach to estimation is to solve a regularized convex program (known as a regularized $M$-estimator) ... loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. ... unified framework for ... consistency and convergence rates for such regularized $M$-estimators under high-dimensional scaling. ... main theorem ... can be used to re-derive some existing results, and ... obtain ... new results ... identifies two key properties of loss and regularization functions, ... restricted strong convexity and decomposability ... ensure ... fast convergence rates ... optimal in many ... cases."
estimation  sparsity  statistics 
october 2010 by cshalizi
"Frechet differentiability in statistical inference for time series" - Statistical Methods & Applications, Volume 19, Number 4
"It is shown how the method of Fréchet differentiability can simplify the asymptotic derivations in an important range of robust inferential problems for stationary and related time series models. The uniform root-n consistency of the empirical distribution function for the Cramer von Mises norm under a weak mixing condition is indicated. Various regularity conditions naturally implemented and leading to the differentiability are discussed. A simulation study supplementing the theoretical discussion is included."
asymptotics  time_series  statistical_inference_for_stochastic_processes  estimation  statistics  mixing 
october 2010 by cshalizi
"A note on the asymptotic behaviour of empirical likelihood statistics" - Statistical Methods & Applications, Volume 19, Number 4
"This paper develops some theoretical results about the asymptotic behaviour of the empirical likelihood and the empirical profile likelihood statistics, which originate from fairly general estimating functions. The results accommodate, within a unified framework, various situations potentially occurring in a wide range of applications. For this reason, they are potentially useful in several contexts, such as, for example, in inference for dependent data. We provide examples showing that known findings in literature about the asymptotic behaviour of some empirical likelihood statistics in time series models can be derived as particular cases of our results."
empirical_likelihood  asymptotics  statistics  estimation  likelihood  statistical_inference_for_stochastic_processes 
october 2010 by cshalizi
[1010.2286] Divergence-based characterization of fundamental limitations of adaptive dynamical systems
"general problem of adaptively controlling and/or identifying a stochastic dynamical system, where our {\em a priori} knowledge allows us to place the system in a subset of a metric space (the uncertainty set). We present an information-theoretic meta-theorem that captures the trade-off between the metric complexity (or richness) of the uncertainty set, the amount of information acquired online in the process of controlling and observing the system, and the residual uncertainty remaining after the observations have been collected. Following the approach of Zames, we quantify {\em a priori} information by the Kolmogorov (metric) entropy of the uncertainty set, while the information acquired online is expressed as a sum of information divergences. The general theory is used to derive new minimax lower bounds on the metric identification error, as well as to give a simple derivation of the minimum time needed to stabilize an uncertain stochastic linear system."
systems_identification  control_theory  estimation  learning_theory  minimax  heard_the_talk  raginsky.maxim  to:NB 
october 2010 by cshalizi
[1005.2137] A self-normalized approach to confidence interval construction in time series
Revised version of a paper from JRSS-B. "new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of choosing any user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed using recursive estimates. Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement .... Monte-Carlo simulations ... compare the finite sample performance ... with ... normal approximation and the block bootstrap approach."
time_series  confidence_sets  estimation  statistics 
october 2010 by cshalizi
Default priors for Bayesian and frequentist inference - Fraser et al. - 2010 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"We investigate the choice of default priors for use with likelihood for Bayesian and frequentist inference. Such a prior is a density or relative density that weights an observed likelihood function, leading to the elimination of parameters that are not of interest and then a density-type assessment for a parameter of interest. For independent responses from a continuous model, we develop a prior for the full parameter that is closely linked to the original Bayes approach and provides an extension of the right invariant measure to general contexts. We then develop a modified prior that is targeted on a component parameter of interest and by targeting avoids the marginalization paradoxes of Dawid and co-workers. This modifies Jeffreys's prior and provides extensions to the development of Welch and Peers. ... combined to explore priors for a vector parameter of interest in the presence of a vector nuisance parameter. Examples ... illustrate the computation of the priors."
likelihood  estimation  default_priors  bayesianism  statistics  nuisance_parameters 
october 2010 by cshalizi
[1010.1449] The geometry of nonlinear least squares with applications to sloppy models and optimization
Errr --- isn't this all standard information-geometry stuff? But Sethna and Machta are people who usually know what they're talking about...
to_be_shot_after_a_fair_trial  information_geometry  to_read  estimation 
october 2010 by cshalizi
Carvalho, Johannes, Lopes, Polson: Particle Learning and Smoothing
"Particle learning (PL) provides state filtering, sequential parameter learning and smoothing in a general class of state space models. Our approach extends existing particle methods by incorporating the estimation of static parameters via a fully-adapted filter that utilizes conditional sufficient statistics for parameters and/or states as particles. State smoothing in the presence of parameter uncertainty is also solved as a by-product of PL. In a number of examples, we show that PL outperforms existing particle filtering alternatives and proves to be a competitor to MCMC."
particle_filters  filtering  state-space_models  state_estimation  estimation  time_series  statistics 
august 2010 by cshalizi
Mariadassou, Robin, Vacher: Uncovering latent structure in valued graphs: A variational approach
"As more and more network-structured data sets are available, the statistical analysis of valued graphs has become common place. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case.

We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graphs models and allows to include covariates. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host–parasite interaction networks in forest ecosystems."
network_data_analysis  inference_to_latent_objects  community_discovery  statistics  estimation 
august 2010 by cshalizi
Snijders, Koskinen, Schweinberger: Maximum likelihood estimation for social network dynamics
"A model for network panel data is discussed, based on the assumption that the observed data are discrete observations of a continuous-time Markov process on the space of all directed graphs on a given node set, in which changes in tie variables are independent conditional on the current graph. The model for tie changes is parametric and designed for applications to social network analysis, where the network dynamics can be interpreted as being generated by choices made by the social actors represented by the nodes of the graph. An algorithm for calculating the Maximum Likelihood estimator is presented, based on data augmentation and stochastic approximation. An application to an evolving friendship network is given and a small simulation study is presented which suggests that for small data sets the Maximum Likelihood estimator is more efficient than the earlier proposed Method of Moments estimator."
network_data_analysis  social_networks  markov_models  estimation  statistics  heard_the_talk 
august 2010 by cshalizi
"Dimension Reduction and Adaptation in Conditional Density Estimation" - Journal of the American Statistical Association - 105(490):761
Sounds neat: "An orthogonal series estimator of the conditional density of a response given a vector of continuous and ordinal/nominal categorical predictors ... The estimator is based on writing a conditional density as a sum of orthogonal projections on all possible subspaces of reduced dimensionality and then estimating each projection via a shrinkage procedure [which] uses a universal thresholding and a dyadic-blockwise shrinkage for low and high frequencies, respectively. The estimator is data-driven, is adaptive to underlying smoothness of a conditional density, and attains a minimax rate of the mean integrated squared error convergence. ... if a conditional density depends only on a subgroup of predictors, then the estimator seizes the opportunity and attains a corresponding minimax rate of convergence. The latter property relaxes the notorious “curse of dimensionality.” ... the estimator is fast, because neither projections nor shrinkages are computation-intensive."
density_estimation  dimension_reduction  statistics  estimation  shrinkage 
july 2010 by cshalizi
[1007.0549] Minimax Manifold Estimation
"We find the minimax rate of convergence in Hausdorff distance for estimating a manifold M of dimension d embedded in R^D given a noisy sample from the manifold. We assume that the manifold satisfies a smoothness condition and that the noise distribution has compact support. We show that the optimal rate of convergence is n^{-2/(2+d)}. Thus, the minimax rate depends only on the dimension of the manifold, not on the dimension of the space in which M is embedded."
manifold_learning  minimax  estimation  kith_and_kin  statistics  to_read  genovese.chris  wasserman.larry  verdinelli.isa 
july 2010 by cshalizi