cshalizi + bootstrap   44

[1204.5633] Noncentral Limit Theorem and the Bootstrap for Quantiles of Dependent Data
"We will show under minimal conditions on differentiability and dependence that the central limit theorem for quantiles holds and that the block bootstrap is weakly consistent. Under slightly stronger conditions, the bootstrap is strongly consistent. Without the differentiability condition, quantiles might have a non-normal asymptotic distribution and the bootstrap might fail."
to:NB  bootstrap  statistics  statistical_inference_for_stochastic_processes 
4 weeks ago by cshalizi
[1204.2762] On the Uniform Asymptotic Validity of Subsampling and the Bootstrap
"This paper provides conditions under which subsampling and the bootstrap can be used to construct estimators of the quantiles of the distribution of a root that behave well uniformly over a large class of distributions $\mathbf{P}$. These results are then applied (i) to construct confidence regions that behave well uniformly over $\mathbf{P}$ in the sense that the coverage probability tends to at least the nominal level uniformly over $\mathbf{P}$ and (ii) to construct tests that behave well uniformly over $\mathbf{P}$ in the sense that the size tends to no greater than the nominal level uniformly over $\mathbf{P}$. Without these stronger notions of convergence, the asymptotic approximations to the coverage probability or size may be poor even in very large samples. Specific applications include the multivariate mean, testing moment inequalities, multiple testing, the empirical process, and $U$-statistics."
in_NB  bootstrap  statistics 
6 weeks ago by cshalizi
[0803.0835] Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations
"New goodness-of-fit tests for Markovian models in time series analysis are developed which are based on the difference between a fully nonparametric estimate of the one-step transition distribution function of the observed process and that of the model class postulated under the null hypothesis. The model specification under the null allows for Markovian models, the transition mechanisms of which depend on an unknown vector of parameters and an unspecified distribution of i.i.d. innovations. Asymptotic properties of the test statistic are derived and the critical values of the test are found using appropriate bootstrap schemes. General properties of the bootstrap for Markovian processes are derived. A new central limit theorem for triangular arrays of weakly dependent random variables is obtained. For the proof of stochastic equicontinuity of multidimensional empirical processes, we use a simple approach based on an anisotropic tiling of the space. The finite-sample behavior of the proposed test is illustrated by some numerical examples and a real-data application is given."
in_NB  statistics  statistical_inference_for_stochastic_processes  bootstrap  markov_models  goodness-of-fit 
8 weeks ago by cshalizi
Taylor & Francis Online :: Robustness Diagnosis for Bootstrap Inference - Journal of Computational and Graphical Statistics - Volume 20, Issue 2
"We propose a new robustness diagnostic scheme for bootstrap inference procedures. The scheme is adaptive to the data actually observed, applies readily to bootstrap inference output of diverse format, and therefore provides robustness diagnostics practically more relevant than most conventional robustness measures. Specifically, it monitors the sensitivity of the bootstrap distribution of inference output to specially designed omnidirectional data perturbations, and quantifies findings by a standardized measure with the aid of repeated resampling. The resulting measure, displayed in the form of an R-value plot, permits direct comparisons across different bootstrap procedures and across inference output of different types. Numerical examples are presented using both simulated and real-life data to illustrate applications of the scheme to estimation and hypothesis testing problems. This article has supplementary material online."
in_NB  statistics  bootstrap 
8 weeks ago by cshalizi
Greetings, Philosophers - Kieran Healy
But what _kind_ of bootstrap? It's clustered data (raters x schools), which raises interesting technical issues!
philosophy  academia  data_analysis  healy.kieran  bootstrap  to_teach:undergrad-ADA 
9 weeks ago by cshalizi
Bootstrapping clustered data - Field - 2007 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Various bootstraps have been proposed for bootstrapping clustered data from one-way arrays. The simulation results in the literature suggest that some of these methods work quite well in practice; the theoretical results are limited and more mixed in their conclusions. For example, McCullagh reached negative conclusions about the use of non-parametric bootstraps for one-way arrays. The purpose of this paper is to extend our understanding of the issues by discussing the effect of different ways of modelling clustered data, the criteria for successful bootstraps used in the literature and extending the theory from functions of the sample mean to include functions of the between and within sums of squares and non-parametric bootstraps to include model-based bootstraps. We determine that the consistency of variance estimates for a bootstrap method depends on the choice of model with the residual bootstrap giving consistency under the transformation model whereas the cluster bootstrap gives consistent estimates under both the transformation and the random-effect model. In addition we note that the criteria based on the distribution of the bootstrap observations are not really useful in assessing consistency."
in_NB  have_read  statistics  bootstrap  to_teach:undergrad-ADA  hierarchical_models 
february 2012 by cshalizi
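
--- A minimal sketch of the cluster bootstrap named in the abstract above, assuming the statistic of interest is the grand mean and that whole clusters are resampled with replacement. The function name, the toy data and the default number of replicates are illustrative assumptions, not taken from the paper.

```python
import random
import statistics


def cluster_bootstrap_se(clusters, n_boot=2000, seed=0):
    """Bootstrap standard error of the grand mean, resampling whole clusters."""
    rng = random.Random(seed)
    k = len(clusters)
    means = []
    for _ in range(n_boot):
        # Draw k clusters with replacement; each drawn cluster stays intact,
        # so within-cluster dependence is carried into the replicate.
        drawn = [clusters[rng.randrange(k)] for _ in range(k)]
        pooled = [y for cluster in drawn for y in cluster]
        means.append(statistics.fmean(pooled))
    return statistics.stdev(means)


# Hypothetical toy data: three clusters of unequal size.
clusters = [[1.0, 1.2, 0.9], [3.1, 2.8], [2.0, 2.2, 1.9, 2.1]]
se = cluster_bootstrap_se(clusters)
print(se)
```

Resampling individual observations instead would treat the data as independent and would typically understate the variability when clusters differ.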
[1201.6211] On the range of validity of the autoregressive sieve bootstrap
"We explore the limits of the autoregressive (AR) sieve bootstrap, and show that its applicability extends well beyond the realm of linear time series as has been previously thought. In particular, for appropriate statistics, the AR-sieve bootstrap is valid for stationary processes possessing a general Wold-type autoregressive representation with respect to a white noise; in essence, this includes all stationary, purely nondeterministic processes, whose spectral density is everywhere positive. Our main theorem provides a simple and effective tool in assessing whether the AR-sieve bootstrap is asymptotically valid in any given situation. In effect, the large-sample distribution of the statistic in question must only depend on the first and second order moments of the process; prominent examples include the sample mean and the spectral density. As a counterexample, we show how the AR-sieve bootstrap is not always valid for the sample autocovariance even when the underlying process is linear."
in_NB  bootstrap  time_series  statistics  stochastic_processes 
february 2012 by cshalizi
[0805.4136] Inference for the dark energy equation of state using Type IA supernova data
"The surprising discovery of an accelerating universe led cosmologists to posit the existence of "dark energy"--a mysterious energy field that permeates the universe. Understanding dark energy has become the central problem of modern cosmology. After describing the scientific background in depth, we formulate the task as a nonlinear inverse problem that expresses the comoving distance function in terms of the dark-energy equation of state. We present two classes of methods for making sharp statistical inferences about the equation of state from observations of Type Ia Supernovae (SNe). First, we derive a technique for testing hypotheses about the equation of state that requires no assumptions about its form and can distinguish among competing theories. Second, we present a framework for computing parametric and nonparametric estimators of the equation of state, with an associated assessment of uncertainty. Using our approach, we evaluate the strength of statistical evidence for various competing models of dark energy. Consistent with current studies, we find that with the available Type Ia SNe data, it is not possible to distinguish statistically among popular dark-energy models, and that, in particular, there is no support in the data for rejecting a cosmological constant. With much more supernova data likely to be available in coming years (e.g., from the DOE/NASA Joint Dark Energy Mission), we address the more interesting question of whether future data sets will have sufficient resolution to distinguish among competing theories."

--- I am biased, because Chris G. and Larry are friends, but this seems to me a model of the modern applied statistics paper: use interesting statistical tools to say something helpful about an important scientific problem on its own terms, rather than distorting the problem until it "looks like a nail".
in_NB  kith_and_kin  cosmology  astronomy  inverse_problems  nonparametrics  estimation  hypothesis_testing  statistics  bootstrap  genovese.christopher  wasserman.larry  have_read 
january 2012 by cshalizi
Wiley: Mathematical Statistics with Resampling and R
"Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. This groundbreaking book shows how to apply modern resampling techniques to mathematical statistics. Extensively class-tested to ensure an accessible presentation, Mathematical Statistics with Resampling and R utilizes the powerful and flexible computer language R to underscore the significance and benefits of modern resampling techniques."

--- This might be a good book for a baby stats. class; but even if it's a _great_ book, how on Earth am I supposed to justify asking students to spend $130 for it?
books:noted  statistics  bootstrap  R 
december 2011 by cshalizi
Quantifying the failure of bootstrap likelihood ratio tests
"When testing geometrically irregular parametric hypotheses, the bootstrap is an intuitively appealing method to circumvent difficult distribution theory. It has been shown, however, that the usual bootstrap is inconsistent in estimating the asymptotic distributions involved in such problems. This paper is concerned with the asymptotic size of likelihood ratio tests when critical values are computed using the inconsistent bootstrap. We clarify how the asymptotic size of such a test can be obtained from the size of the corresponding bootstrap test in the relevant limiting normal experiment. For boundary problems, that is, hypotheses given by convex cones, we show the bootstrap test to always be anticonservative, and we compute the size numerically for different two-dimensional examples. The examples illustrate that the size can be below or above the nominal level, and reveal that the relationship between the size of the test and the geometry of the considered hypotheses is surprisingly subtle."
in_NB  statistics  bootstrap  hypothesis_testing 
december 2011 by cshalizi
Truquet : On a nonparametric resampling scheme for Markov random fields
"We study an extension to general Markov random fields of the resampling scheme given in Bickel and Levina (2006) [4] for texture synthesis with stationary Markov mesh models. The procedure generates bootstrap replicates of a sample using kernel regression and the principle of Gibbs sampling. Consistency of the bootstrap distribution is investigated under the Dobrushin contraction condition. Some simulation examples are given, in particular for the texture synthesis, for which the multiscale algorithm of Paget and Longstaff (1998) [27] is revisited."
in_NB  to_read  random_fields  bootstrap  statistics  spatial_statistics  markov_models 
november 2011 by cshalizi
[1111.1876] On the stability of bootstrap estimators
"It is shown that bootstrap approximations of an estimator which is based on a continuous operator from the set of Borel probability measures defined on a compact metric space into a complete separable metric space is stable in the sense of qualitative robustness. Support vector machines based on shifted loss functions are treated as special cases."
in_NB  statistics  bootstrap  stability_of_learning  re:XV_for_mixing 
november 2011 by cshalizi
Science without (parametric) models: the case of bootstrap resampling: SpringerLink - Synthese, Volume 180, Number 1
"Scientific and statistical inferences build heavily on explicit, parametric models, and often with good reasons. However, the limited scope of parametric models and the increasing complexity of the studied systems in modern science raise the risk of model misspecification. Therefore, I examine alternative, data-based inference techniques, such as bootstrap resampling. I argue that their neglect in the philosophical literature is unjustified: they suit some contexts of inquiry much better and use a more direct approach to scientific inference. Moreover, they make more parsimonious assumptions and often replace theoretical understanding and knowledge about mechanisms by careful experimental design. Thus, it is worthwhile to study in detail how nonparametric models serve as inferential engines in science."
in_NB  philosophy_of_science  bootstrap  statistics  modeling  nonparametrics 
october 2011 by cshalizi
Kreiss , Paparoditis , Politis : On the range of validity of the autoregressive sieve bootstrap
"We explore the limits of the autoregressive (AR) sieve bootstrap, and show that its applicability extends well beyond the realm of linear time series as has been previously thought. In particular, for appropriate statistics, the AR-sieve bootstrap is valid for stationary processes possessing a general Wold-type autoregressive representation with respect to a white noise; in essence, this includes all stationary, purely nondeterministic processes, whose spectral density is everywhere positive. Our main theorem provides a simple and effective tool in assessing whether the AR-sieve bootstrap is asymptotically valid in any given situation. In effect, the large-sample distribution of the statistic in question must only depend on the first and second order moments of the process; prominent examples include the sample mean and the spectral density. As a counterexample, we show how the AR-sieve bootstrap is not always valid for the sample autocovariance even when the underlying process is linear."
in_NB  time_series  bootstrap  statistics  stochastic_processes 
october 2011 by cshalizi
A Resampling Technique for Relational Data
Roughly: Fix an integer b. Do snowball sampling from uniformly-random seeds until each snowball contains b nodes. Try to wire up the peripheral nodes of each snowball in a similarity-preserving way.

This reminds me of the block bootstrap for time series, only the similarity-preserving step seems ugly; blocks are independent in time series. What if we have the peripheral nodes attach randomly to each other, preserving only degree? We'd need to let b grow with n --- when would this give a good approximation to the sampling distribution?
in_NB  re:XV_for_networks  bootstrap  statistics  relational_learning  neville.jennifer  have_read 
october 2011 by cshalizi
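
--- A rough sketch of only the first step in the note above: growing one snowball of b nodes from a uniformly random seed. It assumes the graph is an adjacency dict and that the snowball grows breadth-first; both are assumptions made for illustration, and the step that wires snowballs back together is not attempted.

```python
import random
from collections import deque


def snowball_sample(adj, b, rng):
    """Grow one snowball of up to b nodes, breadth-first, from a random seed."""
    seed = rng.choice(sorted(adj))  # uniformly random seed node
    visited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(visited) < b:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
                if len(visited) == b:
                    break  # snowball is full
    return visited


# Hypothetical toy graph (undirected, stored as adjacency lists).
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
ball = snowball_sample(adj, b=4, rng=random.Random(0))
print(ball)
```

If the seed sits in a component with fewer than b nodes, this returns a smaller snowball; a real procedure would need a rule for that case.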
[1110.1248] An algorithm to compute the power of Monte Carlo tests with guaranteed precision
"This article presents an algorithm that generates an exact (conservative) confidence interval of a specified length and coverage probability for the power of a Monte Carlo test (such as a bootstrap or permutation test). It is the first method that achieves this aim for almost any Monte Carlo test. The existing research on power estimation for Monte Carlo tests has focused on obtaining as accurate a result as possible for a fixed computational effort. However, the methods proposed do not provide any guarantee of precision, in the sense that they cannot report a confidence interval to accompany their estimate of the power. Conversely in this article the computational effort is random. The algorithm operates until a confidence interval can be constructed that meets the requirements of the user, in terms of length and coverage probability. We show that, surprisingly, by generating two more datasets than what might have been assumed to be sufficient, the expected number of steps required by the algorithm is finite in many cases of practical interest. These include, for instance, any situation where the distribution of the p-value is absolutely continuous or if it is discrete with finite support. The algorithm is implemented in the R package simctest."
statistics  hypothesis_testing  confidence_sets  monte_carlo  bootstrap  in_NB 
october 2011 by cshalizi
A Perturbation Method for Inference on Regularized Regression Estimates
"Analysis of high-dimensional data often seeks to identify a subset of important features and to assess the effects of these features on outcomes. Traditional statistical inference procedures based on standard regression methods often fail in the presence of high-dimensional features. In recent years, regularization methods have emerged as promising tools for analyzing high-dimensional data. These methods simultaneously select important features and provide stable estimation of their effects. Adaptive LASSO and SCAD, for instance, give consistent and asymptotically normal estimates with oracle properties. However, in finite samples, it remains difficult to obtain interval estimators for the regression parameters. In this article, we propose perturbation resampling-based procedures to approximate the distribution of a general class of penalized parameter estimates. Our proposal, justified by asymptotic theory, provides a simple way to estimate the covariance matrix and confidence regions. Through finite-sample simulations, we verify the ability of this method to give accurate inference and compare it with other widely used standard deviation and confidence interval estimates. We also illustrate our proposals with a dataset used to study the association of HIV drug resistance and a large number of genetic mutations."
in_NB  regression  sparsity  confidence_sets  statistics  bootstrap 
september 2011 by cshalizi
Reality Checks and Comparisons of Nested Predictive Models - Journal of Business and Economic Statistics - 0(0):1
"This article develops a simple bootstrap method for simulating asymptotic critical values for tests of equal forecast accuracy and encompassing among many nested models. Our method combines elements of fixed regressor and wild bootstraps. We first derive the asymptotic distributions of tests of equal forecast accuracy and encompassing applied to forecasts from multiple models that nest the benchmark model—that is, reality check tests. We then prove the validity of the bootstrap for these tests. Monte Carlo experiments indicate that our proposed bootstrap has better finite-sample size and power than other methods designed for comparison of nonnested models."
statistics  model_checking  model_selection  time_series  bootstrap  to_read  to_teach:undergrad-ADA  encompassing 
september 2011 by cshalizi
[1106.2125] Bootstrapping data arrays of arbitrary order
"In this paper we study a bootstrap strategy for estimating the variance of a mean taken over large multifactor crossed random effects data sets. We apply bootstrap reweighting independently to the levels of each factor, giving each observation the product of its factor weights. No exact bootstrap exists for this problem (McCullagh (2000)). We show that the proposed bootstrap is mildly conservative, under sufficient conditions that allow very unbalanced and heteroscedastic inputs. Earlier results for a resampling bootstrap only apply to two factors and are not suitable to online computation. The proposed reweighting approach can be implemented in parallel and online settings. The results for this method apply to any number of factors. The method is illustrated using a 3 factor data set of comment lengths from Facebook."
bootstrap  statistics  eckles.dean  owen.art  have_read  network_data_analysis  re:smoothing_adjacency_matrices  to:blog 
june 2011 by cshalizi
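
--- A minimal sketch of the reweighting idea in the abstract above: draw a weight independently for every level of every factor, and give each observation the product of its levels' weights. The weight distribution used here (0 or 2 with equal probability, so mean 1), the function name and the toy data are assumptions for illustration; the abstract does not fix them.

```python
import random
import statistics


def product_weight_bootstrap_var(obs, n_boot=2000, seed=0):
    """obs: list of (levels, value), where levels is a tuple with one level per factor.

    Returns the bootstrap variance of the weighted mean.
    """
    rng = random.Random(seed)
    n_factors = len(obs[0][0])
    levels = [sorted({lv[f] for lv, _ in obs}) for f in range(n_factors)]
    reps = []
    while len(reps) < n_boot:
        # One independent weight per level of each factor (assumed: 0 or 2).
        w = [{lev: rng.choice((0.0, 2.0)) for lev in levels[f]}
             for f in range(n_factors)]
        total_w = 0.0
        total_wy = 0.0
        for lv, y in obs:
            weight = 1.0
            for f in range(n_factors):
                weight *= w[f][lv[f]]  # product of the factor weights
            total_w += weight
            total_wy += weight * y
        if total_w > 0:  # skip replicates in which every observation got weight 0
            reps.append(total_wy / total_w)
    return statistics.variance(reps)


# Hypothetical toy data with two crossed factors (user, item).
obs = [(("u1", "a"), 3.0), (("u1", "b"), 5.0), (("u2", "a"), 2.0),
       (("u2", "c"), 7.0), (("u3", "b"), 4.0), (("u3", "c"), 6.0)]
var_hat = product_weight_bootstrap_var(obs)
print(var_hat)
```

Because each observation's weight depends only on its own levels, the loop over observations could be split across machines or run as a single streaming pass, which matches the parallel and online claim in the abstract.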
Owen : The pigeonhole bootstrap
"Recently there has been much interest in data that, in statistical language, may be described as having a large crossed and severely unbalanced random effects structure. Such data sets arise for recommender engines and information retrieval problems. Many large bipartite weighted graphs have this structure too. We would like to assess the stability of algorithms fit to such data. Even for linear statistics, a naive form of bootstrap sampling can be seriously misleading and McCullagh [Bernoulli 6 (2000) 285–301] has shown that no bootstrap method is exact. We show that an alternative bootstrap separately resampling rows and columns of the data matrix satisfies a mean consistency property even in heteroscedastic crossed unbalanced random effects models. This alternative does not require the user to fit a crossed random effects model to the data."
bootstrap  network_data_analysis  via:deaneckles  re:smoothing_adjacency_matrices  have_read 
march 2011 by cshalizi
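
--- A sketch of the row-and-column resampling described in the abstract above, as I read it: resample row labels and column labels separately with replacement, and let each observed cell appear as many times as the product of its row count and column count. The sparse dict layout, the function name and the toy data are assumptions for illustration.

```python
import random
import statistics
from collections import Counter


def pigeonhole_bootstrap_means(data, n_boot=1000, seed=0):
    """data: dict mapping (row, col) -> value, for observed cells only."""
    rng = random.Random(seed)
    rows = sorted({i for i, _ in data})
    cols = sorted({j for _, j in data})
    reps = []
    while len(reps) < n_boot:
        # Resample rows and columns independently, with replacement.
        row_count = Counter(rng.choices(rows, k=len(rows)))
        col_count = Counter(rng.choices(cols, k=len(cols)))
        num = 0.0
        den = 0.0
        for (i, j), x in data.items():
            m = row_count[i] * col_count[j]  # multiplicity of this cell
            num += m * x
            den += m
        if den > 0:  # skip replicates that contain no observed cell
            reps.append(num / den)
    return reps


# Hypothetical toy ratings matrix, sparse and unbalanced.
data = {("r1", "c1"): 4.0, ("r1", "c2"): 2.0, ("r2", "c1"): 5.0,
        ("r3", "c2"): 1.0, ("r3", "c3"): 3.0}
reps = pigeonhole_bootstrap_means(data)
print(statistics.stdev(reps))
```

No random effects model is fitted anywhere in this sketch, which is the point the abstract makes about the method.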
McCullagh : Resampling and exchangeable arrays
But, ummm, you need to make sure your resampling plan respects the dependence structure.  I'm pretty sure that I could use this to "prove" that you couldn't use resampling to get standard errors for the mean of a stationary time series.  Something very weird here.  To re-read.
bootstrap  network_data_analysis  statistics  via:deaneckles  re:smoothing_adjacency_matrices  have_read 
march 2011 by cshalizi
Levina, Bickel: Texture synthesis and nonparametric resampling of random fields
Found the pre-print, which I'd read in '04, while looking for something else in my office... Note that this is the same shape of mesh that Lindgren and Nordahl advocated for use in 2D information theory, on totally different (I think) grounds.
bootstrap  spatial_statistics  random_fields  statistics  nonparametrics  have_read 
august 2010 by cshalizi
10-705 Intermediate Statistics, Fall 2009
Larry's version of the typical masters-level course based on Casella and Berger. Note: half of what he covers is not in Casella and Berger. (For example, he starts with VC theory!)
learning_theory  statistics  estimation  hypothesis_testing  prediction  minimax  bootstrap  model_selection  regression  classifiers  confidence_sets  wasserman.larry  kith_and_kin 
april 2010 by cshalizi
Lindsay, Liu: Model Assessment Tools for a Model False World
"a model credibility index, which is designed to serve as a one-number summary measure of model adequacy. We define the index to be the maximum sample size at which samples from the model and those from the true data generating mechanism are nearly indistinguishable. We use standard notions from hypothesis testing to make this definition precise. We use data subsampling to estimate the index" --- To be blogged, after the paper with Andy is done.
statistics  misspecification  re:phil-of-bayes_paper  hypothesis_testing  bootstrap  have_read  to:blog 
april 2010 by cshalizi
Arlot, Blanchard, Roquain: Some nonasymptotic results on resampling in high dimension, I: Confidence regions
"We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can possibly be much larger than the number of observations and we focus on a nonasymptotic control of the confidence level, following ideas inspired by recent results in learning theory. We consider two approaches, the first based on a concentration principle (valid for a large class of resampling weights) and the second on a resampled quantile, specifically using Rademacher weights. Several intermediate results established in the approach based on concentration principles are of interest in their own right. We also discuss the question of accuracy when using Monte Carlo approximations of the resampled quantities."
statistics  resampling  bootstrap  cross-validation  confidence_sets  to_read  re:XV_for_mixing  concentration_of_measure  learning_theory 
december 2009 by cshalizi
Bruce Hansen's Econometrics Text
"This is a draft of an incomplete first-year Ph.D. econometrics textbook. This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes."
econometrics  statistics  to_read  bootstrap  time_series  regression  hansen.bruce 
june 2009 by cshalizi
[0811.1888] Central Limit Theorem and the Bootstrap for U-Statistics of Strongly Mixing Data
"The asymptotic normality of U-statistics has so far been proved for iid data and under various mixing conditions such as absolute regularity, but not for strong mixing. We use a coupling technique introduced in 1983 by Bradley to prove a new generalized covariance inequality similar to Yoshihara's. It follows from the Hoeffding-decomposition and this inequality that U-statistics of strongly mixing observations converge to a normal limit if the kernel of the U-statistic fulfills some moment and continuity conditions.
The validity of the bootstrap for U-statistics has until now only been established in the case of iid data (see Bickel and Freedman). For mixing data, Politis and Romano proposed the circular block bootstrap, which leads to a consistent estimation of the sample mean's distribution. We extend these results to U-statistics of weakly dependent data and prove a CLT for the circular block bootstrap version of U-statistics under absolute regularity and strong mixing. We also calculate a rate of convergence for the bootstrap variance estimator of a U-statistic and give some simulation results."
central_limit_theorem  statistics  bootstrap  mixing  ergodic_theory  stochastic_processes 
june 2009 by cshalizi
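
--- A minimal sketch of the circular block bootstrap mentioned in the abstract above, applied to the sample mean rather than to a general U-statistic. The series is wrapped onto a circle, blocks of a fixed length start at uniformly random positions, and blocks are concatenated until the original length is reached. The block length and the toy series are arbitrary choices made for illustration.

```python
import random
import statistics


def circular_block_resample(x, block_len, rng):
    """One circular-block-bootstrap replicate of the series x."""
    n = len(x)
    out = []
    while len(out) < n:
        start = rng.randrange(n)  # uniformly random block start
        # Indices wrap around modulo n, so every point is equally likely to be covered.
        out.extend(x[(start + k) % n] for k in range(block_len))
    return out[:n]  # trim the last block if it overshoots


def cbb_se_of_mean(x, block_len, n_boot=2000, seed=0):
    """Bootstrap standard error of the sample mean under block resampling."""
    rng = random.Random(seed)
    means = [statistics.fmean(circular_block_resample(x, block_len, rng))
             for _ in range(n_boot)]
    return statistics.stdev(means)


# Hypothetical toy series with visible serial dependence.
x = [0.1, 0.3, 0.4, 0.2, -0.1, -0.3, -0.2, 0.0, 0.2, 0.5, 0.4, 0.1]
se = cbb_se_of_mean(x, block_len=3)
print(se)
```

In practice the block length has to grow with the sample size for consistency; a fixed value of 3 is used here only to keep the example small.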
An Evolutionary Bootstrap Approach to Neural Network Pruning and Optimization (LeBaron)
I remember hearing Blake talk about this in Madison, but I didn't appreciate bootstrapping at the time...
statistics  machine_learning  neural_networks  bootstrap  to_read  lebaron.blake 
march 2009 by cshalizi
[0901.3202] Model-Consistent Sparse Estimation through the Bootstrap
"if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection"
lasso  linear_regression  model_selection  variable_selection  bootstrap 
january 2009 by cshalizi
Fast and robust bootstrap
"recent developments on a bootstrap method for robust estimators which is computationally faster and more resistant to outliers than the classical bootstrap"
bootstrap  statistics  linear_regression 
february 2008 by cshalizi
[0709.0406] A resampling-based test to detect person-to-person transmission of infectious disease
The null hypothesis, for non-contagious diseases, is IID onset times, i.e., no dependence between onset times for people near each other in the social network. So it doesn't have power against homophily on traits which affect (or even just predict!) the disease.
epidemiology  statistics  bootstrap  to_teach:complexity-and-inference  network_data_analysis  re:homophily_and_confounding  have_read  re:social-networks-as-sensor-networks 
november 2007 by cshalizi
