cshalizi + cross-validation 28
[0803.2963] Consistency of cross validation for comparing regression procedures
11 weeks ago by cshalizi
"Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property."
to:NB
statistics
to_read
cross-validation
model_selection
nonparametrics
to_teach:undergrad-ADA
re:stacs
11 weeks ago by cshalizi
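A minimal sketch of the setting in this abstract: one data split, with one parametric and one nonparametric procedure fitted on the estimation part and scored on the evaluation part. The names `fit_line`, `fit_knn`, `cv_select` and `eval_fraction` are illustrative assumptions, not from the paper, and the simulated data are made up. The paper's consistency conditions on the splitting ratio are not checked here.

```python
import math
import random

def fit_line(xs, ys):
    """Parametric procedure: simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=5):
    """Nonparametric procedure: average of the k nearest neighbours."""
    pts = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def cv_select(xs, ys, procedures, eval_fraction=0.5, seed=0):
    """Single-split CV: fit on the estimation part, score on the evaluation part.

    eval_fraction is the share of the data held out for evaluation
    (a hypothetical parameter name standing in for the splitting ratio).
    """
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_eval = int(round(eval_fraction * len(idx)))
    eval_idx, est_idx = idx[:n_eval], idx[n_eval:]
    est_x = [xs[i] for i in est_idx]
    est_y = [ys[i] for i in est_idx]
    scores = {}
    for name, fit in procedures.items():
        predict = fit(est_x, est_y)
        scores[name] = sum((ys[i] - predict(xs[i])) ** 2 for i in eval_idx) / n_eval
    return min(scores, key=scores.get), scores

# Simulated data with a clearly nonlinear regression function.
rng = random.Random(1)
xs = [rng.uniform(0, 2 * math.pi) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]
best, scores = cv_select(xs, ys, {"linear": fit_line, "knn": fit_knn})
print(best, scores)
```

Changing `eval_fraction` gives a quick way to see how the choice between the two procedures depends on the split.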
[0806.4140] Optimal oracle inequalities for model selection
12 weeks ago by cshalizi
"Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions both on the margin and the tails of the losses used. Starting with examples from the 3 basic estimation problems, regression, classification and density estimation, we formulate risk bounds for empirical risk minimization under successively weakening conditions and prove them at a very general level, for general margin and power tail behavior of the excess losses."
in_NB
statistics
learning_theory
cross-validation
model_selection
van_de_geer.sara
12 weeks ago by cshalizi
Model Selection in Kernel Based Regression using the Influence Function
february 2012 by cshalizi
"Recent results about the robustness of kernel methods involve the analysis of influence functions. By definition the influence function is closely related to leave-one-out criteria. In statistical learning, the latter is often used to assess the generalization of a method. In statistics, the influence function is used in a similar way to analyze the statistical efficiency of a method. Links between both worlds are explored. The influence function is related to the first term of a Taylor expansion. Higher order influence functions are calculated. A recursive relation between these terms is found characterizing the full Taylor expansion. It is shown how to evaluate influence functions at a specific sample distribution to obtain an approximation of the leave-one-out error. A specific implementation is proposed using a L1 loss in the selection of the hyperparameters and a Huber loss in the estimation procedure. The parameter in the Huber loss controlling the degree of robustness is optimized as well. The resulting procedure gives good results, even when outliers are present in the data."
to:NB
statistics
regression
kernel_estimators
model_selection
robustness
nonparametrics
cross-validation
february 2012 by cshalizi
Shen, Welch, Hughes-Oliver: Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery
december 2011 by cshalizi
"Cross-validation (CV) is widely used for tuning a model with respect to user-selected parameters and for selecting a “best” model. For example, the method of k-nearest neighbors requires the user to choose k, the number of neighbors, and a neural network has several tuning parameters controlling the network complexity. Once such parameters are optimized for a particular data set, the next step is often to compare the various optimized models and choose the method with the best predictive performance. Both tuning and model selection boil down to comparing models, either across different values of the tuning parameters or across different classes of statistical models and/or sets of explanatory variables. For multiple large sets of data, like the PubChem drug discovery cheminformatics data which motivated this work, reliable CV comparisons are computationally demanding, or even infeasible. In this paper we develop an efficient sequential methodology for model comparison based on CV. It also takes into account the randomness in CV. The number of models is reduced via an adaptive, multiplicity-adjusted sequential algorithm, where poor performers are quickly eliminated. By exploiting matching of individual observations, it is sometimes even possible to establish the statistically significant inferiority of some models with just one execution of CV."
in_NB
model_selection
statistics
cross-validation
machine_learning
december 2011 by cshalizi
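A rough illustration of the "matching of individual observations" idea: run one execution of k-fold CV for two models on the same folds, then compare their per-observation losses with a paired t-type statistic. This is a simplified sketch, not the paper's sequential, multiplicity-adjusted algorithm. The helper names (`knn`, `kfold_losses`, `paired_t`) and the simulated data are assumptions. Losses from shared training folds are not independent, so the statistic is only a heuristic.

```python
import math
import random

def knn(k):
    """Factory for a k-nearest-neighbour regression fitter (one tuning value per model)."""
    def fit(xs, ys):
        pts = list(zip(xs, ys))
        def predict(x):
            nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)
        return predict
    return fit

def kfold_losses(xs, ys, fit, k=5, seed=0):
    """Per-observation squared errors from one execution of k-fold CV."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # same seed gives the same folds for every model
    losses = [0.0] * n
    for f in range(k):
        test = idx[f::k]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            losses[i] = (ys[i] - predict(xs[i])) ** 2
    return losses

def paired_t(loss_a, loss_b):
    """Paired t-type statistic on matched losses; positive means model A looks worse."""
    d = [a - b for a, b in zip(loss_a, loss_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)

rng = random.Random(2)
xs = [rng.uniform(0, 2 * math.pi) for _ in range(200)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
t = paired_t(kfold_losses(xs, ys, knn(1)), kfold_losses(xs, ys, knn(10)))
print(t)
```

Because both models are scored on identical held-out points, the differences cancel much of the observation-level noise, which is why a single CV run can sometimes already separate a poor model from a good one.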
Selecting Amongst Large Classes of Models
december 2011 by cshalizi
Chatty slides from Brian Ripley. Approvable.
in_NB
via:gelman
model_selection
statistics
ripley.brian
cross-validation
information_criteria
december 2011 by cshalizi
Variance estimation using refitted cross-validation in ultrahigh dimensional regression - Fan - 2011 - Journal of the Royal Statistical Society: Series B (Statistical Methodology)
october 2011 by cshalizi
"Variance estimation is a fundamental problem in statistical modelling. In ultrahigh dimensional linear regression where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regression make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the level of noise. We propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation, to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two-stage estimator and the plug-in one-stage estimators using the lasso and smoothly clipped absolute deviation are also studied and compared. Their performances can be improved by the refitted cross-validation method proposed."
statistics
regression
variable_selection
cross-validation
estimation
to:NB
fan.jianqing
october 2011 by cshalizi
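A small sketch of the two-stage idea described in the abstract: select variables on one half of the data, refit by least squares on the other half, estimate the noise variance from that refit, then swap the halves and average. The selection step here is plain marginal-correlation screening, which is an assumption made for brevity (the abstract mentions the lasso and SCAD). All function names and the simulated data are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def top_correlated(X, y, s):
    """Indices of the s columns most correlated (in absolute value) with y."""
    n = len(y)
    ybar = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        cbar = sum(col) / n
        cov = sum((c - cbar) * (v - ybar) for c, v in zip(col, y))
        var = sum((c - cbar) ** 2 for c in col)
        scores.append(abs(cov) / var ** 0.5)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:s]

def residual_variance(X, y, cols):
    """OLS of y on an intercept plus the chosen columns; returns RSS / (n - p)."""
    Z = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(Z[0])
    G = [[sum(z[a] * z[b] for z in Z) for b in range(p)] for a in range(p)]
    g = [sum(z[a] * v for z, v in zip(Z, y)) for a in range(p)]
    beta = solve(G, g)
    rss = sum((v - sum(bj * zj for bj, zj in zip(beta, z))) ** 2 for z, v in zip(Z, y))
    return rss / (len(y) - p)

def refitted_cv_variance(X, y, s):
    """Select on one half, refit on the other, then swap and average."""
    h = len(y) // 2
    X1, y1, X2, y2 = X[:h], y[:h], X[h:], y[h:]
    v1 = residual_variance(X2, y2, top_correlated(X1, y1, s))
    v2 = residual_variance(X1, y1, top_correlated(X2, y2, s))
    return (v1 + v2) / 2

def naive_variance(X, y, s):
    """Naive two-stage estimate: select and refit on the same data."""
    return residual_variance(X, y, top_correlated(X, y, s))

# Simulated data: 200 irrelevant predictors, unit noise variance (made up).
rng = random.Random(3)
n, p = 60, 200
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
print(naive_variance(X, y, 5), refitted_cv_variance(X, y, 5))
```

The naive estimate reuses the same data for selection and refitting, so spuriously correlated predictors tend to soak up realized noise. The refitted version breaks that link by refitting on data the selection step never saw.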
Cross-Validation and Mean-Square Stability
march 2011 by cshalizi
It's a little boggling that they don't cite any of the modern (2000--) work on theoretical properties of CV, but oh well...
cross-validation
learning_theory
stability_of_learning
statistics
re:your_favorite_dsge_sucks
re:XV_for_mixing
re:XV_for_networks
to_read
via:nikete
march 2011 by cshalizi
[1010.6202] Sequential Data-Adaptive Bandwidth Selection by Cross-Validation for Nonparametric Prediction
november 2010 by cshalizi
"We consider the problem of bandwidth selection by cross-validation from a sequential point of view in a nonparametric regression model. Having in mind that in applications one often aims at estimation, prediction and change detection simultaneously, we investigate that approach for sequential kernel smoothers in order to base these tasks on a single statistic. We provide uniform weak laws of large numbers and weak consistency results for the cross-validated bandwidth. Extensions to weakly dependent error terms are discussed as well. The errors may be α-mixing or L2-near epoch dependent, which guarantees that the uniform convergence of the cross validation sum and the consistency of the cross-validated bandwidth hold true for a large class of time series. The method is illustrated by analyzing photovoltaic data."
cross-validation
prediction
time_series
model_selection
to_read
november 2010 by cshalizi
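For orientation, a sketch of ordinary (batch) leave-one-out cross-validation for the bandwidth of a Nadaraya-Watson smoother. This is the textbook version, assuming a Gaussian kernel and a fixed candidate grid. It is not the paper's sequential procedure and does not handle dependent errors. The function names and the simulated data are illustrative assumptions.

```python
import math
import random

def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error of a Gaussian-kernel smoother."""
    ybar = sum(ys) / len(ys)
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            if j == i:
                continue  # leave observation i out of its own prediction
            w = math.exp(-0.5 * ((xi - xj) / h) ** 2)
            num += w * yj
            den += w
        # If every weight underflows to zero, fall back to the overall mean.
        pred = num / den if den > 0 else ybar
        total += (yi - pred) ** 2
    return total / len(xs)

def select_bandwidth(xs, ys, grid):
    """Candidate bandwidth with the smallest leave-one-out score."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))

rng = random.Random(4)
xs = [rng.uniform(0, 2 * math.pi) for _ in range(150)]
ys = [math.sin(x) + rng.gauss(0, 0.2) for x in xs]
grid = [0.05, 0.2, 0.5, 1.0, 3.0]
h_hat = select_bandwidth(xs, ys, grid)
print(h_hat)
```

A sequential variant would recompute the criterion as observations arrive, using only past data at each step.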
Commenges: Statistical models: Conventional, penalized and hierarchical likelihood
december 2009 by cshalizi
"We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly in the literature, for defining the misspecification risk of a model and for grounding the likelihood and the likelihood cross-validation, which can be used for choosing weights in penalized likelihood. Families of penalized likelihood and particular sieves estimators are shown to be equivalent. The similarity of these likelihoods with a posteriori distributions in a Bayesian approach is considered."
statistics
likelihood
cross-validation
re:phil-of-bayes_paper
to_read
december 2009 by cshalizi
Arlot, Blanchard, Roquain: Some nonasymptotic results on resampling in high dimension, I: Confidence regions
december 2009 by cshalizi
"We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality of the vector can possibly be much larger than the number of observations and we focus on a nonasymptotic control of the confidence level, following ideas inspired by recent results in learning theory. We consider two approaches, the first based on a concentration principle (valid for a large class of resampling weights) and the second on a resampled quantile, specifically using Rademacher weights. Several intermediate results established in the approach based on concentration principles are of interest in their own right. We also discuss the question of accuracy when using Monte Carlo approximations of the resampled quantities."
statistics
resampling
bootstrap
cross-validation
confidence_sets
to_read
re:XV_for_mixing
concentration_of_measure
learning_theory
december 2009 by cshalizi
Sensitivity Analysis of k-fold Cross-Validation in Prediction Error Estimation
november 2009 by cshalizi
Apparently IEEE makes this available solely to tease me, since, while we have a fully paid-up electronic subscription, I can't get access.
machine_learning
statistics
cross-validation
to_read
re:XV_for_mixing
re:XV_for_networks
november 2009 by cshalizi
Oracle inequalities for multi-fold cross validation
october 2009 by cshalizi
Gah, does no one have a copy? --- thanks to A. v.d.V. for a reprint.
model_selection
oracle_inequalities
cross-validation
have_read
october 2009 by cshalizi
Elements of Statistical Learning: data mining, inference, and prediction. 2nd Edition.
october 2009 by cshalizi
Free PDF! (Still, I find my bound physical copy much more convenient.)
books:recommended
machine_learning
data_mining
statistics
learning_theory
estimation
cross-validation
ensemble_methods
classifiers
regression
graphical_models
clustering
dimension_reduction
bootstrap
via:arthegall
have_read
october 2009 by cshalizi
[0908.2904] A bias correction for the minimum error rate in cross-validation
august 2009 by cshalizi
How is this different from Burman's old bias correction for CV? And, how much noise does this correction add?
cross-validation
statistics
machine_learning
model_selection
have_read
to_teach:data-mining
tibshirani.robert
tibshirani.ryan
to_teach:undergrad-ADA
august 2009 by cshalizi
Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization
february 2009 by cshalizi
Why hadn't I seen this before?
learning_theory
machine_learning
cross-validation
re:XV_for_networks
via:shivak
niyogi.partha
re:XV_for_mixing
re:your_favorite_dsge_sucks
february 2009 by cshalizi
A Cross-Validation Filter for Time Series Models (Piet De Jong, 1988)
december 2008 by cshalizi
"A filter is presented which computes cross-validation errors and associated statistics for an arbitrary state space model. The procedure is more efficient than an existing approach. Diffuse initial conditions are easily handled using a minor extension. The relationship to the fixed interval smoothing algorithm is investigated."
cross-validation
state-space_models
markov_models
time_series
have_read
december 2008 by cshalizi
Cross-Validation and the Estimation of Conditional Probability Densities
october 2008 by cshalizi
Nice. Definitely needs to be included next time I teach data-mining. (The method is implemented in the "np" package on CRAN.) In particular worth comparing to logistic regression and logistic GAMs for binary conditional probability estimation/classification.
statistics
density_estimation
kernel_methods
cross-validation
to_teach:data-mining
have_read
to_teach:undergrad-ADA
october 2008 by cshalizi