cshalizi + variable_selection   21

[0805.1179] Autoregressive Process Modeling via the Lasso Procedure
"The Lasso is a popular model selection and estimation procedure for linear models that enjoys nice theoretical properties. In this paper, we study the Lasso estimator for fitting autoregressive time series models. We adopt a double asymptotic framework where the maximal lag may increase with the sample size. We derive theoretical results establishing various types of consistency. In particular, we derive conditions under which the Lasso estimator for the autoregressive coefficients is model selection consistent, estimation consistent and prediction consistent. Simulation study results are reported."
to:NB  time_series  statistics  lasso  sparsity  variable_selection  kith_and_kin  heard_the_talk  rinaldo.alessandro  nardi.yuval 
12 weeks ago by cshalizi
[1102.3616] Tight conditions for consistent variable selection in high dimensional nonparametric regression
"We address the issue of variable selection in the regression model with very high ambient dimension, i.e., when the number of covariates is very large. The main focus is on the situation where the number of relevant covariates, called intrinsic dimension, is much smaller than the ambient dimension. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is simple and is based on comparing the empirical Fourier coefficients with an appropriately chosen threshold value."
in_NB  regression  variable_selection  nonparametrics  statistics 
february 2012 by cshalizi
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature−instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB  information_theory  statistics  variable_selection  model_selection  to_teach:data-mining  to:blog  machine_learning  classifiers  have_read  graphical_models 
february 2012 by cshalizi
Nonparametric estimation of the link function including variable selection - Gerhard Tutz and Sebastian Petry - Statistics and Computing, Volume 22, Number 2
"Nonparametric methods for the estimation of the link function in generalized linear models are able to avoid bias in the regression parameters. But for the estimation of the link typically the full model, which includes all predictors, has been used. When the number of predictors is large these methods fail since the full model cannot be estimated. In the present article a boosting type method is proposed that simultaneously selects predictors and estimates the link function. The method performs quite well in simulations and real data examples." (The "to teach" tag is conjectural.)
in_NB  regression  variable_selection  statistics  nonparametrics  to_read  to_teach:undergrad-ADA 
december 2011 by cshalizi
Variance estimation using refitted cross-validation in ultrahigh dimensional regression - Fan - 2011 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
"Variance estimation is a fundamental problem in statistical modelling. In ultrahigh dimensional linear regression where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regression make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the level of noise. We propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation, to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two-stage estimator and the plug-in one-stage estimators using the lasso and smoothly clipped absolute deviation are also studied and compared. Their performances can be improved by the refitted cross-validation method proposed."
statistics  regression  variable_selection  cross-validation  estimation  to:NB  fan.jianqing 
october 2011 by cshalizi
[1106.5242] High Dimensional Sparse Econometric Models: An Introduction
I love how they just flat-out identify "econometrics" with "linear regression with Gaussian noise"; but it looks like a clean exposition with proofs.
regression  lasso  variable_selection  econometrics 
june 2011 by cshalizi
[1009.2302] The Predictive Lasso
"We propose a shrinkage procedure for simultaneous variable selection and estimation in generalized linear models (GLMs) with an explicit predictive motivation. The procedure estimates the coefficients by minimizing the Kullback-Leibler divergence of a set of predictive distributions to the corresponding predictive distributions for the full model, subject to an $l_1$ constraint on the coefficient vector. This results in selection of a parsimonious model with similar predictive performance to the full model. Thanks to its similar form to the original lasso problem for GLMs, our procedure can benefit from available $l_1$-regularization path algorithms. Simulation studies and real-data examples confirm the efficiency of our method in terms of predictive performance on future observations."
regression  lasso  variable_selection  sparsity  information_theory  statistics 
september 2010 by cshalizi
"Partial Generalized Additive Models: An Information-Theoretic Approach for Dealing With Concurvity and Selecting Variables" (Gu, Kenny, Zhu, 2010)
"Scientists [want to know] which covariates are important, and how [they] affect the response variable, rather than just making predictions. ... Generalized additive models (GAMs) are a class of interpretable, multivariate nonparametric regression models which are very useful ... for these purposes, but concurvity among covariates (the nonlinear analogue of collinearity for linear regression) can ... produce unstable or even wrong estimates of the covariates’ functional effects. We develop a new procedure called partial generalized additive models (pGAM), based on mutual information ... Our procedure is similar in spirit to the Gram–Schmidt method for linear least squares. By building a GAM on a selected set of transformed variables, pGAM produces more stable models, selects variables parsimoniously, and provides insight into the nature of concurvity between the covariates by calculating functional dependencies among them. ... R code for fitting pGAMs is available online"
regression  additive_models  information_theory  variable_selection  statistics 
september 2010 by cshalizi
Penalized regression with correlation-based penalty
But do I _want_ to exclude _all_ of a bundle of correlated input variables from my regression? Surely it'd be better to include just _one_ of them...
regression  variable_selection  statistics 
june 2009 by cshalizi
[0906.3590] Forest Garrote
We have got to do something about the nams of techniques in this area. I don't mind the whimsy, it's just that combinations like this don't work, metaphorically.
ensemble_methods  classifiers  statistics  machine_learning  sparsity  variable_selection  lasso 
june 2009 by cshalizi
[0901.3202] Model-Consistent Sparse Estimation through the Bootstrap
"if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection"
lasso  linear_regression  model_selection  variable_selection  bootstrap 
january 2009 by cshalizi

Copy this bookmark:



description:


tags: