cshalizi + variable_selection 21
[0805.1179] Autoregressive Process Modeling via the Lasso Procedure
12 weeks ago by cshalizi
"The Lasso is a popular model selection and estimation procedure for linear models that enjoys nice theoretical properties. In this paper, we study the Lasso estimator for fitting autoregressive time series models. We adopt a double asymptotic framework where the maximal lag may increase with the sample size. We derive theoretical results establishing various types of consistency. In particular, we derive conditions under which the Lasso estimator for the autoregressive coefficients is model selection consistent, estimation consistent and prediction consistent. Simulation study results are reported."
to:NB
time_series
statistics
lasso
sparsity
variable_selection
kith_and_kin
heard_the_talk
rinaldo.alessandro
nardi.yuval
[1102.3616] Tight conditions for consistent variable selection in high dimensional nonparametric regression
february 2012 by cshalizi
"We address the issue of variable selection in the regression model with very high ambient dimension, i.e., when the number of covariates is very large. The main focus is on the situation where the number of relevant covariates, called intrinsic dimension, is much smaller than the ambient dimension. Without assuming any parametric form of the underlying regression function, we get tight conditions making it possible to consistently estimate the set of relevant variables. These conditions relate the intrinsic dimension to the ambient dimension and to the sample size. The procedure that is provably consistent under these tight conditions is simple and is based on comparing the empirical Fourier coefficients with an appropriately chosen threshold value."
in_NB
regression
variable_selection
nonparametrics
statistics
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
february 2012 by cshalizi
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature: instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB
information_theory
statistics
variable_selection
model_selection
to_teach:data-mining
to:blog
machine_learning
classifiers
have_read
graphical_models
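A rough sketch of the JMI criterion named in the abstract above, for discrete features: greedily add the candidate X_c that maximises the sum over already-selected X_j of I(X_c, X_j; Y). The function names, the plug-in mutual information estimate, and the choice to seed the selection with the single most informative feature are assumptions made for illustration, not anything taken from the paper's own code.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def jmi_select(columns, y, k):
    """Greedy forward selection of k features by the JMI score.

    columns: list of feature columns (each a list of discrete values).
    y: list of class labels, same length as each column.
    """
    remaining = list(range(len(columns)))
    # Assumption: with nothing selected yet, start from the single
    # feature with the highest mutual information with the labels.
    first = max(remaining, key=lambda c: mutual_information(columns[c], y))
    selected = [first]
    remaining.remove(first)
    while len(selected) < k and remaining:
        def score(c):
            # JMI: sum over selected j of I((X_c, X_j); Y), treating the
            # pair of features as one joint discrete variable.
            return sum(
                mutual_information(list(zip(columns[c], columns[j])), y)
                for j in selected
            )
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the score looks at pairs jointly with the label, an exact duplicate of an already-selected feature adds nothing, while a feature carrying complementary information about Y is preferred.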
Nonparametric estimation of the link function including variable selection - Gerhard Tutz and Sebastian Petry - Statistics and Computing, Volume 22, Number 2
december 2011 by cshalizi
"Nonparametric methods for the estimation of the link function in generalized linear models are able to avoid bias in the regression parameters. But for the estimation of the link typically the full model, which includes all predictors, has been used. When the number of predictors is large these methods fail since the full model cannot be estimated. In the present article a boosting type method is proposed that simultaneously selects predictors and estimates the link function. The method performs quite well in simulations and real data examples." (The "to teach" tag is conjectural.)
in_NB
regression
variable_selection
statistics
nonparametrics
to_read
to_teach:undergrad-ADA
Variance estimation using refitted cross-validation in ultrahigh dimensional regression - Fan - 2011 - Journal of the Royal Statistical Society: Series B (Statistical Methodology) - Wiley Online Library
october 2011 by cshalizi
"Variance estimation is a fundamental problem in statistical modelling. In ultrahigh dimensional linear regression where the dimensionality is much larger than the sample size, traditional variance estimation techniques are not applicable. Recent advances in variable selection in ultrahigh dimensional linear regression make this problem accessible. One of the major problems in ultrahigh dimensional regression is the high spurious correlation between the unobserved realized noise and some of the predictors. As a result, the realized noises are actually predicted when extra irrelevant variables are selected, leading to a serious underestimate of the level of noise. We propose a two-stage refitted procedure via a data splitting technique, called refitted cross-validation, to attenuate the influence of irrelevant variables with high spurious correlations. Our asymptotic results show that the resulting procedure performs as well as the oracle estimator, which knows in advance the mean regression function. The simulation studies lend further support to our theoretical claims. The naive two-stage estimator and the plug-in one-stage estimators using the lasso and smoothly clipped absolute deviation are also studied and compared. Their performances can be improved by the refitted cross-validation method proposed."
statistics
regression
variable_selection
cross-validation
estimation
to:NB
fan.jianqing
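A minimal sketch of the two-stage, data-splitting idea in the abstract above: select variables on one half of the data, refit by least squares on the other half using only the selected variables, swap the roles of the halves, and average the two residual-variance estimates. Everything concrete here is an assumption for illustration: marginal screening stands in for the selection step (the paper discusses the lasso and SCAD), no intercept is fitted, the split is simply first half versus second half, and all function names are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [A[i][:] + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x


def screen(X, y, k):
    """Stage 1 stand-in: keep the k columns with the largest |sum_i x_ij * y_i|."""
    p = len(X[0])
    score = [abs(sum(row[j] * yi for row, yi in zip(X, y))) for j in range(p)]
    return sorted(range(p), key=lambda j: -score[j])[:k]


def ols_residual_variance(X, y, cols):
    """Stage 2: least squares on the chosen columns; return RSS / (n - |cols|)."""
    n, k = len(X), len(cols)
    Z = [[row[j] for j in cols] for row in X]
    gram = [[sum(z[a] * z[b] for z in Z) for b in range(k)] for a in range(k)]
    rhs = [sum(z[a] * yi for z, yi in zip(Z, y)) for a in range(k)]
    beta = solve(gram, rhs)
    rss = sum(
        (yi - sum(bj * zj for bj, zj in zip(beta, z))) ** 2 for z, yi in zip(Z, y)
    )
    return rss / (n - k)


def rcv_variance(X, y, k):
    """Select on one half, refit on the other, swap roles, and average."""
    half = len(X) // 2
    X1, y1, X2, y2 = X[:half], y[:half], X[half:], y[half:]
    v1 = ols_residual_variance(X2, y2, screen(X1, y1, k))
    v2 = ols_residual_variance(X1, y1, screen(X2, y2, k))
    return (v1 + v2) / 2.0


def make_regression_data(n=200, p=30, seed=2):
    """Hypothetical toy data: columns 0 and 1 matter, noise variance is 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y
```

The point of refitting on the half that played no part in selection is that an irrelevant variable picked for its spurious correlation with the noise in one half has no special relationship to the noise in the other, so it cannot soak up residual variance there.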
[1106.5242] High Dimensional Sparse Econometric Models: An Introduction
june 2011 by cshalizi
I love how they just flat-out identify "econometrics" with "linear regression with Gaussian noise"; but it looks like a clean exposition with proofs.
regression
lasso
variable_selection
econometrics
[1009.2302] The Predictive Lasso
september 2010 by cshalizi
"We propose a shrinkage procedure for simultaneous variable selection and estimation in generalized linear models (GLMs) with an explicit predictive motivation. The procedure estimates the coefficients by minimizing the Kullback-Leibler divergence of a set of predictive distributions to the corresponding predictive distributions for the full model, subject to an $l_1$ constraint on the coefficient vector. This results in selection of a parsimonious model with similar predictive performance to the full model. Thanks to its similar form to the original lasso problem for GLMs, our procedure can benefit from available $l_1$-regularization path algorithms. Simulation studies and real-data examples confirm the efficiency of our method in terms of predictive performance on future observations."
regression
lasso
variable_selection
sparsity
information_theory
statistics
"Partial Generalized Additive Models: An Information-Theoretic Approach for Dealing With Concurvity and Selecting Variables" (Gu, Kenny, Zhu, 2010)
september 2010 by cshalizi
"Scientists [want to know] which covariates are important, and how [they] affect the response variable, rather than just making predictions. ... Generalized additive models (GAMs) are a class of interpretable, multivariate nonparametric regression models which are very useful ... for these purposes, but concurvity among covariates (the nonlinear analogue of collinearity for linear regression) can ... produce unstable or even wrong estimates of the covariates’ functional effects. We develop a new procedure called partial generalized additive models (pGAM), based on mutual information ... Our procedure is similar in spirit to the Gram–Schmidt method for linear least squares. By building a GAM on a selected set of transformed variables, pGAM produces more stable models, selects variables parsimoniously, and provides insight into the nature of concurvity between the covariates by calculating functional dependencies among them. ... R code for fitting pGAMs is available online"
regression
additive_models
information_theory
variable_selection
statistics
Penalized regression with correlation-based penalty
june 2009 by cshalizi
But do I _want_ to exclude _all_ of a bundle of correlated input variables from my regression? Surely it'd be better to include just _one_ of them...
regression
variable_selection
statistics
[0906.3590] Forest Garrote
june 2009 by cshalizi
We have got to do something about the names of techniques in this area. I don't mind the whimsy, it's just that combinations like this don't work, metaphorically.
ensemble_methods
classifiers
statistics
machine_learning
sparsity
variable_selection
lasso
[0901.3202] Model-Consistent Sparse Estimation through the Bootstrap
january 2009 by cshalizi
"if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection"
lasso
linear_regression
model_selection
variable_selection
bootstrap
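The procedure quoted above (run the Lasso on bootstrap resamples, then intersect the supports) is short enough to sketch. This is only an illustration under stated assumptions: the Lasso is fitted by a plain cyclic coordinate descent written out here rather than by any particular library, no intercept is fitted, and the penalty `lam`, the number of resamples `n_boot`, and all function names are hypothetical choices, not the paper's.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows, y a list of responses; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual
            # (column j's own contribution added back in).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def bolasso_support(X, y, lam, n_boot=20, seed=0):
    """Intersect the Lasso supports over n_boot bootstrap resamples of (X, y)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    support = set(range(p))
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        support &= {j for j, b in enumerate(beta) if b != 0.0}
    return support


def make_toy_data(n=100, p=6, seed=1):
    """Hypothetical toy data: only columns 0 and 1 affect y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 3.0 * row[1] + rng.gauss(0, 0.3) for row in X]
    return X, y
```

A call such as `bolasso_support(X, y, lam=0.2)` on the toy data returns the set of column indices that survive every resample. The intersection is what does the work: a truly relevant variable is picked on nearly every resample, while an irrelevant one has to be picked by chance every single time to survive.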
related tags
additive_models ⊕ books:recommended ⊕ boosting ⊕ bootstrap ⊕ buhlmann.peter ⊕ candes.emanuel ⊕ causal_inference ⊕ classifiers ⊕ cross-validation ⊕ econometrics ⊕ ensemble_methods ⊕ estimation ⊕ fan.jianqing ⊕ feature_selection ⊕ graphical_models ⊕ have_read ⊕ heard_the_talk ⊕ information_theory ⊕ in_NB ⊕ kernel_methods ⊕ kith_and_kin ⊕ lasso ⊕ linear_regression ⊕ machine_learning ⊕ model_selection ⊕ nardi.yuval ⊕ natural_language_processing ⊕ nonparametrics ⊕ regression ⊕ rinaldo.alessandro ⊕ sparsity ⊕ statistics ⊕ tao.terence ⊕ time_series ⊕ to:blog ⊕ to:NB ⊕ to_read ⊕ to_teach:data-mining ⊕ to_teach:undergrad-ADA ⊕ van_de_geer.sara ⊕ variable_selection