cshalizi + model_selection   101

Consistent Model Selection Criteria on High Dimensions
"Asymptotic properties of model selection criteria for high-dimensional regression models are studied where the dimension of covariates is much larger than the sample size. Several sufficient conditions for model selection consistency are provided. Non-Gaussian error distributions are considered and it is shown that the maximal number of covariates for model selection consistency depends on the tail behavior of the error distribution. Also, sufficient conditions for model selection consistency are given when the variance of the noise is neither known nor estimated consistently. Results of simulation studies as well as real data analysis are given to illustrate that finite sample performances of consistent model selection criteria can be quite different."
to:NB  model_selection  statistics  high-dimensional_probability 
25 days ago by cshalizi
Xu , McLeod : Further asymptotic properties of the generalized information criterion
"Asymptotic properties of the generalized information criterion for model selection are examined and new conditions under which this criterion is overfitting, consistent, or underfitting are derived."
in_NB  model_selection  information_criteria  statistics 
5 weeks ago by cshalizi
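--- A minimal sketch of what the "generalized information criterion" in the entry above trades off, assuming the common form GIC = -2 log L + alpha * k, where alpha = 2 gives AIC and alpha = log n gives BIC. The function names and the residual sums of squares are made up for illustration; nothing here is taken from the Xu and McLeod paper itself.

```python
import math

def gaussian_max_loglik(rss, n):
    """Maximized Gaussian log-likelihood of a regression with residual sum of squares rss."""
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

def gic(loglik, k, alpha):
    """Assumed form of the criterion: -2 log L + alpha * k (smaller is better)."""
    return -2.0 * loglik + alpha * k

def select(models, n, alpha):
    """models maps a name to (rss, number_of_parameters); return the name minimizing GIC."""
    def score(name):
        rss, k = models[name]
        return gic(gaussian_max_loglik(rss, n), k, alpha)
    return min(models, key=score)

n = 100
# Hypothetical fits: the larger model lowers the RSS only slightly.
models = {"small": (120.0, 2), "large": (115.0, 3)}

aic_choice = select(models, n, alpha=2.0)          # AIC-type penalty
bic_choice = select(models, n, alpha=math.log(n))  # BIC-type penalty
print(aic_choice, bic_choice)  # prints: large small
```

With these numbers the extra parameter buys about 4.26 units of -2 log L, which beats a penalty of 2 but not a penalty of log 100 (about 4.61), so the weight alpha alone decides whether the criterion leans toward the larger or the smaller model.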
Ockham's Razor: Foundations - Carnegie Mellon Center for Formal Epistemology
Despite my presence on the program, this should actually be really good.

"Scientific theory choice is guided by judgments of simplicity, a bias frequently referred to as "Ockham's Razor". But what is simplicity and how, if at all, does it help science find the truth?  Should we view simple theories as means for obtaining accurate predictions, as classical statisticians recommend?  Or should we believe the theories themselves, as Bayesian methods seem to justify?  The aim of this workshop is to re-examine the foundations of Ockham's razor, with a firm focus on the connections, if any, between simplicity and truth. "
self-promotion  occams_razor  philosophy_of_science  epistemology  kelly.kevin_t.  kith_and_kin  mayo.deborah  vapnik.v.n.  sober.elliott  leeb.hannes  wasserman.larry  model_selection  statistics  complexity  machine_learning  learning_theory  grunwald.peter 
5 weeks ago by cshalizi
[0802.4192] Maxisets for Model Selection
"We address the statistical issue of determining the maximal spaces (maxisets) where model selection procedures attain a given rate of convergence. By considering first general dictionaries, then orthonormal bases, we characterize these maxisets in terms of approximation spaces. These results are illustrated by classical choices of wavelet model collections. For each of them, the maxisets are described in terms of functional spaces. We take a special care of the issue of calculability and measure the induced loss of performance in terms of maxisets."
in_NB  statistics  model_selection  approximation 
6 weeks ago by cshalizi
[0803.2963] Consistency of cross validation for comparing regression procedures
"Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property."
to:NB  statistics  to_read  cross-validation  model_selection  nonparametrics  to_teach:undergrad-ADA  re:stacs 
11 weeks ago by cshalizi
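--- A minimal sketch of the kind of comparison the abstract above is about: one parametric and one nonparametric procedure, fit on part of the data and compared on the rest, with the evaluation share as an explicit knob. The simulated data, the two procedures (a straight-line fit and k-nearest-neighbor averaging), and the names `compare_by_split` and `eval_fraction` are all assumptions for illustration, not the paper's setup.

```python
import math
import random

def fit_linear(xs, ys):
    """Ordinary least squares for a straight line; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=5):
    """k-nearest-neighbor regression in one dimension; returns a prediction function."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def compare_by_split(xs, ys, procedures, eval_fraction=0.5, seed=0):
    """Fit each procedure on one part of the data, score it on the held-out part."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_eval = int(round(eval_fraction * len(idx)))
    eval_idx, fit_idx = idx[:n_eval], idx[n_eval:]
    fit_x, fit_y = [xs[i] for i in fit_idx], [ys[i] for i in fit_idx]
    errors = {}
    for name, fit in procedures.items():
        predict = fit(fit_x, fit_y)
        errors[name] = sum((ys[i] - predict(xs[i])) ** 2 for i in eval_idx) / n_eval
    return min(errors, key=errors.get), errors

# Simulated data with a clearly nonlinear regression function.
data_rng = random.Random(0)
xs = [data_rng.uniform(0.0, 3.0) for _ in range(200)]
ys = [math.sin(3.0 * x) + data_rng.gauss(0.0, 0.1) for x in xs]

best, errors = compare_by_split(xs, ys, {"linear": fit_linear, "knn": fit_knn})
print(best, errors)
```

Changing `eval_fraction` is the crude analogue of the "data splitting ratio" in the abstract; the sketch only shows the mechanics and says nothing about which ratios make the selection consistent.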
[0806.4140] Optimal oracle inequalities for model selection
"Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions both on the margin and the tails of the losses used. Starting with examples from the 3 basic estimation problems, regression, classification and density estimation, we formulate risk bounds for empirical risk minimization under successively weakening conditions and prove them at a very general level, for general margin and power tail behavior of the excess losses."
in_NB  statistics  learning_theory  cross-validation  model_selection  van_de_geer.sara 
12 weeks ago by cshalizi
[0810.5288] Aggregation of penalized empirical risk minimizers in regression
"We give a general result concerning the rates of convergence of penalized empirical risk minimizers (PERM) in the regression model. Then, we consider the problem of agnostic learning of the regression, and give in this context an oracle inequality and a lower bound for PERM over a finite class. These results hold for a general multivariate random design, the only assumption being the compactness of the support of its law (allowing discrete distributions for instance). Then, using these results, we construct adaptive estimators. We consider as examples adaptive estimation over anisotropic Besov spaces or reproductive kernel Hilbert spaces. Finally, we provide an empirical evidence that aggregation leads to more stable estimators than more standard cross-validation or generalized cross-validation methods for the selection of the smoothing parameter, when the number of observation is small."
to:NB  statistics  ensemble_methods  model_selection 
12 weeks ago by cshalizi
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature - instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB  information_theory  statistics  variable_selection  model_selection  to_teach:data-mining  to:blog  machine_learning  classifiers  have_read  graphical_models 
february 2012 by cshalizi
Model Selection in Kernel Based Regression using the Influence Function
"Recent results about the robustness of kernel methods involve the analysis of influence functions. By definition the influence function is closely related to leave-one-out criteria. In statistical learning, the latter is often used to assess the generalization of a method. In statistics, the influence function is used in a similar way to analyze the statistical efficiency of a method. Links between both worlds are explored. The influence function is related to the first term of a Taylor expansion. Higher order influence functions are calculated. A recursive relation between these terms is found characterizing the full Taylor expansion. It is shown how to evaluate influence functions at a specific sample distribution to obtain an approximation of the leave-one-out error. A specific implementation is proposed using a L1 loss in the selection of the hyperparameters and a Huber loss in the estimation procedure. The parameter in the Huber loss controlling the degree of robustness is optimized as well. The resulting procedure gives good results, even when outliers are present in the data."
to:NB  statistics  regression  kernel_estimators  model_selection  robustness  nonparametrics  cross-validation 
february 2012 by cshalizi
The Asymmetric Business Cycle
"The business cycle is a fundamental yet elusive concept in macroeconomics. In this paper, we consider the problem of measuring the business cycle. First, we argue for the output-gap view that the business cycle corresponds to transitory deviations in economic activity away from a permanent, or trend, level. Then we investigate the extent to which a general model-based approach to estimating trend and cycle for the U.S. economy leads to measures of the business cycle that reflect models versus the data. We find empirical support for a nonlinear time series model that produces a business cycle measure with an asymmetric shape across NBER expansion and recession phases. Specifically, this business cycle measure suggests that recessions are periods of relatively large and negative transitory fluctuations in output. However, several close competitors to the nonlinear model produce business cycle measures of widely differing shapes and magnitudes. Given this model-based uncertainty, we construct a model-averaged measure of the business cycle. This measure also displays an asymmetric shape and is closely related to other measures of economic slack such as the unemployment rate and capacity utilization."
--- Worthy, but at the same time makes me want to lock them in a room with a copy of Li and Racine's _Nonparametric Econometrics_, or even _The Elements of Statistical Learning_, and not let them out until they understand it.
in_NB  time_series  statistics  economics  macroeconomics  inference_to_latent_objects  re:your_favorite_dsge_sucks  morley.james  have_read  ensemble_methods  model_selection 
february 2012 by cshalizi
Clements , Schoenberg , Schorlemmer : Residual analysis methods for space–time point processes with applications to earthquake forecast models in California
"Modern, powerful techniques for the residual analysis of spatial-temporal point process models are reviewed and compared. These methods are applied to California earthquake forecast models used in the Collaboratory for the Study of Earthquake Predictability (CSEP). Assessments of these earthquake forecasting models have previously been performed using simple, low-power means such as the L-test and N-test. We instead propose residual methods based on rescaling, thinning, superposition, weighted K-functions and deviance residuals. Rescaled residuals can be useful for assessing the overall fit of a model, but as with thinning and superposition, rescaling is generally impractical when the conditional intensity λ is volatile. While residual thinning and superposition may be useful for identifying spatial locations where a model fits poorly, these methods have limited power when the modeled conditional intensity assumes extremely low or high values somewhere in the observation region, and this is commonly the case for earthquake forecasting models. A recently proposed hybrid method of thinning and superposition, called super-thinning, is a more powerful alternative. The weighted K-function is powerful for evaluating the degree of clustering or inhibition in a model. Competing models are also compared using pixel-based approaches, such as Pearson residuals and deviance residuals. 
The different residual analysis techniques are demonstrated using the CSEP models and are used to highlight certain deficiencies in the models, such as the overprediction of seismicity in inter-fault zones for the model proposed by Helmstetter, Kagan and Jackson [Seismological Research Letters 78 (2007) 78–86], the underprediction of the model proposed by Kagan, Jackson and Rong [Seismological Research Letters 78 (2007) 94–98] in forecasting seismicity around the Imperial, Laguna Salada, and Panamint clusters, and the underprediction of the model proposed by Shen, Jackson and Kagan [Seismological Research Letters 78 (2007) 116–120] in forecasting seismicity around the Laguna Salada, Baja, and Panamint clusters."
to:NB  point_processes  spatial_statistics  time_series  statistics  model_selection  model-checking  prediction  earthquakes  geology 
december 2011 by cshalizi
Shen , Welch , Hughes-Oliver : Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery
"Cross-validation (CV) is widely used for tuning a model with respect to user-selected parameters and for selecting a “best” model. For example, the method of k-nearest neighbors requires the user to choose k, the number of neighbors, and a neural network has several tuning parameters controlling the network complexity. Once such parameters are optimized for a particular data set, the next step is often to compare the various optimized models and choose the method with the best predictive performance. Both tuning and model selection boil down to comparing models, either across different values of the tuning parameters or across different classes of statistical models and/or sets of explanatory variables. For multiple large sets of data, like the PubChem drug discovery cheminformatics data which motivated this work, reliable CV comparisons are computationally demanding, or even infeasible. In this paper we develop an efficient sequential methodology for model comparison based on CV. It also takes into account the randomness in CV. The number of models is reduced via an adaptive, multiplicity-adjusted sequential algorithm, where poor performers are quickly eliminated. By exploiting matching of individual observations, it is sometimes even possible to establish the statistically significant inferiority of some models with just one execution of CV."
in_NB  model_selection  statistics  cross-validation  machine_learning 
december 2011 by cshalizi
[1111.0559] Model Selection in Undirected Graphical Models with the Elastic Net
"Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on $l_1$ regularized neighborhood estimation to include the elastic net $l_1+l_2$ penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pair-wise neighborhood union estimates."
in_NB  graphical_models  model_selection  lasso  sparsity 
november 2011 by cshalizi
Liu , Yang : Parametric or nonparametric? A parametricness index for model selection
"In model selection literature, two classes of criteria perform well asymptotically in different situations: Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (parametric scenario); Akaike’s information criterion (AIC) performs well in an asymptotic efficiency when the true model is infinite dimensional (nonparametric scenario). But there is little work that addresses if it is possible and how to detect the situation that a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints on AIC or BIC is better suited for the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is simultaneously asymptotically efficient for both parametric and nonparametric scenarios. In addition, we systematically investigate the behaviors of PI in simulation and real data and show its usefulness."
to:NB  model_selection  statistics  nonparametrics  information_criteria 
october 2011 by cshalizi
[1110.4944] Is your phylogeny informative? Measuring the power of comparative methods
"Phylogenetic comparative methods may fail to produce meaningful results when either the underlying model is inappropriate or the data contain insufficient information to inform the inference. The ability to measure the statistical power of these methods has become crucial to ensure that data quantity keeps pace with growing model complexity. Through simulations, we show that commonly applied model choice methods based on information criteria can have remarkably high error rates; this can be a problem because methods to estimate the uncertainty or power are not widely known or applied. Furthermore, the power of comparative methods can depend significantly on the structure of the data. We describe a Monte Carlo based method which addresses both of these challenges, and show how this approach both quantifies and substantially reduces errors relative to information criteria. The method also produces meaningful confidence intervals for model parameters. We illustrate how the power to distinguish different models, such as varying levels of selection, varies both with number of taxa and structure of the phylogeny. We provide an open-source implementation in the pmc ("Phylogenetic Monte Carlo") package for the R programming language. We hope such power analysis becomes a routine part of model comparison in comparative methods."
to:NB  statistics  evolutionary_biology  comparative_methods  model_selection  information_criteria 
october 2011 by cshalizi
[1110.4700] Relevant statistics for Bayesian model choice
"The choice of the summary statistics in Bayesian inference and in particular in ABC algorithms is paramount to produce a valid outcome. We derive necessary and sufficient conditions on those statistics for the corresponding Bayes factor to be convergent, namely to asymptotically select the true model. Those conditions which amount to the means of the summary statistics to asymptotically differ under both models are then usable in ABC settings to determine which summary statistics are appropriate, most generally via a standard Monte Carlo validation."
to:NB  statistics  model_selection  approximate_bayesian_computation  indirect_inference 
october 2011 by cshalizi
[1110.3860] Contending Parties: A Logistic Choice Analysis of Inter- and Intra-group Blog Citation Dynamics in the 2004 US Presidential Election
"The 2004 US Presidential Election cycle marked the debut of Internet-based media such as blogs and social networking websites as institutionally recognized features of the American political landscape. Using a longitudinal sample of all DNC/RNC-designated blog-citation networks we are able to test the influence of various strategic, institutional, and balance-theoretic mechanisms and exogenous factors such as seasonality and political events on the propensity of blogs to cite one another over time. Capitalizing on the temporal resolution of our data, we utilize an autoregressive network regression framework to carry out inference for a logistic choice process. Using a combination of deviance-based model selection criteria and simulation-based model adequacy tests, we identify the combination of processes that best characterizes the choice behavior of the contending blogs."
to:NB  network_data_analysis  blogs  us_politics  model_selection  simulation 
october 2011 by cshalizi
Reality Checks and Comparisons of Nested Predictive Models - Journal of Business and Economic Statistics
"This article develops a simple bootstrap method for simulating asymptotic critical values for tests of equal forecast accuracy and encompassing among many nested models. Our method combines elements of fixed regressor and wild bootstraps. We first derive the asymptotic distributions of tests of equal forecast accuracy and encompassing applied to forecasts from multiple models that nest the benchmark model—that is, reality check tests. We then prove the validity of the bootstrap for these tests. Monte Carlo experiments indicate that our proposed bootstrap has better finite-sample size and power than other methods designed for comparison of nonnested models."
statistics  model_checking  model_selection  time_series  bootstrap  to_read  to_teach:undergrad-ADA  encompassing 
september 2011 by cshalizi
[1107.0189] The Lasso, correlated design, and improved oracle inequalities
"We study high-dimensional linear models and the $\ell_1$-penalized least squares estimator, also known as the Lasso estimator. In literature, oracle inequalities have been derived under restricted eigenvalue or compatibility conditions. In this paper, we complement this with entropy conditions which allow one to improve the dual norm bound, and demonstrate how this leads to new oracle inequalities. The new oracle inequalities show that a smaller choice for the tuning parameter and a trade-off between $\ell_1$-norms and small compatibility constants are possible. This implies, in particular for correlated design, improved bounds for the prediction error of the Lasso estimator as compared to the methods based on restricted eigenvalue or compatibility conditions only."
lasso  regression  model_selection  van_de_geer.sara 
july 2011 by cshalizi
[1012.3795] Estimating Networks With Jumps
"We study the problem of estimating a temporally varying coefficient and varying structure (VCVS) graphical model underlying nonstationary time series data, such as social states of interacting individuals or microarray expression profiles of gene networks, as opposed to i.i.d. data from an invariant model widely considered in current literature of structural estimation. In particular, we consider the scenario in which the model evolves in a piece-wise constant fashion. We propose a procedure that minimizes the so-called TESLA loss (i.e., temporally smoothed L1 regularized regression), which allows jointly estimating the partition boundaries of the VCVS model and the coefficient of the sparse precision matrix on each block of the partition. "
graphical_models  network_data_analysis  time_series  model_selection  statistics  xing.eric 
december 2010 by cshalizi
On the behaviour of marginal and conditional AIC in linear mixed models — Biometrika
"In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion, AIC, have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is not an asymptotically unbiased estimator of the Akaike information, and favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that can lead to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. ... All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia."
regression  model_selection  information_criteria  statistics 
december 2010 by cshalizi
[1011.3396] PAC-Bayesian aggregation and multi-armed bandits
"This habilitation thesis presents several contributions to (1) the PAC-Bayesian analysis of statistical learning, (2) the three aggregation problems: given d functions, how to predict as well as (i) the best of these d functions (model selection type aggregation), (ii) the best convex combination of these d functions, (iii) the best linear combination of these d functions, (3) the multi-armed bandit problems."
statistics  learning_theory  pac-bayesian  model_selection  ensemble_methods  to:NB 
november 2010 by cshalizi
[1010.6202] Sequential Data-Adaptive Bandwidth Selection by Cross-Validation for Nonparametric Prediction
"We consider the problem of bandwidth selection by cross-validation from a sequential point of view in a nonparametric regression model. Having in mind that in applications one often aims at estimation, prediction and change detection simultaneously, we investigate that approach for sequential kernel smoothers in order to base these tasks on a single statistic. We provide uniform weak laws of large numbers and weak consistency results for the cross-validated bandwidth. Extensions to weakly dependent error terms are discussed as well. The errors may be $\alpha$-mixing or L2-near epoch dependent, which guarantees that the uniform convergence of the cross validation sum and the consistency of the cross-validated bandwidth hold true for a large class of time series. The method is illustrated by analyzing photovoltaic data."
cross-validation  prediction  time_series  model_selection  to_read 
november 2010 by cshalizi
[1007.3230] Selecting an exponential random graph model for complex brain networks
Shorter authors: What do you know, all that stuff about how to fit ERGMs to social networks totally works for brain networks too. (I mock, but I'd be flabbergasted if it didn't, if only because there is so little _social_ content in the ERGM formalism...) Also: yay model checking!
neural_data_analysis  network_data_analysis  exponential_family_random_graphs  model_selection  statistics  model-checking  to_read  re:functional_communities  re:stacs 
july 2010 by cshalizi
[1005.5483] Model Selection Principles in Misspecified Models
So-so.  Suspect that most of these results are actually in Claeskens and Hjort's book, but am insufficiently motivated to check.
model_selection  misspecification  statistics  re:phil-of-bayes_paper  have_read 
june 2010 by cshalizi
Ehm, Kornmeier, Heinrich: Multiple testing along a tree
"Suitable sequentially rejective multiple test procedures allow to “zoom in” on clusters of relevant variables in high-dimensional regression (Meinshausen [7]), or on regions of interest in some search space (Heinrich et al. [3]; Meinshausen et al. [8]). As a common framework for these schemes we propose to consider multiple testing along a tree of hypotheses together with a “keep rejecting until first acceptance” rule. Particular topics addressed in this note are control of the familywise error, and some variants and basic properties of the procedure."
multiple_testing  hypothesis_testing  model_selection  re:AoS_project  to_read 
may 2010 by cshalizi
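--- A minimal sketch of the stopping rule alone from the entry above: walk down a tree of hypotheses, and test a node's children only if the node itself was rejected. The tree, the p-values, and especially the per-node `levels` are hypothetical inputs; how to choose those levels so that the familywise error is controlled is the paper's subject and is not reproduced here.

```python
def reject_along_tree(children, pvalues, levels, root):
    """Top-down testing: descend into a subtree only after rejecting its root.

    children: dict mapping a node to the list of its child nodes
    pvalues:  dict mapping a node to its p-value
    levels:   dict mapping a node to its (assumed, user-supplied) rejection threshold
    """
    rejected = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        if pvalues[node] <= levels[node]:
            rejected.append(node)
            frontier.extend(children.get(node, []))
        # On acceptance nothing is added, so the node's descendants are never tested.
    return sorted(rejected)

# Hypothetical tree of variable clusters with made-up p-values.
children = {"all": ["left", "right"], "left": ["x1", "x2"], "right": ["x3", "x4"]}
pvalues = {"all": 0.001, "left": 0.004, "right": 0.30,
           "x1": 0.01, "x2": 0.20, "x3": 0.0001, "x4": 0.50}
levels = {node: 0.05 for node in pvalues}  # placeholder thresholds, not a valid FWER allocation

rejected = reject_along_tree(children, pvalues, levels, "all")
print(rejected)  # prints: ['all', 'left', 'x1']
```

Note that `x3` has the smallest p-value of all but is never examined, because its parent `right` was accepted first; that is the sense in which rejection stops at the first acceptance along each branch.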
10-705 Intermediate Statistics, Fall 2009
Larry's version of the typical masters-level course based on Casella and Berger. Note: half of what he covers is not in Casella and Berger. (For example, he starts with VC theory!)
learning_theory  statistics  estimation  hypothesis_testing  prediction  minimax  bootstrap  model_selection  regression  classifiers  confidence_sets  wasserman.larry  kith_and_kin 
april 2010 by cshalizi
[1004.2287] An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates
"Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. Following the ideas of the LASSO procedure designed for the linear regression framework, recent developments dealing with graphical model selection have been based on $\ell_1$-penalization. In the binary case, however, exact inference is generally very slow or even intractable because of [the] log-partition function. Various approximate methods have recently been proposed in the literature ... Through an extensive simulation study, we show that a simple modification of a method relying on a Gaussian approximation achieves good performance and is very fast. We present a real application in which we search for associations among causes of death recorded on French death certificates."
graphical_models  lasso  model_selection 
april 2010 by cshalizi
[1004.2304] Spatio-Temporal Graphical Model Selection
"We consider the problem of estimating the topology of spatial interactions in a discrete state, discrete time spatio-temporal graphical model where the interactions affect the temporal evolution of each agent in a network. Among other models, the susceptible, infected, recovered ($SIR$) model for interaction events fall into this framework. We pose the problem as a structure learning problem and solve it using an $\ell_1$-penalized likelihood convex program. We evaluate the solution on a simulated spread of infectious over a complex network. Our topology estimates outperform those of a standard spatial Markov random field graphical model selection using $\ell_1$-regularized logistic regression."
graphical_models  random_fields  lasso  model_selection 
april 2010 by cshalizi
Verzelen: Adaptive estimation of stationary Gaussian fields
"We study the nonparametric covariance estimation of a stationary Gaussian field X observed on a regular lattice. In the time series setting, some procedures ... achieve optimal model selection among autoregressive models. ... no such equivalent results of adaptivity in a spatial setting. By considering collections of Gaussian Markov random fields (GMRF) as approximation sets for the distribution of X, we introduce a novel model selection procedure for spatial fields. For all neighborhoods m in a given collection , this procedure first amounts to computing a covariance estimator of X within the GMRFs of neighborhood m. Then it selects a neighborhood $\widehat{m}$ by applying a penalization strategy. The so-defined method satisfies a nonasymptotic oracle-type inequality. If X is a GMRF, the procedure is also minimax adaptive to the sparsity of its neighborhood. More generally, the procedure is adaptive to the rate of approximation of the true distribution by GMRFs with growing neighborhoods."
spatial_statistics  model_selection  statistics  stochastic_processes  random_fields  statistical_inference_for_stochastic_processes 
march 2010 by cshalizi
[0903.3620] Reconciling Model Selection and Prediction
"It is known that there is a dichotomy in the performance of model selectors. Those that are consistent (having the "oracle property") do not achieve the asymptotic minimax rate for prediction error. We look at this phenomenon closely, and argue that the set of parameters on which this dichotomy occurs is extreme, even pathological, and should not be considered when evaluating model selectors. We characterize this set, and show that, when such parameters are dismissed from consideration, consistency and asymptotic minimaxity can be attained simultaneously."
model_selection  statistics  minimax  regression  have_read  prediction 
december 2009 by cshalizi
[0712.0881] On the "degrees of freedom" of the lasso
Reading the abstract and introduction makes me feel all of a sudden like I don't, in fact, understand the concept of "degrees of freedom". (I mean, I understand it in mechanics!)
lasso  regression  sparsity  degrees_of_freedom  statistics  estimation  model_selection  to_read 
august 2009 by cshalizi
Challenges for Econometric Model Selection
"Standard econometric model selection methods are based on four fundamental errors in approach: parametric vision, the assumption of a true DGP, evaluation based on fit, and ignoring the impact of model uncertainty on inference. Instead, econometric model selection methods should be based on a semiparametric vision, models should be viewed as approximations, models should be evaluated based on their purpose, and model uncertainty should be incorporated into inference methods. These problems have been examined individually, but not jointly, and my view is that future research into econometric model selection should attempt to address all four issues. "
model_selection  econometrics  statistics  nonparametrics  have_read  hansen.bruce 
june 2009 by cshalizi
[0711.1036v2] Confidence Sets Based on Sparse Estimators Are Necessarily Large
"Confidence sets based on sparse estimators are shown to be large compared to more standard confidence sets, demonstrating that sparsity of an estimator comes at a substantial price in terms of the quality of the estimator. The results are set in a general parametric or semiparametric framework."
sparsity  confidence_sets  model_selection  statistics  to:NB  via:shivak 
may 2009 by cshalizi
[0901.3202] Model-Consistent Sparse Estimation through the Bootstrap
"if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection"
lasso  linear_regression  model_selection  variable_selection  bootstrap 
january 2009 by cshalizi
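A rough sketch of the procedure quoted in the entry above (fit the Lasso on bootstrap resamples, then intersect the supports), in plain Python. Everything beyond that one-sentence description is an assumption made for illustration: the coordinate-descent solver, the absence of an intercept, the fixed penalty `lam`, the default of 32 resamples, and all function names. It is not the paper's implementation.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of rows, y a list of responses; no intercept is fitted
    (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta (beta starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # all-zero column: coefficient stays at zero
            # Correlation of column j with the partial residual (column j added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # coefficients have stopped moving
    return beta


def bootstrap_lasso_support(X, y, lam, n_boot=32, seed=0):
    """Intersect the Lasso supports over n_boot bootstrap resamples of (X, y)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    support = set(range(p))
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        beta = lasso_cd([X[i] for i in idx], [y[i] for i in idx], lam)
        support &= {j for j, b in enumerate(beta) if b != 0.0}
    return support
```

Calling `bootstrap_lasso_support(X, y, lam=0.3)` returns the set of column indices selected in every resample. A variable survives only if the Lasso keeps it each time, so a spurious variable that is picked up in some resamples but not others gets dropped. How to choose `lam` and the number of resamples is left open here.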
[0901.1925] Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems
This "approximate Bayesian computation" scheme sounds like a version of indirect inference, only slower and vulnerable to a bad choice of prior...
time_series  model_selection  statistics  particle_filters  to_read  statistical_inference_for_stochastic_processes 
january 2009 by cshalizi
Asymptotically minimax regret procedures in regression model selection and the magnitude of the dimension penalty
Claims AIC is asymptotically minimax (approximately). Not sure this would actually apply to anything I'd ever be interested in, given the assumptions they have to load on.
model_selection  information_criteria  regression  to_read  via:kevin_kelly 
october 2008 by cshalizi