cshalizi + model_selection 101
Consistent Model Selection Criteria on High Dimensions
25 days ago by cshalizi
"Asymptotic properties of model selection criteria for high-dimensional regression models are studied where the dimension of covariates is much larger than the sample size. Several sufficient conditions for model selection consistency are provided. Non-Gaussian error distributions are considered and it is shown that the maximal number of covariates for model selection consistency depends on the tail behavior of the error distribution. Also, sufficient conditions for model selection consistency are given when the variance of the noise is neither known nor estimated consistently. Results of simulation studies as well as real data analysis are given to illustrate that finite sample performances of consistent model selection criteria can be quite different."
to:NB
model_selection
statistics
high-dimensional_probability
Xu, McLeod: Further asymptotic properties of the generalized information criterion
5 weeks ago by cshalizi
"Asymptotic properties of the generalized information criterion for model selection are examined and new conditions under which this criterion is overfitting, consistent, or underfitting are derived."
in_NB
model_selection
information_criteria
statistics
Ockham's Razor: Foundations - Carnegie Mellon Center for Formal Epistemology
5 weeks ago by cshalizi
Despite my presence on the program, this should actually be really good.
"Scientific theory choice is guided by judgments of simplicity, a bias frequently referred to as "Ockham's Razor". But what is simplicity and how, if at all, does it help science find the truth? Should we view simple theories as means for obtaining accurate predictions, as classical statisticians recommend? Or should we believe the theories themselves, as Bayesian methods seem to justify? The aim of this workshop is to re-examine the foundations of Ockham's razor, with a firm focus on the connections, if any, between simplicity and truth. "
self-promotion
occams_razor
philosophy_of_science
epistemology
kelly.kevin_t.
kith_and_kin
mayo.deborah
vapnik.v.n.
sober.elliott
leeb.hannes
wasserman.larry
model_selection
statistics
complexity
machine_learning
learning_theory
grunwald.peter
[0802.4192] Maxisets for Model Selection
6 weeks ago by cshalizi
"We address the statistical issue of determining the maximal spaces (maxisets) where model selection procedures attain a given rate of convergence. By considering first general dictionaries, then orthonormal bases, we characterize these maxisets in terms of approximation spaces. These results are illustrated by classical choices of wavelet model collections. For each of them, the maxisets are described in terms of functional spaces. We take special care over the issue of calculability and measure the induced loss of performance in terms of maxisets."
in_NB
statistics
model_selection
approximation
[0803.2963] Consistency of cross validation for comparing regression procedures
11 weeks ago by cshalizi
"Theoretical developments on cross validation (CV) have mainly focused on selecting one among a list of finite-dimensional models (e.g., subset or order selection in linear regression) or selecting a smoothing parameter (e.g., bandwidth for kernel smoothing). However, little is known about consistency of cross validation when applied to compare between parametric and nonparametric methods or within nonparametric methods. We show that under some conditions, with an appropriate choice of data splitting ratio, cross validation is consistent in the sense of selecting the better procedure with probability approaching 1. Our results reveal interesting behavior of cross validation. When comparing two models (procedures) converging at the same nonparametric rate, in contrast to the parametric case, it turns out that the proportion of data used for evaluation in CV does not need to be dominating in size. Furthermore, it can even be of a smaller order than the proportion for estimation while not affecting the consistency property."
to:NB
statistics
to_read
cross-validation
model_selection
nonparametrics
to_teach:undergrad-ADA
re:stacs
[0806.4140] Optimal oracle inequalities for model selection
12 weeks ago by cshalizi
"Model selection is often performed by empirical risk minimization. The quality of selection in a given situation can be assessed by risk bounds, which require assumptions both on the margin and the tails of the losses used. Starting with examples from the 3 basic estimation problems, regression, classification and density estimation, we formulate risk bounds for empirical risk minimization under successively weakening conditions and prove them at a very general level, for general margin and power tail behavior of the excess losses."
in_NB
statistics
learning_theory
cross-validation
model_selection
van_de_geer.sara
[0810.5288] Aggregation of penalized empirical risk minimizers in regression
12 weeks ago by cshalizi
"We give a general result concerning the rates of convergence of penalized empirical risk minimizers (PERM) in the regression model. Then, we consider the problem of agnostic learning of the regression, and give in this context an oracle inequality and a lower bound for PERM over a finite class. These results hold for a general multivariate random design, the only assumption being the compactness of the support of its law (allowing discrete distributions for instance). Then, using these results, we construct adaptive estimators. We consider as examples adaptive estimation over anisotropic Besov spaces or reproducing kernel Hilbert spaces. Finally, we provide empirical evidence that aggregation leads to more stable estimators than more standard cross-validation or generalized cross-validation methods for the selection of the smoothing parameter, when the number of observations is small."
to:NB
statistics
ensemble_methods
model_selection
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
february 2012 by cshalizi
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature: instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB
information_theory
statistics
variable_selection
model_selection
to_teach:data-mining
to:blog
machine_learning
classifiers
have_read
graphical_models
Model Selection in Kernel Based Regression using the Influence Function
february 2012 by cshalizi
"Recent results about the robustness of kernel methods involve the analysis of influence functions. By definition the influence function is closely related to leave-one-out criteria. In statistical learning, the latter is often used to assess the generalization of a method. In statistics, the influence function is used in a similar way to analyze the statistical efficiency of a method. Links between both worlds are explored. The influence function is related to the first term of a Taylor expansion. Higher order influence functions are calculated. A recursive relation between these terms is found characterizing the full Taylor expansion. It is shown how to evaluate influence functions at a specific sample distribution to obtain an approximation of the leave-one-out error. A specific implementation is proposed using an L1 loss in the selection of the hyperparameters and a Huber loss in the estimation procedure. The parameter in the Huber loss controlling the degree of robustness is optimized as well. The resulting procedure gives good results, even when outliers are present in the data."
to:NB
statistics
regression
kernel_estimators
model_selection
robustness
nonparametrics
cross-validation
The Asymmetric Business Cycle
february 2012 by cshalizi
"The business cycle is a fundamental yet elusive concept in macroeconomics. In this paper, we consider the problem of measuring the business cycle. First, we argue for the output-gap view that the business cycle corresponds to transitory deviations in economic activity away from a permanent, or trend, level. Then we investigate the extent to which a general model-based approach to estimating trend and cycle for the U.S. economy leads to measures of the business cycle that reflect models versus the data. We find empirical support for a nonlinear time series model that produces a business cycle measure with an asymmetric shape across NBER expansion and recession phases. Specifically, this business cycle measure suggests that recessions are periods of relatively large and negative transitory fluctuations in output. However, several close competitors to the nonlinear model produce business cycle measures of widely differing shapes and magnitudes. Given this model-based uncertainty, we construct a model-averaged measure of the business cycle. This measure also displays an asymmetric shape and is closely related to other measures of economic slack such as the unemployment rate and capacity utilization."
--- Worthy, but at the same time makes me want to lock them in a room with a copy of Li and Racine's _Nonparametric Econometrics_, or even _The Elements of Statistical Learning_, and not let them out until they understand it.
in_NB
time_series
statistics
economics
macroeconomics
inference_to_latent_objects
re:your_favorite_dsge_sucks
morley.james
have_read
ensemble_methods
model_selection
Clements, Schoenberg, Schorlemmer: Residual analysis methods for space–time point processes with applications to earthquake forecast models in California
december 2011 by cshalizi
"Modern, powerful techniques for the residual analysis of spatial-temporal point process models are reviewed and compared. These methods are applied to California earthquake forecast models used in the Collaboratory for the Study of Earthquake Predictability (CSEP). Assessments of these earthquake forecasting models have previously been performed using simple, low-power means such as the L-test and N-test. We instead propose residual methods based on rescaling, thinning, superposition, weighted K-functions and deviance residuals. Rescaled residuals can be useful for assessing the overall fit of a model, but as with thinning and superposition, rescaling is generally impractical when the conditional intensity λ is volatile. While residual thinning and superposition may be useful for identifying spatial locations where a model fits poorly, these methods have limited power when the modeled conditional intensity assumes extremely low or high values somewhere in the observation region, and this is commonly the case for earthquake forecasting models. A recently proposed hybrid method of thinning and superposition, called super-thinning, is a more powerful alternative. The weighted K-function is powerful for evaluating the degree of clustering or inhibition in a model. Competing models are also compared using pixel-based approaches, such as Pearson residuals and deviance residuals. 
The different residual analysis techniques are demonstrated using the CSEP models and are used to highlight certain deficiencies in the models, such as the overprediction of seismicity in inter-fault zones for the model proposed by Helmstetter, Kagan and Jackson [Seismological Research Letters 78 (2007) 78–86], the underprediction of the model proposed by Kagan, Jackson and Rong [Seismological Research Letters 78 (2007) 94–98] in forecasting seismicity around the Imperial, Laguna Salada, and Panamint clusters, and the underprediction of the model proposed by Shen, Jackson and Kagan [Seismological Research Letters 78 (2007) 116–120] in forecasting seismicity around the Laguna Salada, Baja, and Panamint clusters."
to:NB
point_processes
spatial_statistics
time_series
statistics
model_selection
model-checking
prediction
earthquakes
geology
Shen, Welch, Hughes-Oliver: Efficient, adaptive cross-validation for tuning and comparing models, with application to drug discovery
december 2011 by cshalizi
"Cross-validation (CV) is widely used for tuning a model with respect to user-selected parameters and for selecting a “best” model. For example, the method of k-nearest neighbors requires the user to choose k, the number of neighbors, and a neural network has several tuning parameters controlling the network complexity. Once such parameters are optimized for a particular data set, the next step is often to compare the various optimized models and choose the method with the best predictive performance. Both tuning and model selection boil down to comparing models, either across different values of the tuning parameters or across different classes of statistical models and/or sets of explanatory variables. For multiple large sets of data, like the PubChem drug discovery cheminformatics data which motivated this work, reliable CV comparisons are computationally demanding, or even infeasible. In this paper we develop an efficient sequential methodology for model comparison based on CV. It also takes into account the randomness in CV. The number of models is reduced via an adaptive, multiplicity-adjusted sequential algorithm, where poor performers are quickly eliminated. By exploiting matching of individual observations, it is sometimes even possible to establish the statistically significant inferiority of some models with just one execution of CV."
in_NB
model_selection
statistics
cross-validation
machine_learning
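A minimal sketch of the plain k-fold cross-validation that the abstract above takes as its starting point, here choosing k for k-nearest-neighbour regression. It is not the paper's adaptive, multiplicity-adjusted sequential algorithm. The function names, the simulated data, and the candidate grid are all assumptions made for illustration.

```python
import math
import random


def knn_predict(train, x, k):
    """Average the responses of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_error(data, k, n_folds=5):
    """Mean squared prediction error of k-NN under n_folds-fold CV."""
    total = 0.0
    for f in range(n_folds):
        # Fold f holds every n_folds-th point; the rest is used for fitting.
        test = data[f::n_folds]
        train = [p for i, p in enumerate(data) if i % n_folds != f]
        total += sum((y - knn_predict(train, x, k)) ** 2 for x, y in test)
    return total / len(data)


def select_k(data, candidates, n_folds=5):
    """Return the candidate k with the smallest CV error, plus all errors."""
    errors = {k: cv_error(data, k, n_folds) for k in candidates}
    return min(errors, key=errors.get), errors


# Hypothetical data: a noisy sine curve observed at random design points.
rng = random.Random(0)
xs = [rng.uniform(0, 1) for _ in range(100)]
data = [(x, math.sin(6 * x) + rng.gauss(0, 0.3)) for x in xs]

best_k, errors = select_k(data, [1, 3, 5, 10, 25, 50])
print(best_k, errors)
```

Every candidate is fitted on every fold here, which is exactly the cost the paper tries to avoid by eliminating poor performers early.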
Selecting Amongst Large Classes of Models
december 2011 by cshalizi
Chatty slides from Brian Ripley. Approvable.
in_NB
via:gelman
model_selection
statistics
ripley.brian
cross-validation
information_criteria
[1111.0559] Model Selection in Undirected Graphical Models with the Elastic Net
november 2011 by cshalizi
"Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on $l_1$ regularized neighborhood estimation to include the elastic net $l_1+l_2$ penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pair-wise neighborhood union estimates."
in_NB
graphical_models
model_selection
lasso
sparsity
Liu, Yang: Parametric or nonparametric? A parametricness index for model selection
october 2011 by cshalizi
"In model selection literature, two classes of criteria perform well asymptotically in different situations: Bayesian information criterion (BIC) (as a representative) is consistent in selection when the true model is finite dimensional (parametric scenario); Akaike’s information criterion (AIC) performs well in terms of asymptotic efficiency when the true model is infinite dimensional (nonparametric scenario). But there is little work that addresses whether it is possible, and how, to detect which situation a specific model selection problem is in. In this work, we differentiate the two scenarios theoretically under some conditions. We develop a measure, parametricness index (PI), to assess whether a model selected by a potentially consistent procedure can be practically treated as the true model, which also hints at whether AIC or BIC is better suited to the data for the goal of estimating the regression function. A consequence is that by switching between AIC and BIC based on the PI, the resulting regression estimator is simultaneously asymptotically efficient for both parametric and nonparametric scenarios. In addition, we systematically investigate the behaviors of PI in simulation and real data and show its usefulness."
to:NB
model_selection
statistics
nonparametrics
information_criteria
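For reference, a small sketch of the two criteria the abstract above contrasts, applied to a constant-mean model and a straight-line model with Gaussian noise. It does not compute the parametricness index. The simulated data and the helper names are assumptions for illustration; the parameter counts include the noise variance.

```python
import math
import random


def gaussian_loglik(rss, n):
    """Maximized Gaussian log-likelihood given the residual sum of squares."""
    sigma2 = rss / n  # maximum-likelihood estimate of the noise variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)


def aic(loglik, k):
    """Akaike's criterion: penalty of 2 per parameter."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian criterion: penalty of log(n) per parameter."""
    return k * math.log(n) - 2 * loglik


def rss_constant(ys):
    """Residual sum of squares for the constant-mean model."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def rss_line(xs, ys):
    """Residual sum of squares for simple least-squares linear regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical data generated from a straight line plus Gaussian noise.
rng = random.Random(1)
n = 200
xs = [rng.uniform(0, 1) for _ in range(n)]
ys = [1 + 2 * x + rng.gauss(0, 0.5) for x in xs]

scores = {}
for name, rss, k in [("constant", rss_constant(ys), 2),
                     ("line", rss_line(xs, ys), 3)]:
    ll = gaussian_loglik(rss, n)
    scores[name] = {"aic": aic(ll, k), "bic": bic(ll, k, n)}
print(scores)
```

The two criteria differ only in the penalty per parameter, 2 versus log(n), so BIC is the stricter of the two once n exceeds about 7.4.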
[1110.4944] Is your phylogeny informative? Measuring the power of comparative methods
october 2011 by cshalizi
"Phylogenetic comparative methods may fail to produce meaningful results when either the underlying model is inappropriate or the data contain insufficient information to inform the inference. The ability to measure the statistical power of these methods has become crucial to ensure that data quantity keeps pace with growing model complexity. Through simulations, we show that commonly applied model choice methods based on information criteria can have remarkably high error rates; this can be a problem because methods to estimate the uncertainty or power are not widely known or applied. Furthermore, the power of comparative methods can depend significantly on the structure of the data. We describe a Monte Carlo based method which addresses both of these challenges, and show how this approach both quantifies and substantially reduces errors relative to information criteria. The method also produces meaningful confidence intervals for model parameters. We illustrate how the power to distinguish different models, such as varying levels of selection, varies both with number of taxa and structure of the phylogeny. We provide an open-source implementation in the pmc ("Phylogenetic Monte Carlo") package for the R programming language. We hope such power analysis becomes a routine part of model comparison in comparative methods."
to:NB
statistics
evolutionary_biology
comparative_methods
model_selection
information_criteria
[1110.4700] Relevant statistics for Bayesian model choice
october 2011 by cshalizi
"The choice of the summary statistics in Bayesian inference and in particular in ABC algorithms is paramount to produce a valid outcome. We derive necessary and sufficient conditions on those statistics for the corresponding Bayes factor to be convergent, namely to asymptotically select the true model. Those conditions, which amount to requiring that the means of the summary statistics differ asymptotically under the two models, are then usable in ABC settings to determine which summary statistics are appropriate, most generally via a standard Monte Carlo validation."
to:NB
statistics
model_selection
approximate_bayesian_computation
indirect_inference
[1110.3860] Contending Parties: A Logistic Choice Analysis of Inter- and Intra-group Blog Citation Dynamics in the 2004 US Presidential Election
october 2011 by cshalizi
"The 2004 US Presidential Election cycle marked the debut of Internet-based media such as blogs and social networking websites as institutionally recognized features of the American political landscape. Using a longitudinal sample of all DNC/RNC-designated blog-citation networks we are able to test the influence of various strategic, institutional, and balance-theoretic mechanisms and exogenous factors such as seasonality and political events on the propensity of blogs to cite one another over time. Capitalizing on the temporal resolution of our data, we utilize an autoregressive network regression framework to carry out inference for a logistic choice process. Using a combination of deviance-based model selection criteria and simulation-based model adequacy tests, we identify the combination of processes that best characterizes the choice behavior of the contending blogs."
to:NB
network_data_analysis
blogs
us_politics
model_selection
simulation
Reality Checks and Comparisons of Nested Predictive Models - Journal of Business and Economic Statistics - 0(0):1
september 2011 by cshalizi
"This article develops a simple bootstrap method for simulating asymptotic critical values for tests of equal forecast accuracy and encompassing among many nested models. Our method combines elements of fixed regressor and wild bootstraps. We first derive the asymptotic distributions of tests of equal forecast accuracy and encompassing applied to forecasts from multiple models that nest the benchmark model—that is, reality check tests. We then prove the validity of the bootstrap for these tests. Monte Carlo experiments indicate that our proposed bootstrap has better finite-sample size and power than other methods designed for comparison of nonnested models."
statistics
model_checking
model_selection
time_series
bootstrap
to_read
to_teach:undergrad-ADA
encompassing
[0908.3666] On the minimal penalty for Markov order estimation
august 2011 by cshalizi
I think this is the only time I have seen the LIL actually _used_ for anything.
van_handel.ramon
empirical_processes
model_selection
markov_models
statistics
stochastic_processes
law_of_the_iterated_logarithm
in_NB
[1107.0189] The Lasso, correlated design, and improved oracle inequalities
july 2011 by cshalizi
"We study high-dimensional linear models and the $\ell_1$-penalized least squares estimator, also known as the Lasso estimator. In the literature, oracle inequalities have been derived under restricted eigenvalue or compatibility conditions. In this paper, we complement this with entropy conditions which allow one to improve the dual norm bound, and demonstrate how this leads to new oracle inequalities. The new oracle inequalities show that a smaller choice for the tuning parameter and a trade-off between $\ell_1$-norms and small compatibility constants are possible. This implies, in particular for correlated design, improved bounds for the prediction error of the Lasso estimator as compared to the methods based on restricted eigenvalue or compatibility conditions only."
lasso
regression
model_selection
van_de_geer.sara
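To make the estimator and its tuning parameter concrete, here is a rough sketch of the $\ell_1$-penalized least squares problem solved by cyclic coordinate descent. The paper is about oracle inequalities, not algorithms, so the solver, the scaling of the objective by 1/(2n), the name `lam`, and the simulated design are all assumptions chosen for illustration.

```python
import random


def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # current residual y - X beta (beta starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the partial residual, divided by n.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical design: 5 Gaussian covariates, only the first one matters.
rng = random.Random(2)
n, p = 100, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] + rng.gauss(0, 1) for row in X]

beta = lasso_cd(X, y, lam=1.0)
beta_heavy = lasso_cd(X, y, lam=100.0)  # penalty large enough to kill everything
print(beta, beta_heavy)
```

Larger values of `lam` shrink more coefficients to exactly zero, which is why the size of the tuning parameter that the theory permits matters for prediction error.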
[1012.3795] Estimating Networks With Jumps
december 2010 by cshalizi
"We study the problem of estimating a temporally varying coefficient and varying structure (VCVS) graphical model underlying nonstationary time series data, such as social states of interacting individuals or microarray expression profiles of gene networks, as opposed to i.i.d. data from an invariant model widely considered in current literature of structural estimation. In particular, we consider the scenario in which the model evolves in a piece-wise constant fashion. We propose a procedure that minimizes the so-called TESLA loss (i.e., temporally smoothed L1 regularized regression), which allows jointly estimating the partition boundaries of the VCVS model and the coefficient of the sparse precision matrix on each block of the partition. "
graphical_models
network_data_analysis
time_series
model_selection
statistics
xing.eric
On the behaviour of marginal and conditional AIC in linear mixed models — Biometrika
december 2010 by cshalizi
"In linear mixed models, model selection frequently includes the selection of random effects. Two versions of the Akaike information criterion, AIC, have been used, based either on the marginal or on the conditional distribution. We show that the marginal AIC is not an asymptotically unbiased estimator of the Akaike information, and favours smaller models without random effects. For the conditional AIC, we show that ignoring estimation uncertainty in the random effects covariance matrix, as is common practice, induces a bias that can lead to the selection of any random effect not predicted to be exactly zero. We derive an analytic representation of a corrected version of the conditional AIC, which avoids the high computational cost and imprecision of available numerical approximations. ... All theoretical results are illustrated in simulation studies, and their impact in practice is investigated in an analysis of childhood malnutrition in Zambia."
regression
model_selection
information_criteria
statistics
[1011.3396] PAC-Bayesian aggregation and multi-armed bandits
november 2010 by cshalizi
"This habilitation thesis presents several contributions to (1) the PAC-Bayesian analysis of statistical learning, (2) the three aggregation problems: given d functions, how to predict as well as (i) the best of these d functions (model selection type aggregation), (ii) the best convex combination of these d functions, (iii) the best linear combination of these d functions, (3) the multi-armed bandit problems."
statistics
learning_theory
pac-bayesian
model_selection
ensemble_methods
to:NB
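One standard way to attack the aggregation problems listed in the abstract above is exponential weighting of the candidate functions by their empirical losses. The sketch below illustrates that general idea only, and is not claimed to be any particular procedure from the thesis. The temperature `eta`, the loss values, and the predictions are hypothetical.

```python
import math


def exp_weights(losses, eta):
    """Weights proportional to exp(-eta * loss), normalized to sum to one."""
    m = min(losses)  # subtract the minimum for numerical stability
    raw = [math.exp(-eta * (loss - m)) for loss in losses]
    total = sum(raw)
    return [r / total for r in raw]


def aggregate(predictions, weights):
    """Convex combination of the candidates' predictions at one point."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical empirical losses of d = 3 candidate functions.
losses = [1.0, 2.0, 4.0]
weights = exp_weights(losses, eta=1.0)
uniform = exp_weights(losses, eta=0.0)  # eta = 0 ignores the losses entirely

# Hypothetical predictions of the three candidates at a new input.
combined = aggregate([0.9, 1.4, 3.0], weights)
print(weights, uniform, combined)
```

As `eta` grows the weights concentrate on the candidate with the smallest loss, which recovers model-selection-type behaviour; small `eta` stays close to a plain average.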
[1010.6202] Sequential Data-Adaptive Bandwidth Selection by Cross-Validation for Nonparametric Prediction
november 2010 by cshalizi
"We consider the problem of bandwidth selection by cross-validation from a sequential point of view in a nonparametric regression model. Having in mind that in applications one often aims at estimation, prediction and change detection simultaneously, we investigate that approach for sequential kernel smoothers in order to base these tasks on a single statistic. We provide uniform weak laws of large numbers and weak consistency results for the cross-validated bandwidth. Extensions to weakly dependent error terms are discussed as well. The errors may be $\alpha$-mixing or L2-near epoch dependent, which guarantees that the uniform convergence of the cross validation sum and the consistency of the cross-validated bandwidth hold true for a large class of time series. The method is illustrated by analyzing photovoltaic data."
cross-validation
prediction
time_series
model_selection
to_read
[1007.3230] Selecting an exponential random graph model for complex brain networks
july 2010 by cshalizi
Shorter authors: What do you know, all that stuff about how to fit ERGMs to social networks totally works for brain networks too. (I mock, but I'd be flabbergasted if it didn't, if only because there is so little _social_ content in the ERGM formalism...) Also: yay model checking!
neural_data_analysis
network_data_analysis
exponential_family_random_graphs
model_selection
statistics
model-checking
to_read
re:functional_communities
re:stacs
[1005.5483] Model Selection Principles in Misspecified Models
june 2010 by cshalizi
So-so. Suspect that most of these results are actually in Claeskens and Hjort's book, but am insufficiently motivated to check.
model_selection
misspecification
statistics
re:phil-of-bayes_paper
have_read
Ehm, Kornmeier, Heinrich: Multiple testing along a tree
may 2010 by cshalizi
"Suitable sequentially rejective multiple test procedures allow one to “zoom in” on clusters of relevant variables in high-dimensional regression (Meinshausen [7]), or on regions of interest in some search space (Heinrich et al. [3]; Meinshausen et al. [8]). As a common framework for these schemes we propose to consider multiple testing along a tree of hypotheses together with a “keep rejecting until first acceptance” rule. Particular topics addressed in this note are control of the familywise error, and some variants and basic properties of the procedure."
multiple_testing
hypothesis_testing
model_selection
re:AoS_project
to_read
10-705 Intermediate Statistics, Fall 2009
april 2010 by cshalizi
Larry's version of the typical masters-level course based on Casella and Berger. Note: half of what he covers is not in Casella and Berger. (For example, he starts with VC theory!)
learning_theory
statistics
estimation
hypothesis_testing
prediction
minimax
bootstrap
model_selection
regression
classifiers
confidence_sets
wasserman.larry
kith_and_kin
[1004.2287] An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates
april 2010 by cshalizi
"Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. Following the ideas of the LASSO procedure designed for the linear regression framework, recent developments dealing with graphical model selection have been based on $\ell_1$-penalization. In the binary case, however, exact inference is generally very slow or even intractable because of [the] log-partition function. Various approximate methods have recently been proposed in the literature ... Through an extensive simulation study, we show that a simple modification of a method relying on a Gaussian approximation achieves good performance and is very fast. We present a real application in which we search for associations among causes of death recorded on French death certificates."
graphical_models
lasso
model_selection
[1004.2304] Spatio-Temporal Graphical Model Selection
april 2010 by cshalizi
"We consider the problem of estimating the topology of spatial interactions in a discrete state, discrete time spatio-temporal graphical model where the interactions affect the temporal evolution of each agent in a network. Among other models, the susceptible, infected, recovered ($SIR$) model for interaction events falls into this framework. We pose the problem as a structure learning problem and solve it using an $\ell_1$-penalized likelihood convex program. We evaluate the solution on a simulated spread of infection over a complex network. Our topology estimates outperform those of a standard spatial Markov random field graphical model selection using $\ell_1$-regularized logistic regression."
graphical_models
random_fields
lasso
model_selection
april 2010 by cshalizi
Verzelen: Adaptive estimation of stationary Gaussian fields
march 2010 by cshalizi
"We study the nonparametric covariance estimation of a stationary Gaussian field X observed on a regular lattice. In the time series setting, some procedures ... achieve optimal model selection among autoregressive models. ... no such equivalent results of adaptivity in a spatial setting. By considering collections of Gaussian Markov random fields (GMRF) as approximation sets for the distribution of X, we introduce a novel model selection procedure for spatial fields. For all neighborhoods m in a given collection, this procedure first amounts to computing a covariance estimator of X within the GMRFs of neighborhood m. Then it selects a neighborhood $\hat{m}$ by applying a penalization strategy. The so-defined method satisfies a nonasymptotic oracle-type inequality. If X is a GMRF, the procedure is also minimax adaptive to the sparsity of its neighborhood. More generally, the procedure is adaptive to the rate of approximation of the true distribution by GMRFs with growing neighborhoods."
spatial_statistics
model_selection
statistics
stochastic_processes
random_fields
statistical_inference_for_stochastic_processes
march 2010 by cshalizi
[1003.0516] Model Selection with the Loss Rank Principle
march 2010 by cshalizi
This is a simplified form of Mayo's severity.
model_selection
regression
march 2010 by cshalizi
[0903.3620] Reconciling Model Selection and Prediction
december 2009 by cshalizi
"It is known that there is a dichotomy in the performance of model selectors. Those that are consistent (having the "oracle property") do not achieve the asymptotic minimax rate for prediction error. We look at this phenomenon closely, and argue that the set of parameters on which this dichotomy occurs is extreme, even pathological, and should not be considered when evaluating model selectors. We characterize this set, and show that, when such parameters are dismissed from consideration, consistency and asymptotic minimaxity can be attained simultaneously."
model_selection
statistics
minimax
regression
have_read
prediction
december 2009 by cshalizi
Oracle inequalities for multi-fold cross validation
october 2009 by cshalizi
Gah, does no one have a copy? --- thanks to A. v.d.V. for a reprint.
model_selection
oracle_inequalities
cross-validation
have_read
october 2009 by cshalizi
[0712.0881] On the "degrees of freedom" of the lasso
august 2009 by cshalizi
Reading the abstract and introduction makes me feel all of a sudden like I don't, in fact, understand the concept of "degrees of freedom". (I mean, I understand it in mechanics!)
lasso
regression
sparsity
degrees_of_freedom
statistics
estimation
model_selection
to_read
august 2009 by cshalizi
[0908.2904] A bias correction for the minimum error rate in cross-validation
august 2009 by cshalizi
How is this different from Burman's old bias correction for CV? And, how much noise does this correction add?
cross-validation
statistics
machine_learning
model_selection
have_read
to_teach:data-mining
tibshirani.robert
tibshirani.ryan
to_teach:undergrad-ADA
august 2009 by cshalizi
Challenges for Econometric Model Selection
june 2009 by cshalizi
"Standard econometric model selection methods are based on four fundamental errors in approach: parametric vision, the assumption of a true DGP, evaluation based on fit, and ignoring the impact of model uncertainty on inference. Instead, econometric model selection methods should be based on a semiparametric vision, models should be viewed as approximations, models should be evaluated based on their purpose, and model uncertainty should be incorporated into inference methods. These problems have been examined individually, but not jointly, and my view is that future research into econometric model selection should attempt to address all four issues. "
model_selection
econometrics
statistics
nonparametrics
have_read
hansen.bruce
june 2009 by cshalizi
"Statistical Theory and Methods for Complex, High-Dimensional Data"
june 2009 by cshalizi
Loads of talks.
statistics
machine_learning
model_selection
graphical_models
regression
latent_variables
principal_components
factor_analysis
dimension_reduction
lasso
bioinformatics
track_down_references
via:shivak
june 2009 by cshalizi
[0711.1036v2] Confidence Sets Based on Sparse Estimators Are Necessarily Large
may 2009 by cshalizi
"Confidence sets based on sparse estimators are shown to be large compared to more standard confidence sets, demonstrating that sparsity of an estimator comes at a substantial price in terms of the quality of the estimator. The results are set in a general parametric or semiparametric framework."
sparsity
confidence_sets
model_selection
statistics
to:NB
via:shivak
may 2009 by cshalizi
[0901.3202] Model-Consistent Sparse Estimation through the Bootstrap
january 2009 by cshalizi
"if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection"
lasso
linear_regression
model_selection
variable_selection
bootstrap
january 2009 by cshalizi
[0901.1925] Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems
january 2009 by cshalizi
This "approximate Bayesian computation" scheme sounds like a version of indirect inference, only slower and vulnerable to a bad choice of prior...
time_series
model_selection
statistics
particle_filters
to_read
statistical_inference_for_stochastic_processes
january 2009 by cshalizi
Asymptotically minimax regret procedures in regression model selection and the magnitude of the dimension penalty
october 2008 by cshalizi
Claims AIC is asymptotically minimax (approximately). Not sure this would actually apply to anything I'd ever be interested in, given the assumptions they have to load on.
model_selection
information_criteria
regression
to_read
via:kevin_kelly
october 2008 by cshalizi