cshalizi + sparsity   51

Colombo , Maathuis , Kalisch , Richardson : Learning high-dimensional directed acyclic graphs with latent and selection variables
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."

--- To complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB  have_read  statistics  graphical_models  causal_inference  sparsity  to_teach:undergrad-ADA 
7 weeks ago by cshalizi
Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
"We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples."
--- I wonder whether this hasn't some application to the PC algorithm?
to:NB  graphical_models  lasso  sparsity  statistics  heard_the_talk 
7 weeks ago by cshalizi
Structured Sparsity and Generalization
"We present a data dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite dimensional setting such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels."
to:NB  learning_theory  regression  sparsity  statistics  lasso 
7 weeks ago by cshalizi
Fan , Liao , Mincheva : High-dimensional covariance matrix estimation in approximate factor models
"The variance–covariance matrix plays a central role in the inferential theories of high-dimensional factor models in finance and economics. Popular regularization methods of directly exploiting sparsity are not directly applicable to many financial problems. Classical methods of estimating the covariance matrices are based on the strict factor models, assuming independent idiosyncratic components. This assumption, however, is restrictive in practical applications. By assuming sparse error covariance matrix, we allow the presence of the cross-sectional correlation even after taking out common factors, and it enables us to combine the merits of both methods. We estimate the sparse covariance using the adaptive thresholding technique as in Cai and Liu [J. Amer. Statist. Assoc. 106 (2011) 672–684], taking into account the fact that direct observations of the idiosyncratic components are unavailable. The impact of high dimensionality on the covariance matrix estimation based on the factor structure is then studied."
to:NB  statistics  estimation  factor_analysis  sparsity 
12 weeks ago by cshalizi
Optimization with Sparsity-Inducing Penalties
"Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate nonsmooth norms. The goal of this monograph is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted l2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view."
to:NB  statistics  machine_learning  optimization  sparsity  lasso 
12 weeks ago by cshalizi
[0805.1179] Autoregressive Process Modeling via the Lasso Procedure
"The Lasso is a popular model selection and estimation procedure for linear models that enjoys nice theoretical properties. In this paper, we study the Lasso estimator for fitting autoregressive time series models. We adopt a double asymptotic framework where the maximal lag may increase with the sample size. We derive theoretical results establishing various types of consistency. In particular, we derive conditions under which the Lasso estimator for the autoregressive coefficients is model selection consistent, estimation consistent and prediction consistent. Simulation study results are reported."
to:NB  time_series  statistics  lasso  sparsity  variable_selection  kith_and_kin  heard_the_talk  rinaldo.alessandro  nardi.yuval 
12 weeks ago by cshalizi
Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming
"Sparse additive models are families of d-variate functions with the additive decomposition f* = ∑j ∈ S f*j, where S is an unknown subset of cardinality s << d. In this paper, we consider the case where each univariate component function f*j lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function f* based on kernels combined with l1-type convex regularization. Working within a high-dimensional framework that allows both the dimension d and sparsity s to increase with n, we derive convergence rates in the L2(P) and L2(Pn) norms over the class Fd,s,H of sparse additive models with each univariate function f*j in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the L2(P) error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the d-variate function class is assumed to be globally bounded, then much faster estimation rates are possible for any sparsity s = Ω(√n), showing that global boundedness is a significant restriction in the high-dimensional setting."
to:NB  statistics  regression  estimation  minimax  additive_models  sparsity  wainright.martin  yu.bin 
12 weeks ago by cshalizi
[0810.3177] Inferring sparse Gaussian graphical models with latent structure
"Our concern is selecting the concentration matrix's nonzero coefficients for a sparse Gaussian graphical model in a high-dimensional setting. This corresponds to estimating the graph of conditional dependencies between the variables. We describe a novel framework taking into account a latent structure on the concentration matrix. This latent structure is used to drive a penalty matrix and thus to recover a graphical model with a constrained topology. Our method uses an $ell_1$ penalized likelihood criterion. Inference of the graph of conditional dependencies between the variates and of the hidden variables is performed simultaneously in an iterative textsc{em}-like algorithm. The performances of our method is illustrated on synthetic as well as real data, the latter concerning breast cancer."
to:NB  graphical_models  lasso  sparsity  statistics  inference_to_latent_objects 
february 2012 by cshalizi
[1201.0224] Estimation of Treatment Effects with High-Dimensional Controls
"We propose methods for inference on the average effect of a treatment on a scalar outcome in the presence of very many controls. Our setting is a partially linear regression model containing the treatment/policy variable and a large number $p$ of controls or series terms, with $p$ that is possibly much larger than the sample size $n$, but where only $s < n$ unknown controls or series terms are needed to approximate the regression function accurately. The latter sparsity condition makes it possible to estimate the entire regression function as well as the average treatment effect by selecting an approximately the right set of controls using Lasso and related methods. We develop estimation and inference methods for the average treatment effect in this setting, proposing a novel "post double selection" method that provides attractive inferential and estimation properties. In our analysis, in order to cover realistic applications, we expressly allow for imperfect selection of the controls and account for the impact of selection errors on estimation and inference. In order to cover typical applications in economics, we employ the selection methods designed to deal with non-Gaussian and heteroscedastic disturbances. We illustrate the use of new methods with numerical simulations and an application to the effect of abortion on crime rates."
to:NB  to_teach:undergrad-ADA  regression  causal_inference  lasso  sparsity  econometrics  instrumental_variables  hansen.christian 
january 2012 by cshalizi
[1201.0220] Inference for High-Dimensional Sparse Econometric Models
"This article is about estimation and inference methods for high dimensional sparse (HDS) regression models in econometrics. High dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well-approximated by a parsimonious, yet unknown set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors. We discuss methods for identifying this set of regressors and estimating their coefficients based on $ell_1$-penalization and describe key theoretical results. In order to capture realistic practical situations, we expressly allow for imperfect selection of regressors and study the impact of this imperfect selection on estimation and inference results. We focus the main part of the article on the use of HDS models and methods in the instrumental variables model and the partially linear model. We present a set of novel inference results for these models and illustrate their use with applications to returns to schooling and growth regression."
to:NB  regression  sparsity  instrumental_variables  econometrics  to_teach:undergrad-ADA  lasso  hansen.christian 
january 2012 by cshalizi
[1111.4494] Structured Sparse Aggregation
"We introduce a method for aggregating many least squares estimator so that the resulting estimate has two properties: sparsity and structure. That is, only a few candidate covariates are used in the resulting model, and the selected covariates follow some structure over the candidate covariates that is assumed to be known a priori. While sparsity is well studied in many settings, including aggregation, structured sparse methods are still emerging. We demonstrate a general framework for structured sparse aggregation that allows for a wide variety of structures, including overlapping grouped structures and general structural penalties defined as set functions on the set of covariates. We show that such estimators satisfy structured sparse oracle inequalities --- their finite sample risk adapts to the structured sparsity of the target. These inequalities reveal that under suitable settings, the structured sparse estimator performs at least as well as, and potentially much better than, a sparse aggregation estimator. We empirically establish the effectiveness of the method using simulation and an application to HIV drug resistance."
to:NB  to_read  regression  sparsity  statistics  percival.daniel 
november 2011 by cshalizi
[1111.4416] Sparse Group Selection Through Co-Adaptive Penalties
"Recent work has focused on the problem of conducting linear regression when the number of covariates is very large, potentially greater than the sample size. To facilitate this, one useful tool is to assume that the model can be well approximated by a fit involving only a small number of covariates -- a so called sparsity assumption, which leads to the Lasso and other methods. In many situations, however, the covariates can be considered to be structured, in that the selection of some variables favours the selection of others -- with variables organised into groups entering or leaving the model simultaneously as a special case. This structure creates a different form of sparsity. In this paper, we suggest the Co-adaptive Lasso to fit models accommodating this form of `group sparsity'. The Co-adaptive Lasso is fast and simple to calculate, and we show that it holds theoretical advantages over the Lasso, performs well under a broad set of conclusions, and is very competitive in empirical simulations in comparison with previously suggested algorithms like the Group Lasso and the Adaptive Lasso."
to:NB  regression  sparsity  lasso  statistics 
november 2011 by cshalizi
[1111.0559] Model Selection in Undirected Graphical Models with the Elastic Net
"Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on $l_1$ regularized neighborhood estimation to include the elastic net $l_1+l_2$ penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pair-wise neighborhood union estimates."
in_NB  graphical_models  model_selection  lasso  sparsity 
november 2011 by cshalizi
A Perturbation Method for Inference on Regularized Regression Estimates
"Analysis of high-dimensional data often seeks to identify a subset of important features and to assess the effects of these features on outcomes. Traditional statistical inference procedures based on standard regression methods often fail in the presence of high-dimensional features. In recent years, regularization methods have emerged as promising tools for analyzing high-dimensional data. These methods simultaneously select important features and provide stable estimation of their effects. Adaptive LASSO and SCAD, for instance, give consistent and asymptotically normal estimates with oracle properties. However, in finite samples, it remains difficult to obtain interval estimators for the regression parameters. In this article, we propose perturbation resampling-based procedures to approximate the distribution of a general class of penalized parameter estimates. Our proposal, justified by asymptotic theory, provides a simple way to estimate the covariance matrix and confidence regions. Through finite-sample simulations, we verify the ability of this method to give accurate inference and compare it with other widely used standard deviation and confidence interval estimates. We also illustrate our proposals with a dataset used to study the association of HIV drug resistance and a large number of genetic mutations."
in_NB  regression  sparsity  confidence_sets  statistics  bootstrap 
september 2011 by cshalizi
[1109.2397] Structured sparsity through convex optimization
"Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. While naturally cast as a combinatorial optimization problem, variable or feature selection admits a convex relaxation through the regularization by the $\ell_1$-norm. In this paper, we consider situations where we are not only interested in sparsity, but where some structural prior knowledge is available as well. We show that the $\ell_1$-norm can then be extended to structured norms built on either disjoint or overlapping groups of variables, leading to a flexible framework that can deal with various structures. We present applications to supervised learning in the context of non-linear variable selection, and to unsupervised learning, for structured sparse principal component analysis, and hierarchical dictionary learning."
sparsity  regression  statistics  estimation  convexity  machine_learning  to:NB 
september 2011 by cshalizi
[1108.0775] Optimization with Sparsity-Inducing Penalties
"Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted $\ell_2$-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view."
learning_theory  sparsity  optimization  lasso  to:NB  machine_learning  statistics 
august 2011 by cshalizi
[1107.3536] Inverse Ising inference using all the data
"We show that a method based on logistic regression, using all the data, solves the inverse Ising problem far better than mean-field calculations relying only on sample pairwise correlation functions, while still computationally feasible for numbers of nodes in the range of hundreds. The largest improvement in reconstruction occurs for strong interactions. Using two examples, a diluted Sherrington-Kirkpatrick model and a two-dimensional lattice, we also show that interaction topologies can be recovered from few samples with good accuracy (even below the phase transitions in the corresponding forward Ising problems), and that using $l_1$-regularization is beneficial in this process."
graphical_models  logistic_regression  ising_model  sparsity 
july 2011 by cshalizi
[1102.1615] Sparsity considerations for dependent observations
"The aim of this paper is to provide a comprehensive introduction for the study of L1-penalized estimators in the context of dependent observations. We define a general $\ell_{1}$-penalized estimator for solving problems of stochastic optimization. This estimator turns out to be the LASSO in the regression estimation setting. Powerful theoretical guarantees on the statistical performances of the LASSO were provided in recent papers, however, they usually only deal with the iid case. Here, we study our estimator under various dependence assumptions."
statistics  sparsity  regression  to:NB 
february 2011 by cshalizi
[1010.2731] A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
"...models in which the the number of parameters $p$ is comparable to or larger than the sample size $n$. ... line of recent work has studied models with various types of low-dimensional structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions) ... a general approach to estimation is to solve a regularized convex program (known as a regularized $M$-estimator) ... loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. ... unified framework for ... consistency and convergence rates for such regularized $M$-estimators under high-dimensional scaling. ... main theorem ... can be used to re-derive some existing results, and ... obtain ... new results ... identifies two key properties of loss and regularization functions, ... restricted strong convexity and decomposability ... ensure ... fast convergence rates ... optimal in many ... cases."
estimation  sparsity  statistics 
october 2010 by cshalizi
[1009.2302] The Predictive Lasso
"We propose a shrinkage procedure for simultaneous variable selection and estimation in generalized linear models (GLMs) with an explicit predictive motivation. The procedure estimates the coefficients by minimizing the Kullback-Leibler divergence of a set of predictive distributions to the corresponding predictive distributions for the full model, subject to an $l_1$ constraint on the coefficient vector. This results in selection of a parsimonious model with similar predictive performance to the full model. Thanks to its similar form to the original lasso problem for GLMs, our procedure can benefit from available $l_1$-regularization path algorithms. Simulation studies and real-data examples confirm the efficiency of our method in terms of predictive performance on future observations."
regression  lasso  variable_selection  sparsity  information_theory  statistics 
september 2010 by cshalizi
Phys. Rev. Lett. 104, 188701 (2010): Statistical Mechanics of Compressed Sensing
"Compressed sensing (CS) is an important recent advance that shows how to reconstruct sparse high dimensional signals from surprisingly small numbers of random measurements. The nonlinear nature of the reconstruction process poses a challenge to understanding the performance of CS. We employ techniques from the statistical physics of disordered systems to compute the typical behavior of CS as a function of the signal sparsity and measurement density. We find surprising and useful regularities in the nature of errors made by CS, a new phase transition which reveals the possibility of CS for nonnegative signals without optimization, and a new null model for sparse regression."
compressed_sensing  sparsity  regression  learning_theory  statistics  statistical_mechanics 
may 2010 by cshalizi
Efficient Learning and Feature Selection in High-Dimensional Regression
"We present a novel algorithm for efficient learning and feature selection in high-dimensional regression problems. We arrive at this model through a modification of the standard regression model, enabling us to derive a probabilistic version of the well-known statistical regression technique of backfitting. Using the expectation-maximization algorithm, along with variational approximation methods to overcome intractability, we extend our algorithm to include automatic relevance detection of the input features. This variational Bayesian least squares (VBLS) approach retains its simplicity as a linear model, but offers a novel statistically robust black-box approach to generalized linear regression with high-dimensional inputs. It can be easily extended to nonlinear regression and classification problems. In particular, we derive the framework of sparse Bayesian learning, the relevance vector machine, with VBLS at its core..."
regression  sparsity  statistics  machine_learning 
april 2010 by cshalizi
[1003.0205] Detecting Weak but Hierarchically-Structured Patterns in Networks
"The ability to detect weak distributed activation patterns in networks ... identifying the onset of anomalous activity or incipient congestion in the Internet, or faint traces of a biochemical spread by a sensor network ... weak distributed patterns can be invisible in per node statistics as well as a global network-wide aggregate. Most prior work considers situations in which ... each node is statistically independent ... we consider structured patterns arising from statistical dependencies in the activation process. ... a sparsifying transform that succinctly represents structured activation patterns that conform to a hierarchical dependency graph. ... transform facilitates detection of very weak activation patterns that cannot be detected with existing methods.... the hierarchical dependency graph governing the activation process, and hence the network transform, can be learnt from very few (logarithmic in network size) independent snapshots of network activity."
network_data_analysis  anomaly_detection  hierarchical_structure  machine_learning  statistics  sparsity 
march 2010 by cshalizi
[0712.0881] On the "degrees of freedom" of the lasso
Reading the abstract and introduction makes me feel al of a sudden like I don't, in fact, understand the concept of "degrees of freedom". (I mean, I understand it in mechanics!)
lasso  regression  sparsity  degrees_of_freedom  statistics  estimation  model_selection  to_read 
august 2009 by cshalizi
[0906.3590] Forest Garrote
We have got to do something about the nams of techniques in this area. I don't mind the whimsy, it's just that combinations like this don't work, metaphorically.
ensemble_methods  classifiers  statistics  machine_learning  sparsity  variable_selection  lasso 
june 2009 by cshalizi
[0711.1036v2] Confidence Sets Based on Sparse Estimators Are Necessarily Large
"Confidence sets based on sparse estimators are shown to be large compared to more standard confidence sets, demonstrating that sparsity of an estimator comes at a substantial price in terms of the quality of the estimator. The results are set in a general parametric or semiparametric framework."
sparsity  confidence_sets  model_selection  statistics  to:NB  via:shivak 
may 2009 by cshalizi
[0903.0649] The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
Getting many of the advantages of Gaussian graphical models without having to assume Gaussian distributions, though at the price of committing one of the most awful puns I have ever seen.

Journal version: http://jmlr.csail.mit.edu/papers/v10/liu09a.html
graphical_models  nonparametrics  statistics  sparsity  copulas  kith_and_kin  liu.han  lafferty.john  wasserman.larry  have_read 
march 2009 by cshalizi
Rajat Raina's Research: Self-taught Learning etc.
"Self-taught learning" = use unlabeled data to learn some sort of abstract, higher-level representation/set of features; then use those features in straightforward classifier learning on the labeled data. Nice idea, cute examples; completely separable from the specific technique for learning sparse basis vectors.
machine_learning  semi-supervised_learning  via:klk  sparsity 
february 2009 by cshalizi
[0901.2234] Sparse Causal Discovery in Multivariate Time Series
Umm. I'd have been a lot happier with this if they'd called it "sparse vector autoregression". Still, interesting and worth a second read.
time_series  causal_inference  regression  sparsity  lasso 
january 2009 by cshalizi
[0901.2044] Spades and Mixture Models
"This paper studies sparse density estimation via l1 penalization (SPADES). We focus on estimation in high-dimensional mixture models and nonparametric adaptive density estimation. We show, respectively, that SPADES can recover, with high probability, the unknown components of a mixture of probability densities and that it yields minimax adaptive density estimates."
density_estimation  sparsity  mixture_models  to_read  nonparametrics  statistics  in_NB 
january 2009 by cshalizi
[0806.4115] High-Dimensional Additive Modeling
"We propose a new sparsity-smoothness penalty for high-dimensional generalized additive models. The combination of sparsity and smoothness is crucial for mathematical theory as well as performance for finite-sample data. We present a computationally efficient algorithm, with provable numerical convergence properties, for optimizing the penalized likelihood. Furthermore, we provide oracle results which yield asymptotic optimality of our estimator for high-dimensional but sparse additive models. Finally, an adaptive version of our sparsity-smoothness penalized approach yields large additional performance gains."
regression  additive_models  sparsity  nonparametrics  statistics  buhlmann.peter  van_de_geer.sara  meier.lukas 
december 2008 by cshalizi
[0707.0704] Model Selection Through Sparse Maximum Likelihood Estimation
"We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse."
statistics  graphical_models  sparsity  via:klk 
february 2008 by cshalizi

related tags

additive_models  anomaly_detection  biochemical_networks  books:recommended  bootstrap  buhlmann.peter  causal_inference  classifiers  compressed_sensing  conditional_random_fields  confidence_sets  convexity  copulas  data_analysis  degrees_of_freedom  density_estimation  econometrics  ensemble_methods  estimation  exponential_families  factor_analysis  feature_selection  gene_expression_data_analysis  graphical_models  hansen.christian  have_read  heard_the_talk  hierarchical_structure  hofling.holger  inference_to_latent_objects  information_theory  instrumental_variables  inverse_problems  in_NB  ising_model  kith_and_kin  lafferty.john  lasso  learning_theory  liu.han  logistic_regression  machine_learning  markov_models  meier.lukas  method_of_sieves  minimax  mixture_models  model_selection  nardi.yuval  network_data_analysis  nonparametrics  optimization  percival.daniel  principal_components  random_fields  ravikumar.pradeep  re:what_is_the_right_null_model_for_linear_regression  regression  ridge_regression  rinaldo.alessandro  semi-supervised_learning  signal_processing  sparsity  statistical_mechanics  statistics  text_mining  tibshirani.robert  time_series  to:NB  to_read  to_teach:data-mining  to_teach:undergrad-ADA  van_de_geer.sara  variable_selection  via:klk  via:shivak  wainright.martin  wasserman.larry  yu.bin  zhang.tong 

Copy this bookmark:



description:


tags: