cshalizi + dimension_reduction   31

Lam , Yao : Factor modeling for high-dimensional time series: Inference for the number of factors
"This paper deals with the factor modeling for high-dimensional time series based on a dimension-reduction viewpoint. Under stationary settings, the inference is simple in the sense that both the number of factors and the factor loadings are estimated in terms of an eigenanalysis for a nonnegative definite matrix, and is therefore applicable when the dimension of time series is on the order of a few thousands. Asymptotic properties of the proposed method are investigated under two settings: (i) the sample size goes to infinity while the dimension of time series is fixed; and (ii) both the sample size and the dimension of time series go to infinity together. In particular, our estimators for zero-eigenvalues enjoy faster convergence (or slower divergence) rates, hence making the estimation for the number of factors easier. In particular, when the sample size and the dimension of time series go to infinity together, the estimators for the eigenvalues are no longer consistent. However, our estimator for the number of the factors, which is based on the ratios of the estimated eigenvalues, still works fine. Furthermore, this estimation shows the so-called “blessing of dimensionality” property in the sense that the performance of the estimation may improve when the dimension of time series increases. A two-step procedure is investigated when the factors are of different degrees of strength. Numerical illustration with both simulated and real data is also reported."
to:NB  dimension_reduction  factor_analysis  time_series  high-dimensional_statistics  inference_to_latent_objects 
9 days ago by cshalizi
[1205.2609] Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?
"Recent theory work has found that a special type of spatial partition tree - called a random projection tree - is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search."
to:NB  decision_trees  prediction  regression  statistics  dimension_reduction  machine_learning 
13 days ago by cshalizi
Cook , Forzani , Rothman : Estimating sufficient reductions of the predictors in abundant high-dimensional regressions
"We study the asymptotic behavior of a class of methods for sufficient dimension reduction in high-dimension regressions, as the sample size and number of predictors grow in various alignments. It is demonstrated that these methods are consistent in a variety of settings, particularly in abundant regressions where most predictors contribute some information on the response, and oracle rates are possible. Simulation results are presented to support the theoretical conclusion."
to:NB  regression  dimension_reduction  sufficiency  statistics 
7 weeks ago by cshalizi
[1203.6130] Spectral dimensionality reduction for HMMs
"Hidden Markov Models (HMMs) can be accurately approximated using co-occurrence frequencies of pairs and triples of observations by using a fast spectral method in contrast to the usual slow methods like EM or Gibbs sampling. We provide a new spectral method which significantly reduces the number of model parameters that need to be estimated, and generates a sample complexity that does not depend on the size of the observation vocabulary. We present an elementary proof giving bounds on the relative accuracy of probability estimates from our model. (Correlaries show our bounds can be weakened to provide either L1 bounds or KL bounds which provide easier direct comparisons to previous work.) Our theorem uses conditions that are checkable from the data, instead of putting conditions on the unobservable Markov transition matrix."
to:NB  to_read  markov_models  statistics  machine_learning  dimension_reduction  re:AoS_project  spectral_clustering 
8 weeks ago by cshalizi
Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
"We design an online algorithm for Principal Component Analysis. In each trial the current instance is centered and projected into a probabilistically chosen low dimensional subspace. The regret of our online algorithm, that is, the total expected quadratic compression loss of the online algorithm minus the total quadratic compression loss of the batch algorithm, is bounded by a term whose dependence on the dimension of the instances is only logarithmic.
"We first develop our methodology in the expert setting of online learning by giving an algorithm for learning as well as the best subset of experts of a certain size. This algorithm is then lifted to the matrix setting where the subsets of experts correspond to subspaces. The algorithm represents the uncertainty over the best subspace as a density matrix whose eigenvalues are bounded. The running time is O(n2) per trial, where n is the dimension of the instances."
to:NB  online_learning  dimension_reduction  machine_learning  learning_theory  warmuth.manfred  principal_components  low-regret_learning 
february 2012 by cshalizi
A General Framework for Dimensionality-Reducing Data Visualization Mapping
"In recent years, a wealth of dimension-reduction techniques for data visualization and preprocessing has been established. Nonparametric methods require additional effort for out-of-sample extensions, because they provide only a mapping of a given finite set of points. In this letter, we propose a general view on nonparametric dimension reduction based on the concept of cost functions and properties of the data. Based on this general principle, we transfer nonparametric dimension reduction to explicit mappings of the data manifold such that direct out-of-sample extensions become possible. Furthermore, this concept offers the possibility of investigating the generalization ability of data visualization to new data points. We demonstrate the approach based on a simple global linear mapping, as well as prototype-based local linear mappings. In addition, we can bias the functional form according to given auxiliary information. This leads to explicit supervised visualization mappings with discriminative properties comparable to state-of-the-art approaches."
in_NB  dimension_reduction  visual_display_of_quantitative_information  data_analysis  data_mining  manifold_learning  to_teach:data-mining 
february 2012 by cshalizi
Modified Locally Linear Embedding Using Multiple Weights
Stabilizing LLE by (as it were) model averaging. Needs at least a reference in the data-mining lecture on LLE.

"The locally linear embedding (LLE) is improved by introducing multiple linearly independent local weight vectors for each neighborhood. We characterize the reconstruction weights and show the existence of the linearly independent weight vectors at each neighborhood. The modified locally linear embedding (MLLE) proposed in this paper is much stable. It can retrieve the ideal embedding if MLLE is applied on data points sampled from an isometric manifold. MLLE is also compared with the local tangent space alignment (LTSA). Numerical examples are given that show the improvement and efficiency of MLLE."
to:NB  to_teach:data-mining  dimension_reduction  manifold_learning  via:gmg  spectral_clustering 
december 2011 by cshalizi
[1110.3917] How to Evaluate Dimensionality Reduction? - Improving the Co-ranking Matrix
"The growing number of dimensionality reduction methods available for data visualization has recently inspired the development of quality assessment measures, in order to evaluate the resulting low-dimensional representation independently from a methods' inherent criteria. Several (existing) quality measures can be (re)formulated based on the so-called co-ranking matrix, which subsumes all rank errors (i.e. differences between the ranking of distances from every point to all others, comparing the low-dimensional representation to the original data). The measures are often based on the partioning of the co-ranking matrix into 4 submatrices, divided at the K-th row and column, calculating a weighted combination of the sums of each submatrix. Hence, the evaluation process typically involves plotting a graph over several (or even all possible) settings of the parameter K. Considering simple artificial examples, we argue that this parameter controls two notions at once, that need not necessarily be combined, and that the rectangular shape of submatrices is disadvantageous for an intuitive interpretation of the parameter. We debate that quality measures, as general and flexible evaluation tools, should have parameters with a direct and intuitive interpretation as to which specific error types are tolerated or penalized. Therefore, we propose to replace K with two parameters to control these notions separately, and introduce a differently shaped weighting on the co-ranking matrix. The two new parameters can then directly be interpreted as a threshold up to which rank errors are tolerated, and a threshold up to which the rank-distances are significant for the evaluation. Moreover, we propose a color representation of local quality to visually support the evaluation process for a given mapping, where every point in the mapping is colored according to its local contribution to the overall quality." --- Look at this carefully, and see if it could be taught in data mining (and whether it's worth doing so.)
to:NB  dimension_reduction  statistics  data_analysis  visual_display_of_quantitative_information  to_teach:data-mining 
october 2011 by cshalizi
"Dimension Reduction and Adaptation in Conditional Density Estimation" - Journal of the American Statistical Association - 105(490):761
Sounds neat: "An orthogonal series estimator of the conditional density of a response given a vector of continuous and ordinal/nominal categorical predictors ... The estimator is based on writing a conditional density as a sum of orthogonal projections on all possible subspaces of reduced dimensionality and then estimating each projection via a shrinkage procedure [which] uses a universal thresholding and a dyadic-blockwise shrinkage for low and high frequencies, respectively. The estimator is data-driven, is adaptive to underlying smoothness of a conditional density, and attains a minimax rate of the mean integrated squared error convergence. ... if a conditional density depends only on a subgroup of predictors, then the estimator seizes the opportunity and attains a corresponding minimax rate of convergence. The latter property relaxes the notorious “curse of dimensionality.” ... the estimator is fast, because neither projections nor shrinkages are computation-intensive."
density_estimation  dimension_reduction  statistics  estimation  shrinkage 
july 2010 by cshalizi
[1003.0529] A Unified Algorithmic Framework for Multi-Dimensional Scaling
"In this paper, we propose a unified algorithmic framework for solving many known variants of \mds. Our algorithm is a simple iterative scheme with guaranteed convergence, and is \emph{modular}; by changing the internals of a single subroutine in the algorithm, we can switch cost functions and target spaces easily. In addition to the formal guarantees of convergence, our algorithms are accurate; in most cases, they converge to better quality solutions than existing methods, in comparable time. "
multidimensional_scaling  dimension_reduction  visual_display_of_quantitative_information  to_teach:data-mining  data_mining 
march 2010 by cshalizi
Robust Manifold Unfolding with Kernel Regularization
Cool stuff, and I kind of want to teach this, or at least a kiddy version of it, in 350, but I was planning on doing dimensionality reduction long before the kernel trick. Hmmm....
kernel_methods  dimension_reduction  manifold_learning  wahba.grace  statistics  machine_learning  to_read  to_teach:data-mining 
september 2009 by cshalizi
Laplacian Eigenmaps for Dimensionality Reduction and Data Representation (Belkin and Niyogi)
Using the spectral graph theory to get new features, for visualization, clustering, &c. Also includes a good discussion of the local linear embedding algorithm. Will try to teach this in 350 but may make heads explode.
to_teach:data-mining  clustering  graph_theory  spectral_clustering  dimension_reduction  niyogi.partha 
august 2009 by cshalizi
[0908.3400] Decomposing data sets into skewness modes
"We derive the nonlinear equations satisfied by the coefficients of linear combinations that maximize their skewness when their variance is constrained to take a specific value. In order to numerically solve these nonlinear equations we develop a gradient-type flow that preserves the constraint. In combination with the Karhunen-Lo\`eve decomposition this leads to a set of orthogonal modes with maximal skewness. For illustration purposes we apply these techniques to atmospheric data; in this case the maximal-skewness modes correspond to strongly localized atmospheric flows. We show how these ideas can be extended, for example to maximal-flatness modes."
dimension_reduction  data_analysis  principal_components  karhunen-loeve_decomposition  statistics 
august 2009 by cshalizi
A Meta-Analysis of Variance Accounted for and Factor Loadings in Exploratory Factor Analysis
Shorter Peterson: Your results look like a factor analysis of pure noise. Have a nice day. (Also, a citation in support of the folk wisdom that factor analysis doesn't work any better as data reduction than simple principal components analysis.)
factor_analysis  statistics  to:NB  to_teach:data-mining  via:moritz-heene  re:g_paper  dimension_reduction  principal_components  to_teach:undergrad-ADA 
may 2009 by cshalizi

related tags

bioinformatics  books:recommended  bootstrap  chaos  classifiers  clustering  compressed_sensing  cross-validation  data_analysis  data_mining  decision_trees  density_estimation  diffusion_maps  dimension_reduction  dsm  dynamical_systems  ensemble_methods  entropy_estimation  estimation  factor_analysis  gaussian_processes  genetics  graphical_models  graph_theory  have_read  high-dimensional_statistics  inference_to_latent_objects  in_NB  karhunen-loeve_decomposition  kernel_estimators  kernel_methods  kith_and_kin  lasso  latent_variables  learning_theory  lee.ann  linear_regression  low-regret_learning  luca.diana  machine_learning  manifold_learning  markov_models  model_selection  multidimensional_scaling  nadler.boaz  niyogi.partha  online_learning  popular_social_science  prediction  principal_components  R  re:AoS_project  re:g_paper  regression  richards.joey  roeder.kathryn  shrinkage  spectral_clustering  statistics  sufficiency  time_series  to:NB  to_read  to_teach:data-mining  to_teach:undergrad-ADA  track_down_references  via:arthegall  via:gmg  via:moritz-heene  via:shivak  visual_display_of_quantitative_information  wahba.grace  warmuth.manfred  wasserman.larry 

Copy this bookmark:



description:


tags: