cshalizi + latent_variables   20

[1204.6703] Two SVDs Suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation
"Topic models can be seen as a generalization of the clustering problem, in that they posit that observations are generated due to multiple latent factors (e.g. the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden.
"We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e. third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on k by k matrices, where k is the number of latent factors (e.g. the number of topics), rather than in the d-dimensional observed space (typically d >> k)."

That's a really remarkable claim, and I'd tag it to_be_shot_after_a_fair_trial if it weren't being made by genuinely serious people.
in_NB  to_read  latent_variables  topic_models  text_mining  mixture_models  statistics  machine_learning  cool_if_true  spectral_clustering 
27 days ago by cshalizi
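The abstract's two-decomposition recipe can be illustrated on a simpler model than LDA. In this model each document has exactly one topic i, with prior w_i, and its three observed words are drawn independently from that topic's word distribution μ_i. The pair probabilities are then M2 = Σ_i w_i μ_i μ_iᵀ and the triple probabilities are M3 = Σ_i w_i μ_i⊗μ_i⊗μ_i.

This is an assumption-laden sketch, not the paper's ECA procedure for LDA, which needs adjusted moments. It differs from that procedure in several ways:

- It uses exact population moments instead of trigram counts.
- It uses eigendecompositions of symmetric matrices in place of SVDs.
- Its first decomposition is on the full d×d matrix rather than the k×k version the abstract describes.
- The helper names (`jacobi_eigh`, `population_moments`, `spectral_recover`) are hypothetical.

The first decomposition whitens with M2. The second diagonalizes M3 contracted along a random direction, and its eigenvectors map back to the topics and their priors.

```python
import math
import random

# Sketch only: single-topic mixture with exact moments; helper names are hypothetical.
# This is not the paper's ECA procedure for LDA.


def jacobi_eigh(a, tol=1e-24, max_sweeps=60):
    """Eigenpairs of a symmetric matrix (list of lists) by cyclic Jacobi rotations.

    Returns (vals, vecs); column j of vecs (vecs[r][j]) is the eigenvector for vals[j].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for m in (a, v):  # column update: A <- A J, V <- V J
                    for r in range(n):
                        xp, xq = m[r][p], m[r][q]
                        m[r][p], m[r][q] = c * xp - s * xq, s * xp + c * xq
                for r in range(n):  # row update: A <- J^T A
                    xp, xq = a[p][r], a[q][r]
                    a[p][r], a[q][r] = c * xp - s * xq, s * xp + c * xq
    return [a[i][i] for i in range(n)], v


def population_moments(mus, weights):
    """Exact word-pair and word-triple probabilities; mus[i][a] = P(word a | topic i)."""
    d = len(mus[0])
    comps = list(zip(weights, mus))
    M2 = [[sum(w * mu[a] * mu[b] for w, mu in comps) for b in range(d)] for a in range(d)]
    M3 = [[[sum(w * mu[a] * mu[b] * mu[c] for w, mu in comps) for c in range(d)]
           for b in range(d)] for a in range(d)]
    return M2, M3


def spectral_recover(M2, M3, k, seed=0):
    """Recover k topic distributions and their priors from pair/triple moments."""
    d = len(M2)
    # Decomposition 1: whiten with the top-k eigenpairs of M2, so that W^T M2 W = I_k.
    vals, vecs = jacobi_eigh(M2)
    top = sorted(range(d), key=lambda j: vals[j], reverse=True)[:k]
    W = [[vecs[a][j] / math.sqrt(vals[j]) for j in top] for a in range(d)]
    B = [[vecs[a][j] * math.sqrt(vals[j]) for j in top] for a in range(d)]  # pinv(W^T)
    # Contract M3 with W in all three modes, the third one along a random direction eta.
    rng = random.Random(seed)
    eta = [rng.gauss(0.0, 1.0) for _ in range(k)]
    y = [sum(W[c][r] * eta[r] for r in range(k)) for c in range(d)]
    Ty = [[sum(M3[a][b][c] * y[c] for c in range(d)) for b in range(d)] for a in range(d)]
    M = [[sum(W[a][r] * Ty[a][b] * W[b][s] for a in range(d) for b in range(d))
          for s in range(k)] for r in range(k)]
    M = [[0.5 * (M[r][s] + M[s][r]) for s in range(k)] for r in range(k)]  # remove round-off asymmetry
    # Decomposition 2: eigenvectors of this k x k matrix are sqrt(w_i) W^T mu_i, up to sign.
    _, V = jacobi_eigh(M)
    topics, priors = [], []
    for j in range(k):
        x = [sum(B[a][r] * V[r][j] for r in range(k)) for a in range(d)]  # = +/- sqrt(w_j) mu_j
        total = sum(x)
        topics.append([xi / total for xi in x])  # normalizing also removes the sign ambiguity
        priors.append(total * total)
    return topics, priors


if __name__ == "__main__":
    mus = [[0.5, 0.2, 0.1, 0.1, 0.1],
           [0.1, 0.1, 0.6, 0.1, 0.1],
           [0.05, 0.05, 0.1, 0.3, 0.5]]
    weights = [0.5, 0.3, 0.2]
    topics, priors = spectral_recover(*population_moments(mus, weights), k=3)
    for mu, w in zip(topics, priors):
        print(round(w, 4), [round(p, 4) for p in mu])
```

With exact moments the recovery is exact up to floating-point error, though the order of the topics is arbitrary. With moments estimated from three-word documents, the same steps give approximate answers.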
Nonlinear Models of Measurement Errors
"Measurement errors in economic data are pervasive and nontrivial in size. The presence of measurement errors causes biased and inconsistent parameter estimates and leads to erroneous conclusions to various degrees in economic analysis. While linear errors-in-variables models are usually handled with well-known instrumental variable methods, this article provides an overview of recent research papers that derive estimation methods that provide consistent estimates for nonlinear models with measurement errors. We review models with both classical and nonclassical measurement errors, and with misclassification of discrete variables. For each of the methods surveyed, we describe the key ideas for identification and estimation, and discuss its application whenever it is currently available." (Not read, reconsider to_teach tag later.)
to:NB  statistics  latent_variables  inference_to_latent_objects  instrumental_variables  econometrics  to_teach:undergrad-ADA 
december 2011 by cshalizi
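The abstract's starting point has two parts: measurement error biases estimates, and the linear case is handled with instrumental variables. Both can be seen in a small simulation. This sketch covers only the linear special case, not any of the nonlinear methods the survey reviews. Several choices are illustrative assumptions:

- the parameter values;
- the helper names (`cov`, `ols_slope`, `iv_slope`, `simulate`);
- the existence of a second, independently mismeasured copy of the regressor to serve as the instrument.

With classical error the naive slope converges to β·var(x)/(var(x) + var(u)), which here is 2 × 1/2 = 1.

```python
import random

# Linear errors-in-variables sketch; values and helper names are illustrative.


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Simple instrumental-variable slope: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)


def simulate(beta=2.0, var_x=1.0, var_u=1.0, n=100_000, seed=1):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, var_x ** 0.5) for _ in range(n)]   # true (latent) regressor
    y = [beta * xi + rng.gauss(0.0, 0.5) for xi in x]        # outcome
    x1 = [xi + rng.gauss(0.0, var_u ** 0.5) for xi in x]     # regressor observed with error
    x2 = [xi + rng.gauss(0.0, var_u ** 0.5) for xi in x]     # second, independent measurement
    return {
        "oracle": ols_slope(x, y),    # infeasible: uses the latent x
        "naive": ols_slope(x1, y),    # attenuated toward zero
        "iv": iv_slope(x2, x1, y),    # x2 instruments for x1
    }


if __name__ == "__main__":
    print(simulate())
```

The instrument works here because x2's error is independent of both x1's error and the outcome noise. In the nonlinear models the survey covers, this simple ratio of covariances no longer suffices.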
[1002.4802] Gaussian Process Structural Equation Models with Latent Variables
"In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well-studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a non-linear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. An efficient Markov chain Monte Carlo procedure is described. We evaluate the stability of the sampling procedure and the predictive ability of the model compared against the current practice."
statistics  graphical_models  latent_variables  nonparametrics  estimation  heard_the_talk 
february 2010 by cshalizi
Inverse problems as statistics (Evans and Stark, 2001)
"For a statistician, an inverse problem is an inference or estimation problem. The data are finite in number and contain errors, as they do in classical ... problems, and the unknown typically is infinite-dimensional, as it is in nonparametric regression. The additional complication in an inverse problem is that the data are only indirectly related to the unknown. Canonical abstract formulations of statistical estimation problems subsume this complication by allowing probability distributions to be indexed in more-or-less arbitrary ways by parameters, which can be infinite-dimensional. Standard statistical concepts, questions, and considerations such as bias, variance, mean-squared error, identifiability, consistency, efficiency, and various forms of optimality, apply to inverse problems. This article discusses inverse problems as statistical estimation and inference problems, and points to the literature for a variety of techniques and results."
inverse_problems  statistics  nonparametrics  estimation  latent_variables  to_read  to_teach:complexity-and-inference 
june 2009 by cshalizi
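A standard linear-Gaussian case shows why the abstract's vocabulary of bias, variance and mean-squared error carries over. As an illustrative assumption, not taken from the article, suppose the data are y = Kθ + ε with ε ~ N(0, σ²I), so y depends on θ only through the operator K. A Tikhonov (ridge) estimator trades bias for variance. It is one common technique, not necessarily one the article emphasizes.

```latex
\hat\theta_\lambda = (K^\top K + \lambda I)^{-1} K^\top y, \qquad
\operatorname{E}[\hat\theta_\lambda] - \theta = -\lambda\,(K^\top K + \lambda I)^{-1}\theta, \qquad
\operatorname{Cov}[\hat\theta_\lambda] = \sigma^2 (K^\top K + \lambda I)^{-1} K^\top K\,(K^\top K + \lambda I)^{-1}
```

Consider a right singular vector of K with singular value s_j > 0. Along it, the bias is λ/(s_j² + λ) times that component of θ, and the variance is σ² s_j²/(s_j² + λ)². At λ = 0 the estimator is unbiased, but its variance σ²/s_j² blows up as s_j → 0. This is one concrete sense in which data that are only indirectly related to the unknown make estimation hard.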
Partisan Influence in Congress and Institutional Change
I am not surprised that Nominate is unstable under subsampling, but I had no idea it was _that_ unstable.
congress  nominate  clustering  statistics  political_science  latent_variables  via:justin 
may 2009 by cshalizi
Applying Discrete PCA in Data Analysis
I heard Aleks talk about this at UAI 2004... and then forgot about it completely when I taught data mining. My bad.
to_teach:data-mining  principal_components  independent_components_analysis  statistics  latent_variables  latent_semantic_analysis  to:NB  jakulin.aleks  buntine.wray 
may 2009 by cshalizi

related tags

bacanu.silviu-alin  bioinformatics  books:recommended  borsboom.denny  buntine.wray  causal_inference  change-point_problem  clustering  community_discovery  computational_statistics  confounding  congress  cool_if_true  devlin.bernie  dimension_reduction  econometrics  estimation  exponential_families  factor_analysis  fox.emily  genetics  genomic_control  graphical_models  gustafson.paul  heard_the_talk  identifiability  independent_components_analysis  inference_to_latent_objects  instrumental_variables  inverse_problems  in_NB  iq  i_told_you_so  jakulin.aleks  jordan.michael_i.  kith_and_kin  lasso  latent_semantic_analysis  latent_variables  machine_learning  markov_models  mental_testing  mixture_models  model_selection  network_data_analysis  nominate  nonparametrics  particle_filters  philosophy_of_science  political_science  principal_components  psychometrics  re:g_paper  re:stacs  regression  roeder.kathryn  spearman.charles  spectral_clustering  statistics  stochastic_processes  tetrad  text_mining  thomson.godfrey  to:blog  to:NB  topic_models  to_read  to_teach:complexity-and-inference  to_teach:data-mining  to_teach:undergrad-ADA  track_down_references  via:guslacerda  via:justin  via:moritz-heene  via:shivak 
