cshalizi + latent_variables 20
[1204.6703] Two SVDs Suffice: Spectral decompositions for probabilistic topic modeling and latent Dirichlet allocation
27 days ago by cshalizi
"Topic models can be seen as a generalization of the clustering problem, in that they posit that observations are generated due to multiple latent factors (e.g. the words in each document are generated as a mixture of several active topics, as opposed to just one). This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic probability vectors (the distributions over words for each topic), when only the words are observed and the corresponding topics are hidden.
"We provide a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of mixture models, including the popular latent Dirichlet allocation (LDA) model. For LDA, the procedure correctly recovers both the topic probability vectors and the prior over the topics, using only trigram statistics (i.e. third order moments, which may be estimated with documents containing just three words). The method, termed Excess Correlation Analysis (ECA), is based on a spectral decomposition of low order moments (third and fourth order) via two singular value decompositions (SVDs). Moreover, the algorithm is scalable since the SVD operations are carried out on k by k matrices, where k is the number of latent factors (e.g. the number of topics), rather than in the d-dimensional observed space (typically d >> k)."
That's a really remarkable claim, and I'd tag it to_be_shot_after_a_fair_trial if it weren't being made by genuinely serious people.
in_NB
to_read
latent_variables
topic_models
text_mining
mixture_models
statistics
machine_learning
cool_if_true
spectral_clustering
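A minimal sketch of why moment-based spectral methods can hope to work in k dimensions rather than d. This is not ECA itself and is not taken from the paper. It assumes the simplest case, a single-topic mixture in which every word of a document is drawn i.i.d. from one latent topic, and the toy numbers and the helper names `pair_moments` and `rank` are made up for illustration. Under that assumption the word-pair matrix is a weighted sum of k outer products, so its rank is at most k.

```python
from fractions import Fraction as F

# Hypothetical toy model: k = 2 topics over d = 4 words.
weights = [F(1, 3), F(2, 3)]                      # topic probabilities
topics = [
    [F(1, 2), F(1, 4), F(1, 8), F(1, 8)],         # word distribution of topic 0
    [F(1, 10), F(1, 5), F(3, 10), F(2, 5)],       # word distribution of topic 1
]

def pair_moments(weights, topics):
    """Population matrix P[a][b] = Pr(word1 = a, word2 = b) when both words
    of a document are drawn i.i.d. from a single latent topic."""
    d = len(topics[0])
    return [[sum(w * mu[a] * mu[b] for w, mu in zip(weights, topics))
             for b in range(d)] for a in range(d)]

def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

P = pair_moments(weights, topics)
# The rank equals the number of topics when the topic vectors are linearly independent.
print(rank(P))
```

The d by d matrix carries only k dimensions of information, which is the kind of structure a projection to k dimensions exploits. LDA proper needs corrections to these moments that this sketch does not attempt.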
Nonlinear Models of Measurement Errors
december 2011 by cshalizi
"Measurement errors in economic data are pervasive and nontrivial in size. The presence of measurement errors causes biased and inconsistent parameter estimates and leads to erroneous conclusions to various degrees in economic analysis. While linear errors-in-variables models are usually handled with well-known instrumental variable methods, this article provides an overview of recent research papers that derive estimation methods that provide consistent estimates for nonlinear models with measurement errors. We review models with both classical and nonclassical measurement errors, and with misclassification of discrete variables. For each of the methods surveyed, we describe the key ideas for identification and estimation, and discuss its application whenever it is currently available." (Not read, reconsider to_teach tag later.)
to:NB
statistics
latent_variables
inference_to_latent_objects
instrumental_variables
econometrics
to_teach:undergrad-ADA
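A minimal simulation of the linear case the abstract mentions in passing, not of any method from the survey itself. It assumes classical measurement error (independent, mean-zero noise added to the regressor) and a second, independently noisy measurement used as the instrument. The sample size, slope, and noise levels are arbitrary choices for illustration.

```python
import random

def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

rng = random.Random(0)
n, beta = 50_000, 2.0                     # hypothetical sample size and true slope

x_true = [rng.gauss(0, 1) for _ in range(n)]        # latent regressor
y = [beta * x + rng.gauss(0, 1) for x in x_true]    # outcome
x_obs = [x + rng.gauss(0, 1) for x in x_true]       # noisy measurement of x
z = [x + rng.gauss(0, 1) for x in x_true]           # second independent measurement (instrument)

ols = cov(x_obs, y) / cov(x_obs, x_obs)   # attenuated toward zero by the noise in x_obs
iv = cov(z, y) / cov(z, x_obs)            # consistent under the assumptions stated above

print(f"OLS slope: {ols:.2f}, IV slope: {iv:.2f}, true slope: {beta}")
```

With unit variances for both the latent regressor and the noise, the least-squares slope converges to half the true slope, while the instrumental-variable ratio converges to the true slope. The nonlinear and nonclassical cases the survey covers are exactly where this simple fix stops being enough.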
[1002.4802] Gaussian Process Structural Equation Models with Latent Variables
february 2010 by cshalizi
"In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well-studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a non-linear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. An efficient Markov chain Monte Carlo procedure is described. We evaluate the stability of the sampling procedure and the predictive ability of the model compared against the current practice."
statistics
graphical_models
latent_variables
nonparametrics
estimation
heard_the_talk
A New Lease on Life for Thomson's Bonds Model of Intelligence (Bartholomew, Deary and Lawn, 2009)
august 2009 by cshalizi
I _told_ you so. (Though they are _shockingly_ naive about fMRI and brain organization.)
to:blog
iq
mental_testing
factor_analysis
psychometrics
thomson.godfrey
spearman.charles
latent_variables
re:g_paper
via:moritz-heene
i_told_you_so
Inverse problems as statistics (Evans and Stark, 2001)
june 2009 by cshalizi
"For a statistician, an inverse problem is an inference or estimation problem. The data are finite in number and contain errors, as they do in classical ... problems, and the unknown typically is infinite-dimensional, as it is in nonparametric regression. The additional complication in an inverse problem is that the data are only indirectly related to the unknown. Canonical abstract formulations of statistical estimation problems subsume this complication by allowing probability distributions to be indexed in more-or-less arbitrary ways by parameters, which can be infinite-dimensional. Standard statistical concepts, questions, and considerations such as bias, variance, mean-squared error, identifiability, consistency, efficiency, and various forms of optimality, apply to inverse problems. This article discusses inverse problems as statistical estimation and inference problems, and points to the literature for a variety of techniques and results."
inverse_problems
statistics
nonparametrics
estimation
latent_variables
to_read
to_teach:complexity-and-inference
Tetrad Project Homepage
june 2009 by cshalizi
Have I really not bookmarked this before?
tetrad
causal_inference
graphical_models
machine_learning
statistics
philosophy_of_science
latent_variables
"Statistical Theory and Methods for Complex, High-Dimensional Data"
june 2009 by cshalizi
Loads of talks.
statistics
machine_learning
model_selection
graphical_models
regression
latent_variables
principal_components
factor_analysis
dimension_reduction
lasso
bioinformatics
track_down_references
via:shivak
Partisan Influence in Congress and Institutional Change
may 2009 by cshalizi
I am not surprised that Nominate is unstable under subsampling, but I had no idea it was _that_ unstable.
congress
nominate
clustering
statistics
political_science
latent_variables
via:justin
Applying Discrete PCA in Data Analysis
may 2009 by cshalizi
I heard Aleks talk about this at UAI 2004... and then forgot about it completely when I taught data mining. My bad.
to_teach:data-mining
principal_components
independent_components_analysis
statistics
latent_variables
latent_semantic_analysis
to:NB
jakulin.aleks
buntine.wray
Measuring the Mind: Conceptual Issues in Contemporary Psychometrics - Borsboom [@Labyrinth]
august 2008 by cshalizi
Probably the best book available on the status of psychological measurements. Micro-review with links at http://bactra.org/weblog/algae-2008-01.html
books:recommended
psychometrics
philosophy_of_science
borsboom.denny
latent_variables
inference_to_latent_objects