cshalizi + inference_to_latent_objects 36
Lam, Yao: Factor modeling for high-dimensional time series: Inference for the number of factors
9 days ago by cshalizi
"This paper deals with the factor modeling for high-dimensional time series based on a dimension-reduction viewpoint. Under stationary settings, the inference is simple in the sense that both the number of factors and the factor loadings are estimated in terms of an eigenanalysis for a nonnegative definite matrix, and is therefore applicable when the dimension of time series is on the order of a few thousands. Asymptotic properties of the proposed method are investigated under two settings: (i) the sample size goes to infinity while the dimension of time series is fixed; and (ii) both the sample size and the dimension of time series go to infinity together. In particular, our estimators for zero-eigenvalues enjoy faster convergence (or slower divergence) rates, hence making the estimation for the number of factors easier. In particular, when the sample size and the dimension of time series go to infinity together, the estimators for the eigenvalues are no longer consistent. However, our estimator for the number of the factors, which is based on the ratios of the estimated eigenvalues, still works fine. Furthermore, this estimation shows the so-called “blessing of dimensionality” property in the sense that the performance of the estimation may improve when the dimension of time series increases. A two-step procedure is investigated when the factors are of different degrees of strength. Numerical illustration with both simulated and real data is also reported."
to:NB
dimension_reduction
factor_analysis
time_series
high-dimensional_statistics
inference_to_latent_objects
[1203.3504] On Measurement Bias in Causal Inference
18 days ago by cshalizi
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particulars, the paper discusses the control of partially observable confounders in parametric and non parametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB
causal_inference
inference_to_latent_objects
pearl.judea
to_teach:undergrad-ADA
statistics
error_in_variables
via:arthegall
Accurately estimating neuronal correlation requires a new spike-sorting paradigm
20 days ago by cshalizi
"Neurophysiology is increasingly focused on identifying coincident activity among neurons. Strong inferences about neural computation are made from the results of such studies, so it is important that these results be accurate. However, the preliminary step in the analysis of such data, the assignment of spike waveforms to individual neurons (“spike-sorting”), makes a critical assumption which undermines the analysis: that spikes, and hence neurons, are independent. We show that this assumption guarantees that coincident spiking estimates such as correlation coefficients are biased. We also show how to eliminate this bias. Our solution involves sorting spikes jointly, which contrasts with the current practice of sorting spikes independently of other spikes. This new “ensemble sorting” yields unbiased estimates of coincident spiking, and permits more data to be analyzed with confidence, improving the quality and quantity of neurophysiological inferences. These results should be of interest outside the context of neuronal correlations studies. Indeed, simultaneous recording of many neurons has become the rule rather than the exception in experiments, so it is essential to spike sort correctly if we are to make valid inferences about any properties of, and relationships between, neurons."
to:NB
heard_the_talk
neuroscience
neural_data_analysis
ventura.valerie
kith_and_kin
statistics
inference_to_latent_objects
[1204.2581] Modeling Relational Data via Latent Factor Blockmodel
6 weeks ago by cshalizi
"In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems and bioinformatics. Previous studies either consider latent feature based models but disregarding local structure in the network, or focus exclusively on capturing local structure of objects based on latent blockmodels without coupling with latent characteristics of objects. To combine the benefits of the previous work, we propose a novel model that can simultaneously incorporate the effect of latent features and covariates if any, as well as the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both latent feature factors and latent cluster memberships of objects to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network to improve prediction performance. We also develop an optimization transfer algorithm based on the generalized EM-style strategy to learn the latent factors. We prove the efficacy of our proposed model through the link prediction task and cluster analysis task, and extensive experiments on the synthetic data and several real world datasets suggest that our proposed LFBM model outperforms the other state of the art approaches in the evaluated tasks."
in_NB
network_data_analysis
community_discovery
statistics
inference_to_latent_objects
factor_analysis
relational_learning
[1204.0492] Non-detection of the Tooth Fairy at Optical Wavelengths
7 weeks ago by cshalizi
"We report a non-detection, to a limiting magnitude of V = 18.4 (9), of the elusive entity commonly described as the Tooth Fairy. We review various physical models and conclude that follow-up observations must precede an interpretation of our result."
funny:geeky
tooth_fairy
physics
astronomy
inference_to_latent_objects
absence_of_evidence
via:mejn
[1203.3887] Learning Loopy Graphical Models with Latent Variables: Efficient Methods and Guarantees
9 weeks ago by cshalizi
"The problem of structure estimation in latent graphical models is considered, where some nodes are latent or hidden. A novel method is proposed which attempts to locally reconstruct latent trees and outputs a loopy graph structure with hidden variables. Correctness of the method is established when the underlying graph has a large girth and the model is in the regime of correlation decay, and PAC guarantees for the method are also derived. For the special case of the Ising model, the number of samples $n$ required for structural consistency scales as $n = \Omega(\theta_{\min}^{-2\delta\eta(\eta+1)-2}\log p)$, where $\theta_{\min}$ is the minimum edge potential, $\delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $\eta$ is a parameter which depends on the bounds on node and edge potentials in the Ising model. The results are further specialized for the case when the observed nodes are uniformly sampled from the model. Finally, necessary conditions for structural consistency under any algorithm are derived."
to:NB
graphical_models
learning_theory
machine_learning
statistics
ising_model
inference_to_latent_objects
[0809.5032] Identifiability of parameters in latent structure models with many observed variables
12 weeks ago by cshalizi
"While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions."
in_NB
statistics
identifiability
mixture_models
inference_to_latent_objects
re:homophily_and_confounding
to_read
[0810.3177] Inferring sparse Gaussian graphical models with latent structure
february 2012 by cshalizi
"Our concern is selecting the concentration matrix's nonzero coefficients for a sparse Gaussian graphical model in a high-dimensional setting. This corresponds to estimating the graph of conditional dependencies between the variables. We describe a novel framework taking into account a latent structure on the concentration matrix. This latent structure is used to drive a penalty matrix and thus to recover a graphical model with a constrained topology. Our method uses an $\ell_1$ penalized likelihood criterion. Inference of the graph of conditional dependencies between the variates and of the hidden variables is performed simultaneously in an iterative \textsc{em}-like algorithm. The performances of our method is illustrated on synthetic as well as real data, the latter concerning breast cancer."
to:NB
graphical_models
lasso
sparsity
statistics
inference_to_latent_objects
The Asymmetric Business Cycle
february 2012 by cshalizi
"The business cycle is a fundamental yet elusive concept in macroeconomics. In this paper, we consider the problem of measuring the business cycle. First, we argue for the output-gap view that the business cycle corresponds to transitory deviations in economic activity away from a permanent, or trend, level. Then we investigate the extent to which a general model-based approach to estimating trend and cycle for the U.S. economy leads to measures of the business cycle that reflect models versus the data. We find empirical support for a nonlinear time series model that produces a business cycle measure with an asymmetric shape across NBER expansion and recession phases. Specifically, this business cycle measure suggests that recessions are periods of relatively large and negative transitory fluctuations in output. However, several close competitors to the nonlinear model produce business cycle measures of widely differing shapes and magnitudes. Given this model-based uncertainty, we construct a model-averaged measure of the business cycle. This measure also displays an asymmetric shape and is closely related to other measures of economic slack such as the unemployment rate and capacity utilization."
--- Worthy, but at the same time makes me want to lock them in a room with a copy of Li and Racine's _Nonparametric Econometrics_, or even _The Elements of Statistical Learning_, and not let them out until they understand it.
in_NB
time_series
statistics
economics
macroeconomics
inference_to_latent_objects
re:your_favorite_dsge_sucks
morley.james
have_read
ensemble_methods
model_selection
It isn’t simple to infer cognitive modules from behaviour – idiolect
january 2012 by cshalizi
"The conclusion is straightforward. Although inferring different processing stages (or 'modules') from additive factors in data is a venerable tradition in psychology, and one that remains popular (Sternberg, 2011), it is a mistake. As Henson (2011) points out, there's too much non-linearity in cognitive processing, so that you need additional constraints if you want to make inferences about cognitive modules."
--- I find it astonishing that anyone would ever have been tempted to make this inference at all.
cognitive_science
track_down_references
inference_to_latent_objects
experimental_psychology
Nonlinear Models of Measurement Errors
december 2011 by cshalizi
"Measurement errors in economic data are pervasive and nontrivial in size. The presence of measurement errors causes biased and inconsistent parameter estimates and leads to erroneous conclusions to various degrees in economic analysis. While linear errors-in-variables models are usually handled with well-known instrumental variable methods, this article provides an overview of recent research papers that derive estimation methods that provide consistent estimates for nonlinear models with measurement errors. We review models with both classical and nonclassical measurement errors, and with misclassification of discrete variables. For each of the methods surveyed, we describe the key ideas for identification and estimation, and discuss its application whenever it is currently available." (Not read, reconsider to_teach tag later.)
to:NB
statistics
latent_variables
inference_to_latent_objects
instrumental_variables
econometrics
to_teach:undergrad-ADA
[1112.2774] Measuring Tie Strength in Implicit Social Networks
december 2011 by cshalizi
"Given a set of people and a set of events they attend, we address the problem of measuring connectedness or tie strength between each pair of persons given that attendance at mutual events gives an implicit social network between people. We take an axiomatic approach to this problem. Starting from a list of axioms that a measure of tie strength must satisfy, we characterize functions that satisfy all the axioms and show that there is a range of measures that satisfy this characterization. A measure of tie strength induces a ranking on the edges (and on the set of neighbors for every person). We show that for applications where the ranking, and not the absolute value of the tie strength, is the important thing about the measure, the axioms are equivalent to a natural partial order. Also, to settle on a particular measure, we must make a non-obvious decision about extending this partial order to a total order, and that this decision is best left to particular applications. We classify measures found in prior literature according to the axioms that they satisfy. In our experiments, we measure tie strength and the coverage of our axioms in several datasets. Also, for each dataset, we bound the maximum Kendall's Tau divergence (which measures the number of pairwise disagreements between two lists) between all measures that satisfy the axioms using the partial order. This informs us if particular datasets are well behaved where we do not have to worry about which measure to choose, or we have to be careful about the exact choice of measure we make."
to:NB
network_data_analysis
inference_to_latent_objects
PLoS ONE: The Small World of Psychopathology
november 2011 by cshalizi
"Background
Mental disorders are highly comorbid: people having one disorder are likely to have another as well. We explain empirical comorbidity patterns based on a network model of psychiatric symptoms, derived from an analysis of symptom overlap in the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV).
Principal Findings
We show that a) half of the symptoms in the DSM-IV network are connected, b) the architecture of these connections conforms to a small world structure, featuring a high degree of clustering but a short average path length, and c) distances between disorders in this structure predict empirical comorbidity rates. Network simulations of Major Depressive Episode and Generalized Anxiety Disorder show that the model faithfully reproduces empirical population statistics for these disorders.
Conclusions
In the network model, mental disorders are inherently complex. This explains the limited successes of genetic, neuroscientific, and etiological approaches to unravel their causes. We outline a psychosystems approach to investigate the structure and dynamics of mental disorders."
to:NB
psychometrics
psychiatry
network_data_analysis
inference_to_latent_objects
borsboom.denny
have_read
to:blog
[1110.3076] Efficient Latent Variable Graphical Model Selection via Split Bregman Method
october 2011 by cshalizi
We consider the problem of covariance matrix estimation in the presence of latent variables. Under suitable conditions, it is possible to learn the marginal covariance matrix of the observed variables via a tractable convex program, where the concentration matrix of the observed variables is decomposed into a sparse matrix (representing the graphical structure of the observed variables) and a low rank matrix (representing the marginalization effect of latent variables). We present an efficient first-order method based on split Bregman to solve the convex problem. The algorithm is guaranteed to converge under mild conditions. We show that our algorithm is significantly faster than the state-of-the-art algorithm on both artificial and real-world data. Applying the algorithm to a gene expression data involving thousands of genes, we show that most of the correlation between observed variables can be explained by only a few dozen latent factors.
in_NB
graphical_models
inference_to_latent_objects
statistics
[1107.1283] Spectral Methods for Learning Multivariate Latent Tree Structure
july 2011 by cshalizi
Huh, sounds like they're using tetrad equations?
markov_models
to_read
re:AoS_project
in_NB
graphical_models
inference_to_latent_objects
zhang.tong
kakade.sham
hsu.daniel
song.le
Evidence for a Collective Intelligence Factor in the Performance of Human Groups | Science/AAAS
december 2010 by cshalizi
I will give this a fair shot, but the abstract is not promising at all. A great fit to the one-factor model is, after all, precisely what you should expect if there are really an immense number of factors, but your measurement procedures are all crap and depend on random subsets of them. (Perhaps I need to turn http://bactra.org/weblog/523.html into a proper paper after all.)
to_be_shot_after_a_fair_trial
collective_cognition
experimental_psychology
factor_analysis
via:nielsen
re:g_paper
inference_to_latent_objects
Mariadassou, Robin, Vacher: Uncovering latent structure in valued graphs: A variational approach
august 2010 by cshalizi
"As more and more network-structured data sets are available, the statistical analysis of valued graphs has become common place. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case.
We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graphs models and allows to include covariates. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host–parasite interaction networks in forest ecosystems."
network_data_analysis
inference_to_latent_objects
community_discovery
statistics
estimation
Alan J. Rocke: Image and Reality: Kekule, Kopp, and the Scientific Imagination
june 2010 by cshalizi
"Nineteenth-century chemists were faced with a particular problem: how to depict the atoms and molecules that are beyond the direct reach of our bodily senses. In visualizing this microworld, these scientists were the first to move beyond high-level philosophical speculations regarding the unseen. In Image and Reality, Alan Rocke focuses on the community of organic chemists in Germany to provide the basis for a fuller understanding of the nature of scientific creativity.
Arguing that visual mental images regularly assisted many of these scientists in thinking through old problems and new possibilities, Rocke uses a variety of sources, including private correspondence, diagrams and illustrations, scientific papers, and public statements, to investigate their ability to not only imagine the invisibly tiny atoms and molecules upon which they operated daily, but to build detailed and empirically based pictures of how all of the atoms in complicated molecules were interconnected."
books:noted
history_of_science
chemistry
inference_to_latent_objects
Phantom of Heilbronn - Wikipedia, the free encyclopedia
may 2010 by cshalizi
In which the combined police forces of Europe spend years chasing a female serial killer known only from DNA evidence, only to find that it's all down to contaminated cotton swabs from a single supplier!
Teaching note for data mining: This should make a great example of the importance of getting the data right, before worrying about the statistical processing...
via:arsyed
serial_killers
to_teach:data-mining
bad_data
DNA_testing
forensics
wtf
inference_to_latent_objects
blogged
Limits of declustering methods for disentangling exogenous from endogenous events in time series with foreshocks, main shocks, and aftershocks
july 2009 by cshalizi
"Many time series in natural and social sciences can be seen as resulting from an interplay between exogenous influences and an endogenous organization. We use a simple epidemic-type aftershock model of events occurring sequentially, in which future events are influenced (partially triggered) by past events to ask the question of how well can one disentangle the exogenous events from the endogenous ones. We apply both model-dependent and model-independent stochastic declustering methods to reconstruct the tree of ancestry and estimate key parameters. In contrast with previously reported positive results, we have to conclude that declustered catalogs are rather unreliable for the synthetic catalogs that we have investigated, which contains of the order of thousands of events, typical of realistic applications. The estimated rates of exogenous events suffer from large errors. The branching ratio n, quantifying the fraction of events that have been triggered by previous events, is also badly estimated in general from declustered catalogs. We find, however, that the errors tend to be smaller and perhaps acceptable in some cases for small triggering efficiency and branching ratios. The high level of randomness together with the long memory makes the stochastic reconstruction of trees of ancestry and the estimation of the key parameters perhaps intrinsically unreliable for long-memory processes. For shorter memories (larger “bare” Omori exponent), the results improve significantly."
statistics
time_series
branching_processes
in_NB
earthquakes
prediction
inference_to_latent_objects
sornette.didier
long-range_dependence
Measuring the Mind: Conceptual Issues in Contemporary Psychometrics - Borsboom [@Labyrinth]
august 2008 by cshalizi
Probably the best book available on the status of psychological measurements. Micro-review with links at http://bactra.org/weblog/algae-2008-01.html
books:recommended
psychometrics
philosophy_of_science
borsboom.denny
latent_variables
inference_to_latent_objects
Self-Similarity of Complex Networks and Hidden Metric Spaces
february 2008 by cshalizi
In what sense is this constructed geometry an _explanation_ of clustering???
networks
self-similarity
inference_to_latent_objects
[0710.4975] Node discovery problem for a social network
october 2007 by cshalizi
"We propose a heuristic algorithm to infer an invisible, functionally relevant person. Its performance (precision, recall, and F value) is demonstrated with a simulation experiment using a network derived from the Watts-Strogatz (WS) model."
network_data_analysis
inference_to_latent_objects