cshalizi + inference_to_latent_objects   36

Lam , Yao : Factor modeling for high-dimensional time series: Inference for the number of factors
"This paper deals with the factor modeling for high-dimensional time series based on a dimension-reduction viewpoint. Under stationary settings, the inference is simple in the sense that both the number of factors and the factor loadings are estimated in terms of an eigenanalysis for a nonnegative definite matrix, and is therefore applicable when the dimension of time series is on the order of a few thousands. Asymptotic properties of the proposed method are investigated under two settings: (i) the sample size goes to infinity while the dimension of time series is fixed; and (ii) both the sample size and the dimension of time series go to infinity together. In particular, our estimators for zero-eigenvalues enjoy faster convergence (or slower divergence) rates, hence making the estimation for the number of factors easier. In particular, when the sample size and the dimension of time series go to infinity together, the estimators for the eigenvalues are no longer consistent. However, our estimator for the number of the factors, which is based on the ratios of the estimated eigenvalues, still works fine. Furthermore, this estimation shows the so-called “blessing of dimensionality” property in the sense that the performance of the estimation may improve when the dimension of time series increases. A two-step procedure is investigated when the factors are of different degrees of strength. Numerical illustration with both simulated and real data is also reported."
to:NB  dimension_reduction  factor_analysis  time_series  high-dimensional_statistics  inference_to_latent_objects 
9 days ago by cshalizi
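A minimal sketch of the eigenvalue-ratio idea described in the abstract, using NumPy. It assumes the construction M = sum over k = 1..k0 of S(k) S(k)^T, where S(k) is the lag-k sample autocovariance, and takes the estimated number of factors to be the index i <= R that minimizes the ratio of successive eigenvalues lambda_{i+1} / lambda_i. The function names, k0 = 2, R = p // 2 and the simulated AR(1) factors are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def lag_autocov(y, k):
    """Lag-k sample autocovariance of an (n, p) series: (1/n) * sum_t (y_{t+k} - ybar)(y_t - ybar)^T."""
    n = y.shape[0]
    yc = y - y.mean(axis=0)
    return yc[k:].T @ yc[: n - k] / n

def estimate_num_factors(y, k0=2, R=None):
    """Eigenvalue-ratio estimate of the number of factors (hypothetical helper, a sketch)."""
    p = y.shape[1]
    if R is None:
        R = p // 2  # assumed upper bound on the number of factors
    # Nonnegative definite matrix pooling autocovariance information over lags 1..k0
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S = lag_autocov(y, k)
        M += S @ S.T
    lam = np.linalg.eigvalsh(M)[::-1]  # eigenvalues in descending order
    ratios = lam[1 : R + 1] / lam[:R]  # ratios[i - 1] = lambda_{i+1} / lambda_i
    return int(np.argmin(ratios)) + 1

# Simulated example: r = 2 AR(1) factors, p = 20 observed series, white-noise errors
rng = np.random.default_rng(0)
n, p, r = 2000, 20, 2
x = np.zeros((n, r))
for t in range(1, n):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal(r)
A = rng.standard_normal((p, r))  # factor loadings
y = x @ A.T + rng.standard_normal((n, p))
print(estimate_num_factors(y))
```

Using lagged rather than contemporaneous autocovariances means serially uncorrelated noise contributes little to M, which is why the ratio drops sharply right after the last true factor.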
[1203.3504] On Measurement Bias in Causal Inference
"This paper addresses the problem of measurement errors in causal inference and highlights several algebraic and graphical methods for eliminating systematic bias induced by such errors. In particular, the paper discusses the control of partially observable confounders in parametric and nonparametric models and the computational problem of obtaining bias-free effect estimates in such models."
to:NB  causal_inference  inference_to_latent_objects  pearl.judea  to_teach:undergrad-ADA  statistics  error_in_variables  via:arthegall 
18 days ago by cshalizi
Accurately estimating neuronal correlation requires a new spike-sorting paradigm
"Neurophysiology is increasingly focused on identifying coincident activity among neurons. Strong inferences about neural computation are made from the results of such studies, so it is important that these results be accurate. However, the preliminary step in the analysis of such data, the assignment of spike waveforms to individual neurons (“spike-sorting”), makes a critical assumption which undermines the analysis: that spikes, and hence neurons, are independent. We show that this assumption guarantees that coincident spiking estimates such as correlation coefficients are biased. We also show how to eliminate this bias. Our solution involves sorting spikes jointly, which contrasts with the current practice of sorting spikes independently of other spikes. This new “ensemble sorting” yields unbiased estimates of coincident spiking, and permits more data to be analyzed with confidence, improving the quality and quantity of neurophysiological inferences. These results should be of interest outside the context of neuronal correlations studies. Indeed, simultaneous recording of many neurons has become the rule rather than the exception in experiments, so it is essential to spike sort correctly if we are to make valid inferences about any properties of, and relationships between, neurons."
to:NB  heard_the_talk  neuroscience  neural_data_analysis  ventura.valerie  kith_and_kin  statistics  inference_to_latent_objects 
20 days ago by cshalizi
[1204.2581] Modeling Relational Data via Latent Factor Blockmodel
"In this paper we address the problem of modeling relational data, which appear in many applications such as social network analysis, recommender systems and bioinformatics. Previous studies either consider latent feature based models while disregarding local structure in the network, or focus exclusively on capturing local structure of objects based on latent blockmodels without coupling with latent characteristics of objects. To combine the benefits of the previous work, we propose a novel model that can simultaneously incorporate the effect of latent features and covariates if any, as well as the effect of latent structure that may exist in the data. To achieve this, we model the relation graph as a function of both latent feature factors and latent cluster memberships of objects to collectively discover globally predictive intrinsic properties of objects and capture latent block structure in the network to improve prediction performance. We also develop an optimization transfer algorithm based on the generalized EM-style strategy to learn the latent factors. We prove the efficacy of our proposed model through the link prediction task and cluster analysis task, and extensive experiments on the synthetic data and several real world datasets suggest that our proposed LFBM model outperforms the other state of the art approaches in the evaluated tasks."
in_NB  network_data_analysis  community_discovery  statistics  inference_to_latent_objects  factor_analysis  relational_learning 
6 weeks ago by cshalizi
[1204.0492] Non-detection of the Tooth Fairy at Optical Wavelengths
"We report a non-detection, to a limiting magnitude of V = 18.4 (9), of the elusive entity commonly described as the Tooth Fairy. We review various physical models and conclude that follow-up observations must precede an interpretation of our result."
funny:geeky  tooth_fairy  physics  astronomy  inference_to_latent_objects  absence_of_evidence  via:mejn 
7 weeks ago by cshalizi
[1203.3887] Learning Loopy Graphical Models with Latent Variables: Efficient Methods and Guarantees
"The problem of structure estimation in latent graphical models is considered, where some nodes are latent or hidden. A novel method is proposed which attempts to locally reconstruct latent trees and outputs a loopy graph structure with hidden variables. Correctness of the method is established when the underlying graph has a large girth and the model is in the regime of correlation decay, and PAC guarantees for the method are also derived. For the special case of the Ising model, the number of samples $n$ required for structural consistency scales as $n = \Omega(\theta_{\min}^{-2\delta\eta(\eta+1)-2}\log p)$, where $\theta_{\min}$ is the minimum edge potential, $\delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $\eta$ is a parameter which depends on the bounds on node and edge potentials in the Ising model. The results are further specialized for the case when the observed nodes are uniformly sampled from the model. Finally, necessary conditions for structural consistency under any algorithm are derived."
to:NB  graphical_models  learning_theory  machine_learning  statistics  ising_model  inference_to_latent_objects 
9 weeks ago by cshalizi
[0809.5032] Identifiability of parameters in latent structure models with many observed variables
"While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions."
in_NB  statistics  identifiability  mixture_models  inference_to_latent_objects  re:homophily_and_confounding  to_read 
12 weeks ago by cshalizi
[0810.3177] Inferring sparse Gaussian graphical models with latent structure
"Our concern is selecting the concentration matrix's nonzero coefficients for a sparse Gaussian graphical model in a high-dimensional setting. This corresponds to estimating the graph of conditional dependencies between the variables. We describe a novel framework taking into account a latent structure on the concentration matrix. This latent structure is used to drive a penalty matrix and thus to recover a graphical model with a constrained topology. Our method uses an $\ell_1$-penalized likelihood criterion. Inference of the graph of conditional dependencies between the variates and of the hidden variables is performed simultaneously in an iterative EM-like algorithm. The performance of our method is illustrated on synthetic as well as real data, the latter concerning breast cancer."
to:NB  graphical_models  lasso  sparsity  statistics  inference_to_latent_objects 
february 2012 by cshalizi
The Asymmetric Business Cycle
"The business cycle is a fundamental yet elusive concept in macroeconomics. In this paper, we consider the problem of measuring the business cycle. First, we argue for the output-gap view that the business cycle corresponds to transitory deviations in economic activity away from a permanent, or trend, level. Then we investigate the extent to which a general model-based approach to estimating trend and cycle for the U.S. economy leads to measures of the business cycle that reflect models versus the data. We find empirical support for a nonlinear time series model that produces a business cycle measure with an asymmetric shape across NBER expansion and recession phases. Specifically, this business cycle measure suggests that recessions are periods of relatively large and negative transitory fluctuations in output. However, several close competitors to the nonlinear model produce business cycle measures of widely differing shapes and magnitudes. Given this model-based uncertainty, we construct a model-averaged measure of the business cycle. This measure also displays an asymmetric shape and is closely related to other measures of economic slack such as the unemployment rate and capacity utilization."
--- Worthy, but at the same time makes me want to lock them in a room with a copy of Li and Racine's _Nonparametric Econometrics_, or even _The Elements of Statistical Learning_, and not let them out until they understand it.
in_NB  time_series  statistics  economics  macroeconomics  inference_to_latent_objects  re:your_favorite_dsge_sucks  morley.james  have_read  ensemble_methods  model_selection 
february 2012 by cshalizi
It isn’t simple to infer cognitive modules from behaviour – idiolect
"The conclusion is straightforward. Although inferring different processing stages (or 'modules') from additive factors in data is a venerable tradition in psychology, and one that remains popular (Sternberg, 2011), it is a mistake. As Henson (2011) points out, there's too much non-linearity in cognitive processing, so that you need additional constraints if you want to make inferences about cognitive modules."

--- I find it astonishing that anyone would ever have been tempted to make this inference at all.
cognitive_science  track_down_references  inference_to_latent_objects  experimental_psychology 
january 2012 by cshalizi
Nonlinear Models of Measurement Errors
"Measurement errors in economic data are pervasive and nontrivial in size. The presence of measurement errors causes biased and inconsistent parameter estimates and leads to erroneous conclusions to various degrees in economic analysis. While linear errors-in-variables models are usually handled with well-known instrumental variable methods, this article provides an overview of recent research papers that derive estimation methods that provide consistent estimates for nonlinear models with measurement errors. We review models with both classical and nonclassical measurement errors, and with misclassification of discrete variables. For each of the methods surveyed, we describe the key ideas for identification and estimation, and discuss its application whenever it is currently available." (Not read, reconsider to_teach tag later.)
to:NB  statistics  latent_variables  inference_to_latent_objects  instrumental_variables  econometrics  to_teach:undergrad-ADA 
december 2011 by cshalizi
[1112.2774] Measuring Tie Strength in Implicit Social Networks
"Given a set of people and a set of events they attend, we address the problem of measuring connectedness or tie strength between each pair of persons given that attendance at mutual events gives an implicit social network between people. We take an axiomatic approach to this problem. Starting from a list of axioms that a measure of tie strength must satisfy, we characterize functions that satisfy all the axioms and show that there is a range of measures that satisfy this characterization. A measure of tie strength induces a ranking on the edges (and on the set of neighbors for every person). We show that for applications where the ranking, and not the absolute value of the tie strength, is the important thing about the measure, the axioms are equivalent to a natural partial order. Also, to settle on a particular measure, we must make a non-obvious decision about extending this partial order to a total order, and this decision is best left to particular applications. We classify measures found in prior literature according to the axioms that they satisfy. In our experiments, we measure tie strength and the coverage of our axioms in several datasets. Also, for each dataset, we bound the maximum Kendall's Tau divergence (which measures the number of pairwise disagreements between two lists) between all measures that satisfy the axioms using the partial order. This informs us if particular datasets are well behaved where we do not have to worry about which measure to choose, or we have to be careful about the exact choice of measure we make."
to:NB  network_data_analysis  inference_to_latent_objects 
december 2011 by cshalizi
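A small sketch of the pairwise-disagreement count mentioned in the abstract (the Kendall tau distance between two rankings). It assumes both inputs are strict rankings of the same items with no ties; the paper's partial-order setting may need more care, and the function name is hypothetical.

```python
from itertools import combinations

def kendall_tau_distance(rank_a, rank_b):
    """Count item pairs that the two rankings place in opposite orders (no ties assumed)."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    disagreements = 0
    for u, v in combinations(rank_a, 2):
        # A pair disagrees when the position differences have opposite signs
        if (pos_a[u] - pos_a[v]) * (pos_b[u] - pos_b[v]) < 0:
            disagreements += 1
    return disagreements

print(kendall_tau_distance(["a", "b", "c", "d"], ["b", "a", "c", "d"]))
```

Two tie-strength measures that induce the same edge ranking are at distance 0, which is the sense in which, for ranking-based applications, the choice between them would not matter.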
PLoS ONE: The Small World of Psychopathology
"Background
Mental disorders are highly comorbid: people having one disorder are likely to have another as well. We explain empirical comorbidity patterns based on a network model of psychiatric symptoms, derived from an analysis of symptom overlap in the Diagnostic and Statistical Manual of Mental Disorders-IV (DSM-IV).

Principal Findings
We show that a) half of the symptoms in the DSM-IV network are connected, b) the architecture of these connections conforms to a small world structure, featuring a high degree of clustering but a short average path length, and c) distances between disorders in this structure predict empirical comorbidity rates. Network simulations of Major Depressive Episode and Generalized Anxiety Disorder show that the model faithfully reproduces empirical population statistics for these disorders.

Conclusions
In the network model, mental disorders are inherently complex. This explains the limited successes of genetic, neuroscientific, and etiological approaches to unravel their causes. We outline a psychosystems approach to investigate the structure and dynamics of mental disorders."
to:NB  psychometrics  psychiatry  network_data_analysis  inference_to_latent_objects  borsboom.denny  have_read  to:blog 
november 2011 by cshalizi
[1110.3076] Efficient Latent Variable Graphical Model Selection via Split Bregman Method
"We consider the problem of covariance matrix estimation in the presence of latent variables. Under suitable conditions, it is possible to learn the marginal covariance matrix of the observed variables via a tractable convex program, where the concentration matrix of the observed variables is decomposed into a sparse matrix (representing the graphical structure of the observed variables) and a low rank matrix (representing the marginalization effect of latent variables). We present an efficient first-order method based on split Bregman to solve the convex problem. The algorithm is guaranteed to converge under mild conditions. We show that our algorithm is significantly faster than the state-of-the-art algorithm on both artificial and real-world data. Applying the algorithm to gene expression data involving thousands of genes, we show that most of the correlation between observed variables can be explained by only a few dozen latent factors."
in_NB  graphical_models  inference_to_latent_objects  statistics 
october 2011 by cshalizi
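The sparse-plus-low-rank structure in this abstract comes from marginalizing a Gaussian. If the joint concentration matrix of observed (O) and hidden (H) variables is K, the concentration matrix of the observed variables alone is the Schur complement K_OO - K_OH K_HH^{-1} K_HO: a sparse matrix minus one whose rank is at most the number of hidden variables. A quick NumPy check on a made-up example (the tridiagonal K_OO, the dimensions and the seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
p, h = 10, 2  # numbers of observed and hidden variables (illustrative)

# Sparse (tridiagonal) conditional-dependence structure among the observed variables
K_oo = 2.0 * np.eye(p) - 0.5 * (np.eye(p, k=1) + np.eye(p, k=-1))
K_oh = 0.1 * rng.standard_normal((p, h))  # every hidden variable touches every observed one
K_hh = np.eye(h)
K = np.block([[K_oo, K_oh], [K_oh.T, K_hh]])
np.linalg.cholesky(K)  # raises LinAlgError if K is not positive definite

Sigma = np.linalg.inv(K)
K_marg = np.linalg.inv(Sigma[:p, :p])  # concentration matrix of the observed variables alone

low_rank = K_oo - K_marg  # the "marginalization effect" of the hidden variables
matches = np.allclose(low_rank, K_oh @ np.linalg.solve(K_hh, K_oh.T))
rank = np.linalg.matrix_rank(low_rank, tol=1e-8)
print(matches, rank)
```

K_marg itself is dense, so a graph read directly off the observed concentration matrix would show spurious edges; separating the low-rank part is what recovers the sparse structure.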
Evidence for a Collective Intelligence Factor in the Performance of Human Groups | Science/AAAS
I will give this a fair shot, but the abstract is not promising at all.  A great fit to the one-factor model is, after all, precisely what you should expect if there are really an immense number of factors, but your measurement procedures are all crap and depend on random subsets of them.  (Perhaps I need to turn http://bactra.org/weblog/523.html into a proper paper after all.)
to_be_shot_after_a_fair_trial  collective_cognition  experimental_psychology  factor_analysis  via:nielsen  re:g_paper  inference_to_latent_objects 
december 2010 by cshalizi
Mariadassou, Robin, Vacher: Uncovering latent structure in valued graphs: A variational approach
"As more and more network-structured data sets are available, the statistical analysis of valued graphs has become commonplace. Looking for a latent structure is one of the many strategies used to better understand the behavior of a network. Several methods already exist for the binary case.

We present a model-based strategy to uncover groups of nodes in valued graphs. This framework can be used for a wide span of parametric random graph models and allows covariates to be included. Variational tools allow us to achieve approximate maximum likelihood estimation of the parameters of these models. We provide a simulation study showing that our estimation method performs well over a broad range of situations. We apply this method to analyze host–parasite interaction networks in forest ecosystems."
network_data_analysis  inference_to_latent_objects  community_discovery  statistics  estimation 
august 2010 by cshalizi
Alan J. Rocke: Image and Reality: Kekule, Kopp, and the Scientific Imagination
"Nineteenth-century chemists were faced with a particular problem: how to depict the atoms and molecules that are beyond the direct reach of our bodily senses. In visualizing this microworld, these scientists were the first to move beyond high-level philosophical speculations regarding the unseen. In Image and Reality, Alan Rocke focuses on the community of organic chemists in Germany to provide the basis for a fuller understanding of the nature of scientific creativity.
Arguing that visual mental images regularly assisted many of these scientists in thinking through old problems and new possibilities, Rocke uses a variety of sources, including private correspondence, diagrams and illustrations, scientific papers, and public statements, to investigate their ability to not only imagine the invisibly tiny atoms and molecules upon which they operated daily, but to build detailed and empirically based pictures of how all of the atoms in complicated molecules were interconnected."
books:noted  history_of_science  chemistry  inference_to_latent_objects 
june 2010 by cshalizi
Phantom of Heilbronn - Wikipedia, the free encyclopedia
In which the combined police forces of Europe spend years chasing a female serial killer known only from DNA evidence, only to find that it's all down to contaminated cotton swabs from a single supplier!

Teaching note for data mining: This should make a great example of the importance of getting the data right, before worrying about the statistical processing...
via:arsyed  serial_killers  to_teach:data-mining  bad_data  DNA_testing  forensics  wtf  inference_to_latent_objects  blogged 
may 2010 by cshalizi
Limits of declustering methods for disentangling exogenous from endogenous events in time series with foreshocks, main shocks, and aftershocks
"Many time series in natural and social sciences can be seen as resulting from an interplay between exogenous influences and an endogenous organization. We use a simple epidemic-type aftershock model of events occurring sequentially, in which future events are influenced (partially triggered) by past events to ask the question of how well can one disentangle the exogenous events from the endogenous ones. We apply both model-dependent and model-independent stochastic declustering methods to reconstruct the tree of ancestry and estimate key parameters. In contrast with previously reported positive results, we have to conclude that declustered catalogs are rather unreliable for the synthetic catalogs that we have investigated, which contain on the order of thousands of events, typical of realistic applications. The estimated rates of exogenous events suffer from large errors. The branching ratio n, quantifying the fraction of events that have been triggered by previous events, is also badly estimated in general from declustered catalogs. We find, however, that the errors tend to be smaller and perhaps acceptable in some cases for small triggering efficiency and branching ratios. The high level of randomness together with the long memory makes the stochastic reconstruction of trees of ancestry and the estimation of the key parameters perhaps intrinsically unreliable for long-memory processes. For shorter memories (larger “bare” Omori exponent), the results improve significantly."
statistics  time_series  branching_processes  in_NB  earthquakes  prediction  inference_to_latent_objects  sornette.didier  long-range_dependence 
july 2009 by cshalizi
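Why the branching ratio n equals the long-run fraction of triggered events can be seen from a sketch that ignores event times entirely and keeps only the trees of ancestry. Each exogenous event starts a Galton-Watson cluster in which every event directly triggers a Poisson(n) number of offspring; the Poisson choice and the function name are assumptions for illustration, not the paper's ETAS specification. For n < 1 a cluster contains on average 1/(1 - n) events, so the triggered share tends to 1 - (1 - n) = n.

```python
import numpy as np

def triggered_fraction(branching_ratio, n_immigrants, seed=0):
    """Simulate one ancestry tree per exogenous event and return the share of triggered events."""
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(n_immigrants):
        generation = 1  # the exogenous (immigrant) event that roots this cluster
        while generation > 0:
            total += generation
            # Sum of `generation` independent Poisson(n) offspring counts is Poisson(n * generation)
            generation = int(rng.poisson(branching_ratio * generation))
    return 1.0 - n_immigrants / total

print(triggered_fraction(0.6, 20000))
```

Timing, magnitudes and the Omori memory kernel are what make the real reconstruction hard; this sketch only shows what the parameter being estimated means.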
[0710.4975] Node discovery problem for a social network
"We propose a heuristic algorithm to infer an invisible, functionally relevant person. Its performance (precision, recall, and F value) is demonstrated with a simulation experiment using a network derived from the Watts-Strogatz (WS) model."
network_data_analysis  inference_to_latent_objects 
october 2007 by cshalizi

related tags

absence_of_evidence  astronomy  bad_data  blogged  books:noted  books:recommended  borsboom.denny  branching_processes  causal_inference  chemistry  cognitive_science  collective_cognition  community_discovery  congress  dimension_reduction  DNA_testing  dsges  dynamics_in_cognition  earthquakes  econometrics  economics  ensemble_methods  error_in_variables  estimation  experimental_psychology  factor_analysis  forensics  funny:geeky  gordon.geoff  graphical_models  have_read  heard_the_talk  high-dimensional_statistics  history_of_science  hsu.daniel  ideal-point_models  identifiability  inference_to_latent_objects  influence  instrumental_variables  in_NB  ising_model  kakade.sham  kith_and_kin  krijnen.wim  lasso  latent_variables  learning_theory  levina.liza  long-range_dependence  low-rank_approximation  machine_learning  macroeconomics  markov_models  mixture_models  model_selection  morley.james  networks  network_data_analysis  neural_data_analysis  neuroscience  nominate  pearl.judea  philosophy_of_science  physics  prediction  psychiatry  psychometrics  re:AoS_project  re:critique_of_diffusion  re:g_paper  re:homophily_and_confounding  re:smoothing_adjacency_matrices  re:what_is_the_right_null_model_for_linear_regression  re:your_favorite_dsge_sucks  relational_learning  self-similarity  semantics_from_syntax  serial_killers  social_media  social_networks  song.le  sornette.didier  sparsity  spectral_methods  statistics  structural_equations  time_series  to:blog  to:NB  tooth_fairy  to_be_shot_after_a_fair_trial  to_read  to_teach:data-mining  to_teach:undergrad-ADA  track_down_references  transaction_networks  van_der_maas.h.l.j.  ventura.valerie  via:?  via:arsyed  via:arthegall  via:justin  via:mejn  via:nielsen  wtf  zhang.tong  zhu.ji 
