cshalizi + graphical_models 122
[1205.0241] Counterfactual Graphical Models for Mediation Analysis via Path-Specific Effects
17 days ago by cshalizi
"Potential outcome counterfactuals represent variation in the outcome of interest after a hypothetical treatment or intervention is performed. Causal graphical models are a concise, intuitive way of representing causal assumptions, including independence constraints among such counterfactuals. Much of modern causal inference is concerned with expressing cause effect relationships of interest in counterfactual form, showing how the resulting counterfactuals can be identified (that is expressed in terms of available data, using domain-specific causal assumptions), and subsequently estimated using statistical methods. In this paper we will use causal graphical models to analyze the identification problem of the so-called emph{path-specific effects}, that is effects of treatment on outcome along certain specified causal paths. Such effects arise in mediation analysis settings where it's important to distinguish direct and indirect effects of treatment. We review existing results on path-specific effects in the fully observable, static treatment setting, and extend them to settings with time-varying treatments, and latent variables."
to:NB
causal_inference
shpister.ilya
graphical_models
17 days ago by cshalizi
Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
25 days ago by cshalizi
"We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data ; this is particularly the case when the number of commonly measured variables is low.
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
to:NB
to_read
causal_inference
graphical_models
to_teach:undergrad-ADA
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
25 days ago by cshalizi
"The huge Package for High-dimensional Undirected Graph Estimation in R"
25 days ago by cshalizi
"We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency."
to:NB
to_teach:undergrad-ADA
graphical_models
statistics
kith_and_kin
wasserman.larry
roeder.kathryn
liu.han
25 days ago by cshalizi
[1204.5540] Learning Graph Structure in Discrete Markov Random Fields
4 weeks ago by cshalizi
"We present a general algorithm for learning the structure of discrete Markov random fields from i.i.d. samples. Several algorithms have been proposed for structure learning algorithms earlier and each of these address the learning problem under different assumptions. Our algorithm provides a unified view in the following sense: when our algorithm is applied to each of the special cases, it results in a the same computational complexity as earlier algorithms. More importantly, our approach also provides a new low-computational complexity algorithm for the case of Ising models where the underlying graph is the Erdos-Renyi random graph G(p,c/p)."
When would you ever want to learn an Ising model on an E-R graph?
to:NB
graphical_models
machine_learning
networks
When would you ever want to learn an Ising model on an E-R graph?
4 weeks ago by cshalizi
[1204.3941] A Log-Linear Graphical Model for Inferring Genetic Networks from High-Throughput Sequencing Data
5 weeks ago by cshalizi
"We develop a novel method for estimating high-dimensional Poisson graphical models, the Log-Linear Graphical Model, allowing us to infer networks based on high-throughput sequencing data. Our model assumes that conditional on all other genes, each gene is Poisson, jointly defining a pair-wise Poisson Markov random field. We estimate our genetic networks via neighborhood selection by fitting `1-norm penalized log-linear models, an approach we call the Poisson Graphical Lasso. Additionally, we develop a fast parallel algorithm, permitting us to fit our graphical models to high-dimensional genomic data sets."
in_NB
graphical_models
gene_expression_data_analysis
lasso
network_data_analysis
statistics
regression
5 weeks ago by cshalizi
[1204.2003] Directed Information Graphs
6 weeks ago by cshalizi
"We propose two graphical models to represent a concise description of the causal statistical dependence structure between a group of coupled stochastic processes. The first, minimum generative model graphs, is motivated by generative models. The second, directed information graphs, is motivated by Granger causality. We show that under mild assumptions, the graphs are identical. In fact, these are analogous to Bayesian and Markov networks respectively, in terms of Markov blankets and I-map properties. Furthermore, the underlying variable dependence structure is the unique causal Bayesian network. Lastly, we present a method using minimal-dimension statistics to identify the structure when upper bounds on the in-degrees are known. Simulations show the effectiveness of the approach."
to:NB
graphical_models
to_read
re:functional_communities
causality
information_theory
coleman.todd
6 weeks ago by cshalizi
Colombo , Maathuis , Kalisch , Richardson : Learning high-dimensional directed acyclic graphs with latent and selection variables
7 weeks ago by cshalizi
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
--- To complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB
have_read
statistics
graphical_models
causal_inference
sparsity
to_teach:undergrad-ADA
--- To complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
7 weeks ago by cshalizi
[1203.3037] Expanding the Transfer Entropy to Identify Information Subgraphs in Complex Systems
7 weeks ago by cshalizi
"We propose a formal expansion of the transfer entropy to put in evidence irreducible sets of variables which provide information for the future state of each assigned target. Multiplets characterized by an high value will be associated to informational circuits present in the system, with an informational character (synergetic or redundant) which can be associated to the sign of the contribution. We also present preliminary results on fMRI and EEG data sets."
in_NB
graphical_models
information_theory
community_discovery
time_series
re:functional_communities
7 weeks ago by cshalizi
[1203.0697] Learning High-Dimensional Mixtures of Graphical Models
7 weeks ago by cshalizi
"We consider the problem of learning mixtures of discrete graphical models in high dimensions and propose a novel method for estimating the mixture components with provable guarantees. The method proceeds mainly in three stages. In the first stage, it estimates the union of the Markov graphs of the mixture components (referred to as the union graph) via a series of rank tests. It then uses this estimated union graph to compute the mixture components via a spectral decomposition method. The spectral decomposition method was originally proposed for latent class models, and we adapt this method for learning the more general class of graphical model mixtures. In the end, the method produces tree approximations of the mixture components via the Chow-Liu algorithm. Our output is thus a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. When the union graph has sparse node separators, we prove that our method has sample and computational complexities scaling as poly(p, d, r), for an r-component mixture of p-variate graphical models, where d is the cardinality of the sample space of each node variable. We also extend our results to the case when the union graph has sparse local separators, which is a weaker criterion than having sparse exact separators, and when the mixture components are in the regime of correlation decay. The computational and sample complexities of our method for this class are significantly improved, since they involve an upper bound on the cardinality of local separators (as opposed to exact separators). Our results push the realm of tractable model classes for high-dimensional learning, which includes the class of tree mixtures."
in_NB
mixture_models
ensemble_methods
graphical_models
machine_learning
to_read
chow-liu_algorithm
7 weeks ago by cshalizi
Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
7 weeks ago by cshalizi
"We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples."
--- I wonder whether this hasn't some application to the PC algorithm?
to:NB
graphical_models
lasso
sparsity
statistics
heard_the_talk
--- I wonder whether this hasn't some application to the PC algorithm?
7 weeks ago by cshalizi
[1203.6502] Quantifying causal influences
8 weeks ago by cshalizi
"Common methods of causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between n variables. Given the joint distribution of all these variables, the DAG contains all information about how intervening on one variable would change the distribution of the other n-1 variables. It remains, however, a non-trivial question how to quantify the causal influence of one variable on another one.
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
to:NB
to_read
causality
graphical_models
information_theory
statistics
via:ded-maxim
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
8 weeks ago by cshalizi
[1203.3887] Learning Loopy Graphical Models with Latent Variables: Efficient Methods and Guarantees
9 weeks ago by cshalizi
"The problem of structure estimation in latent graphical models is considered, where some nodes are latent or hidden. A novel method is proposed which attempts to locally reconstruct latent trees and outputs a loopy graph structure with hidden variables. Correctness of the method is established when the underlying graph has a large girth and the model is in the regime of correlation decay, and PAC guarantees for the method are also derived. For the special case of the Ising model, the number of samples $n$ required for structural consistency scales as $n = Omega(theta_{min}^{-2delta eta(eta+1)-2}log p)$, where $theta_{min}$ is the minimum edge potential, $delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $eta$ is a parameter which depends on the bounds on node and edge potentials in the Ising model. The results are further specialized for the case when the observed nodes are uniformly sampled from the model. Finally, necessary conditions for structural consistency under any algorithm are derived."
to:NB
graphical_models
learning_theory
machine_learning
statistics
ising_model
inference_to_latent_objects
9 weeks ago by cshalizi
[0704.2551] Inferring dynamic genetic networks with low order independencies
10 weeks ago by cshalizi
"In this paper, we propose a novel inference method for dynamic genetic networks which makes it possible to face with a number of time measurements n much smaller than the number of genes p. The approach is based on the concept of low order conditional dependence graph that we extend here in the case of Dynamic Bayesian Networks. Most of our results are based on the theory of graphical models associated with the Directed Acyclic Graphs (DAGs). In this way, we define a minimal DAG G which describes exactly the full order conditional dependencies given the past of the process. Then, to face with the large p and small n estimation case, we propose to approximate DAG G by considering low order conditional independencies. We introduce partial qth order conditional dependence DAGs G(q) and analyze their probabilistic properties. In general, DAGs G(q) differ from DAG G but still reflect relevant dependence facts for sparse networks such as genetic networks. By using this approximation, we set out a non-bayesian inference method and demonstrate the effectiveness of this approach on both simulated and real data analysis. The inference procedure is implemented in the R package 'G1DBN' freely available from the CRAN archive."
to:NB
graphical_models
gene_expression_data_analysis
biochemical_networks
statistics
10 weeks ago by cshalizi
[1202.3765] Learning mixed graphical models from data with p larger than n
11 weeks ago by cshalizi
"Structure learning of Gaussian graphical models is an extensively studied problem in the classical multivariate setting where the sample size n is larger than the number of random variables p, as well as in the more challenging setting when p>>n. However, analogous approaches for learning the structure of graphical models with mixed discrete and continuous variables when p>>n remain largely unexplored. Here we describe a statistical learning procedure for this problem based on limited-order correlations and assess its performance with synthetic and real data."
to:NB
graphical_models
statistics
11 weeks ago by cshalizi
[1202.3733] Lipschitz Parametrization of Probabilistic Graphical Models
11 weeks ago by cshalizi
"We show that the log-likelihood of several probabilistic graphical models is Lipschitz continuous with respect to the lp-norm of the parameters. We discuss several implications of Lipschitz parametrization. We present an upper bound of the Kullback-Leibler divergence that allows understanding methods that penalize the lp-norm of differences of parameters as the minimization of that upper bound. The expected log-likelihood is lower bounded by the negative lp-norm, which allows understanding the generalization ability of probabilistic models. The exponential of the negative lp-norm is involved in the lower bound of the Bayes error rate, which shows that it is reasonable to use parameters as features in algorithms that rely on metric spaces (e.g. classification, dimensionality reduction, clustering). Our results do not rely on specific algorithms for learning the structure or parameters. We show preliminary results for activity recognition and temporal segmentation."
to:NB
graphical_models
11 weeks ago by cshalizi
[1202.3731] What Cannot be Learned with Bethe Approximations
11 weeks ago by cshalizi
"We address the problem of learning the parameters in graphical models when inference is intractable. A common strategy in this case is to replace the partition function with its Bethe approximation. We show that there exists a regime of empirical marginals where such Bethe learning will fail. By failure we mean that the empirical marginals cannot be recovered from the approximated maximum likelihood parameters (i.e., moment matching is not achieved). We provide several conditions on empirical marginals that yield outer and inner bounds on the set of Bethe learnable marginals. An interesting implication of our results is that there exists a large class of marginals that cannot be obtained as stable fixed points of belief propagation. Taken together our results provide a novel approach to analyzing learning with Bethe approximations and highlight when it can be expected to work or fail."
to:NB
graphical_models
11 weeks ago by cshalizi
[1202.1787] Greedy Learning of Markov Network Structure
12 weeks ago by cshalizi
"We propose a new yet natural algorithm for learning the graph structure of general discrete graphical models (a.k.a. Markov random fields) from samples. Our algorithm finds the neighborhood of a node by sequentially adding nodes that produce the largest reduction in empirical conditional entropy; it is greedy in the sense that the choice of addition is based only on the reduction achieved at that iteration. Its sequential nature gives it a lower computational complexity as compared to other existing comparison-based techniques, all of which involve exhaustive searches over every node set of a certain size. Our main result characterizes the sample complexity of this procedure, as a function of node degrees, graph size and girth in factor-graph representation. We subsequently specialize this result to the case of Ising models, where we provide a simple transparent characterization of sample complexity as a function of model and graph parameters.
For tree graphs, our algorithm is the same as the classical Chow-Liu algorithm, and in that sense can be considered the extension of the same to graphs with cycles."
to:NB
graphical_models
machine_learning
information_theory
For tree graphs, our algorithm is the same as the classical Chow-Liu algorithm, and in that sense can be considered the extension of the same to graphs with cycles."
12 weeks ago by cshalizi
[1202.6590] Uniform generation of random acyclic digraphs
12 weeks ago by cshalizi
"We show how to sample acyclic digraphs uniformly at random through recursive enumeration. This provides an exact method which avoids the convergence issues of the alternative Markov chain methods. The limiting behaviour of the distribution of acyclic digraphs also allows us to sample arbitrarily large acyclic digraphs. Finally we discuss how to include various restrictions in the combinatorial enumeration for efficient uniform sampling of the corresponding graphs."
to:NB
graph_theory
combinatorics
graphical_models
12 weeks ago by cshalizi
Graphical Models with R
12 weeks ago by cshalizi
"Graphical models in their modern form have been around since the late 1970s and appear today in many areas of the sciences. Along with the ongoing developments of graphical models, a number of different graphical modeling software programs have been written over the years. In recent years many of these software developments have taken place within the R community, either in the form of new packages or by providing an R interface to existing software. This book attempts to give the reader a gentle introduction to graphical modeling using R and the main features of some of these packages. In addition, the book provides examples of how more advanced aspects of graphical modeling can be represented and handled within R. Topics covered in the seven chapters include graphical models for contingency tables, Gaussian and mixed graphical models, Bayesian networks and modeling high dimensional data."
to:NB
books:noted
R
statistics
graphical_models
lauritzen.steffen
computational_statistics
12 weeks ago by cshalizi
[0810.3177] Inferring sparse Gaussian graphical models with latent structure
february 2012 by cshalizi
"Our concern is selecting the concentration matrix's nonzero coefficients for a sparse Gaussian graphical model in a high-dimensional setting. This corresponds to estimating the graph of conditional dependencies between the variables. We describe a novel framework taking into account a latent structure on the concentration matrix. This latent structure is used to drive a penalty matrix and thus to recover a graphical model with a constrained topology. Our method uses an $ell_1$ penalized likelihood criterion. Inference of the graph of conditional dependencies between the variates and of the hidden variables is performed simultaneously in an iterative textsc{em}-like algorithm. The performances of our method is illustrated on synthetic as well as real data, the latter concerning breast cancer."
to:NB
graphical_models
lasso
sparsity
statistics
inference_to_latent_objects
february 2012 by cshalizi
"Trygve Haavelmo and the Emergence of Causal Calculus" (Judea Pearl, 2011)
february 2012 by cshalizi
"Haavelmo was the first to recognize the capacity of economic models to guide poli- cies. This paper describes some of the barriers that Haavelmo’s ideas have had (and still have) to overcome, and lays out a logical framework for capturing the relationships between theory, data and policy questions. The mathematical tools that emerge from this framework now enable investigators to answer complex policy and counterfactual questions using embarrassingly simple routines, some by mere inspection of the model’s structure. Several such problems are illustrated by examples, including misspecification tests, identification, mediation and introspection."
to:NB
causal_inference
economics
econometrics
haavelmo.trygve
pearl.judea
graphical_models
to_read
february 2012 by cshalizi
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
february 2012 by cshalizi
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature−instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB
information_theory
statistics
variable_selection
model_selection
to_teach:data-mining
to:blog
machine_learning
classifiers
have_read
graphical_models
february 2012 by cshalizi
If correlation doesn’t imply causation, then what does? | DDI
january 2012 by cshalizi
Michael preaches the Gospel According to Pearl; and very nicely too. (I would dispute however that DAGs don't give us a handle on mechanisms.)
causal_inference
graphical_models
statistics
causality
nielsen.michael
kith_and_kin
january 2012 by cshalizi
[1201.0794] Sparse Nonparametric Graphical Models
january 2012 by cshalizi
"We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible graphical models. One allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests. Examples of both methods are presented. We also discuss possible future research directions for nonparametric graphical modeling."
(Review/good parts version of previous papers.)
in_NB
kith_and_kin
statistics
machine_learning
graphical_models
nonparametrics
density_estimation
wasserman.larry
liu.han
lafferty.john
(Review/good parts version of previous papers.)
january 2012 by cshalizi
An Asymptotic Behavior of the Marginal Likelihood for General Markov Models
december 2011 by cshalizi
"The standard Bayesian Information Criterion (BIC) is derived under regularity conditions which are not always satisfied in the case of graphical models with hidden variables. In this paper we derive the BIC for the binary graphical tree models where all the inner nodes of a tree represent binary hidden variables. This provides an extension of a similar formula given by Rusakov and Geiger for naive Bayes models. The main tool used in this paper is the connection between the growth behavior of marginal likelihood integrals and the real log-canonical threshold."
in_NB
markov_models
graphical_models
statistics
information_criteria
december 2011 by cshalizi
[1111.0559] Model Selection in Undirected Graphical Models with the Elastic Net
november 2011 by cshalizi
"Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on $l_1$ regularized neighborhood estimation to include the elastic net $l_1+l_2$ penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pair-wise neighborhood union estimates."
in_NB
graphical_models
model_selection
lasso
sparsity
november 2011 by cshalizi
[1110.4821] Factor models on locally tree-like graphs
october 2011 by cshalizi
"We consider homogeneous factor models on uniformly sparse graph sequences converging locally to a (unimodular) random tree T, and study the existence of the free energy density phi, the limit of the log-partition function divided by the number of vertices n as n tends to infinity. We provide a new interpolation scheme and use it to prove existence of, and to explicitly compute, the quantity phi subject to uniqueness of a relevant Gibbs measure for the factor model on T. By way of example we compute phi for the independent set (or hard-core) model at low fugacity, for the ferromagnetic Ising model at all parameter values, and for the ferromagnetic Potts model with both weak enough and strong enough interactions. Even beyond uniqueness our interpolation provides useful explicit bounds on phi.
In the regimes in which we establish existence of the limit, we show that it coincides with the Bethe free energy functional evaluated at a suitable fixed point of the belief propagation recursions on T. In the special case that T has a Galton-Watson law, this formula coincides with the non-rigorous "Bethe prediction" obtained by statistical physicists using the "replica" or "cavity" methods. Thus our work is a rigorous generalization of these heuristic calculations to the broader class of sparse graph sequences converging locally to trees. We also provide a variational characterization for the Bethe prediction in this general setting, which is of independent interest."
in_NB
graphical_models
probability
statistical_mechanics
In the regimes in which we establish existence of the limit, we show that it coincides with the Bethe free energy functional evaluated at a suitable fixed point of the belief propagation recursions on T. In the special case that T has a Galton-Watson law, this formula coincides with the non-rigorous "Bethe prediction" obtained by statistical physicists using the "replica" or "cavity" methods. Thus our work is a rigorous generalization of these heuristic calculations to the broader class of sparse graph sequences converging locally to trees. We also provide a variational characterization for the Bethe prediction in this general setting, which is of independent interest."
october 2011 by cshalizi
[1110.3076] Efficient Latent Variable Graphical Model Selection via Split Bregman Method
october 2011 by cshalizi
We consider the problem of covariance matrix estimation in the presence of latent variables. Under suitable conditions, it is possible to learn the marginal covariance matrix of the observed variables via a tractable convex program, where the concentration matrix of the observed variables is decomposed into a sparse matrix (representing the graphical structure of the observed variables) and a low rank matrix (representing the marginalization effect of latent variables). We present an efficient first-order method based on split Bregman to solve the convex problem. The algorithm is guaranteed to converge under mild conditions. We show that our algorithm is significantly faster than the state-of-the-art algorithm on both artificial and real-world data. Applying the algorithm to a gene expression data involving thousands of genes, we show that most of the correlation between observed variables can be explained by only a few dozen latent factors.
in_NB
graphical_models
inference_to_latent_objects
statistics
october 2011 by cshalizi
Robustification of the PC Algorithm for Directed Acyclic Graphs
october 2011 by cshalizi
"The PC-algorithm was shown to be a powerful method for estimating the equivalence class of a potentially very high-dimensional acyclic directed graph (DAG) with the corresponding Gaussian distribution. Here we propose a computationally eficient robustification of the PC-algorithm and prove its consistency. Furthermore, we compare the robustified and standard version of the PC-algorithm on simulated data using the new corresponding R package pcalg."
statistics
causal_inference
graphical_models
buhlmann.peter
in_NB
to_read
to_teach:data-mining
to_teach:undergrad-ADA
kalisch.markus
october 2011 by cshalizi
[1110.1338] Robustness and Conditional Independence Ideals
october 2011 by cshalizi
"We study notions of robustness of Markov kernels and probability distribution of a system that is described by $n$ input random variables and one output random variable. Markov kernels can be expanded in a series of potentials that allow to describe the system's behaviour after knockouts. Robustness imposes structural constraints on these potentials. Robustness of probability distributions is defined via conditional independence statements. These statements can be studied algebraically. The corresponding conditional independence ideals are related to binary edge ideals. The set of robust probability distributions lies on an algebraic variety. We compute a Gr"obner basis of this ideal and study the irreducible decomposition of the variety. These algebraic results allow to parametrize the set of all robust probability distributions."
probability
markov_models
graphical_models
information_geometry
algebraic_geometry
ay.nihat
in_NB
october 2011 by cshalizi
[1110.0718] Directed information and Pearl's causal calculus
october 2011 by cshalizi
"Probabilistic graphical models are a fundamental tool in statistics, machine learning, signal processing, and control. When such a model is defined on a directed acyclic graph (DAG), one can assign a partial ordering to the events occurring in the corresponding stochastic system. Based on the work of Judea Pearl and others, these DAG-based "causal factorizations" of joint probability measures have been used for characterization and inference of functional dependencies (causal links). This mostly expository paper focuses on several connections between Pearl's formalism (and in particular his notion of "intervention") and information-theoretic notions of causality and feedback (such as causal conditioning, directed stochastic kernels, and directed information). As an application, we show how conditional directed information can be used to develop an information-theoretic version of Pearl's "back-door" criterion for identifiability of causal effects from passive observations. This suggests that the back-door criterion can be thought of as a causal analog of statistical sufficiency."
graphical_models
causality
causal_inference
information_theory
statistics
raginsky.maxim
in_NB
to_read
kith_and_kin
sufficiency
october 2011 by cshalizi
Markov Logic: An Interface Layer for Artificial Intelligence
october 2011 by cshalizi
"Most subfields of computer science have an interface layer via which applications communicate with the infrastructure, and this is key to their success (e.g., the Internet in networking, the relational model in databases, etc.). So far this interface layer has been missing in AI. First-order logic and probabilistic graphical models each have some of the necessary features, but a viable interface layer requires combining both. Markov logic is a powerful new language that accomplishes this by attaching weights to first-order formulas and treating them as templates for features of Markov random fields. Most statistical models in wide use are special cases of Markov logic, and first-order logic is its infinite-weight limit. Inference algorithms for Markov logic combine ideas from satisfiability, Markov chain Monte Carlo, belief propagation, and resolution. Learning algorithms make use of conditional likelihood, convex optimization, and inductive logic programming...."
graphical_models
ai
machine_learning
interface_design
domingos.pedro
in_NB
to_read
relational_learning
october 2011 by cshalizi
[1104.5617] Learning high-dimensional directed acyclic graphs with latent and selection variables
september 2011 by cshalizi
"We consider the problem of learning causal information between random variables in directed acyclic graph (DAGs) when allowing arbitrarily many latent and selection variables. The FCI algorithm (Spirtes et al., 1999) has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose a new algorithm, the RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
have_read
to_teach:undergrad-ADA
graphical_models
causal_inference
in_NB
kalisch.markus
richardson.thomas_s.
september 2011 by cshalizi
[1107.1283] Spectral Methods for Learning Multivariate Latent Tree Structure
july 2011 by cshalizi
Huh, sounds like they're using tetrad equations?
markov_models
to_read
re:AoS_project
in_NB
graphical_models
inference_to_latent_objects
zhang.tong
kakade.sham
hsu.daniel
song.le
july 2011 by cshalizi
[1107.4966] Lifted Graphical Models: A Survey
july 2011 by cshalizi
"This article presents a survey of work on lifted graphical models. We review a general form for a lifted graphical model, a par-factor graph, and show how a number of existing statistical relational representations map to this formalism. We discuss inference algorithms, including lifted inference algorithms, that efficiently compute the answers to probabilistic queries. We also review work in learning lifted graphical models from data. It is our belief that the need for statistical relational models (whether it goes by that name or another) will grow in the coming decades, as we are inundated with data which is a mix of structured and unstructured, with entities and relations extracted in a noisy manner from text, and with the need to reason effectively with this data. We hope that this synthesis of ideas from many different research groups will provide an accessible starting point for new researchers in this expanding field."
graphical_models
machine_learning
getoor.lise
in_NB
to_read
relational_learning
july 2011 by cshalizi
[1107.3536] Inverse Ising inference using all the data
july 2011 by cshalizi
"We show that a method based on logistic regression, using all the data, solves the inverse Ising problem far better than mean-field calculations relying only on sample pairwise correlation functions, while still computationally feasible for numbers of nodes in the range of hundreds. The largest improvement in reconstruction occurs for strong interactions. Using two examples, a diluted Sherrington-Kirkpatrick model and a two-dimensional lattice, we also show that interaction topologies can be recovered from few samples with good accuracy (even below the phase transitions in the corresponding forward Ising problems), and that using $l_1$-regularization is beneficial in this process."
graphical_models
logistic_regression
ising_model
sparsity
july 2011 by cshalizi
Adventures in Data Land, Graphical Models for the Internet
march 2011 by cshalizi
Look at this later and re-consider the to_teach tags.
clustering
graphical_models
tutorials
expectation-maximization
internet
text_mining
to_teach:data-mining
to_teach:undergrad-ADA
smola.alex
ahmed.amr
heard_the_talk
march 2011 by cshalizi
Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
february 2011 by cshalizi
From a quick scan, I can't tell if this is a version of a paper I refereed for another conference, or just everyone condensing on the same idea at once.
graphical_models
community_discovery
network_data_analysis
machine_learning
in_NB
relational_learning
february 2011 by cshalizi
[1102.0316] Partition Functions of Normal Factor Graphs
february 2011 by cshalizi
"One of the most common types of functions in mathematics, physics, and engineering is a sum of products, sometimes called a partition function. After "normalization," a sum of products has a natural graphical representation, called a normal factor graph (NFG), in which vertices represent factors, edges represent internal variables, and half-edges represent the external variables of the partition function. In physics, so-called trace diagrams share similar features. We believe that the conceptual framework of representing sums of products as partition functions of NFGs is an important and intuitive paradigm that, surprisingly, does not seem to have been introduced explicitly in the previous factor graph literature. Of particular interest are NFG modifications that leave the partition function invariant. A simple subclass of such NFG modifications offers a unifying view of the Fourier transform, tree-based reparameterization, loop calculus, and the Legendre transform."
graphical_models
in_NB
probability
february 2011 by cshalizi
[1101.5108] Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes
february 2011 by cshalizi
"We investigate approximating joint distributions of random processes with causal dependence tree distributions. Such distributions are particularly useful in providing parsimonious representation when there exists causal dynamics among processes. By extending the results by Chow and Liu on dependence tree approximations, we show that the best causal dependence tree approximation is the one which maximizes the sum of directed informations on its edges, where best is defined in terms of minimizing the KL-divergence between the original and the approximate distribution. Moreover, we describe a low-complexity algorithm to efficiently pick this approximate distribution."
stochastic_processes
information_theory
graphical_models
chow-liu_trees
in_NB
coleman.todd
heard_the_talk
february 2011 by cshalizi
Didelez , Kreiner , Keiding : Graphical Models for Inference Under Outcome-Dependent Sampling
january 2011 by cshalizi
Probably way too advanced to actually teach in 402... == http://arxiv.org/abs/1101.0901
selection_bias
graphical_models
causal_inference
statistics
to_teach:undergrad-ADA
didelez.vanessa
january 2011 by cshalizi
[1012.3795] Estimating Networks With Jumps
december 2010 by cshalizi
"We study the problem of estimating a temporally varying coefficient and varying structure (VCVS) graphical model underlying nonstationary time series data, such as social states of interacting individuals or microarray expression profiles of gene networks, as opposed to i.i.d. data from an invariant model widely considered in current literature of structural estimation. In particular, we consider the scenario in which the model evolves in a piece-wise constant fashion. We propose a procedure that minimizes the so-called TESLA loss (i.e., temporally smoothed L1 regularized regression), which allows jointly estimating the partition boundaries of the VCVS model and the coefficient of the sparse precision matrix on each block of the partition. "
graphical_models
network_data_analysis
time_series
model_selection
statistics
xing.eric
december 2010 by cshalizi
[1011.4328] Graphical Models Concepts in Compressed Sensing
november 2010 by cshalizi
"This paper surveys recent work in applying ideas from graphical models and message passing algorithms to solve large scale regularized regression problems. In particular, the focus is on compressed sensing reconstruction via $\ell_1$ penalized least-squares (known as LASSO or BPDN). We discuss how to derive fast approximate message passing algorithms to solve this problem. Surprisingly, the analysis of such algorithms allows to prove exact high-dimensional limit results for the LASSO risk. This paper will appear as a chapter in a book on ‘Compressed Sensing’ edited by Yonina Eldar and Gitta Kutyniok."
graphical_models
compressed_sensing
statistics
estimation
november 2010 by cshalizi
[1011.4088] An Introduction to Conditional Random Fields
november 2010 by cshalizi
"Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This tutorial describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large scale CRFs. We do not assume previous knowledge of graphical modeling, so this tutorial is intended to be useful to practitioners in a wide variety of fields."
conditional_random_fields
graphical_models
machine_learning
in_NB
sutton.charles
mccallum.andrew
november 2010 by cshalizi
[1010.5720] Information-theoretic inference of common ancestors
november 2010 by cshalizi
"A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds .... In general, there is a whole class of DAGs that represents a given set of conditional independence relations... properties of this class that can be derived from observations of a subsystem only... we prove an information theoretic inequality that allows for the inference of common ancestors of observed parts in ... some unknown larger system.. a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information... our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause... Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version."
graphical_models
information_theory
causal_inference
statistics
ay.nihat
have_read
algorithmic_information_theory
november 2010 by cshalizi
Graphical Gaussian modelling of multivariate time series with latent variables
august 2010 by cshalizi
Just in time to drop on the head of some stupid neuroscientists I'm refereeing.
time_series
causal_inference
graphical_models
granger_causality
august 2010 by cshalizi
Bowsher: Stochastic kinetic models: Dynamic independence, modularity and graphs
july 2010 by cshalizi
"The dynamic properties and independence structure of stochastic kinetic models (SKMs) .., An SKM is a highly multivariate jump process used to model chemical reaction networks.. We identify SKM subprocesses with the corresponding counting processes and propose a directed, cyclic graph (the kinetic independence graph or KIG) that encodes the local independence structure of their conditional intensities. Given a partition [A, D, B] of the vertices, the graphical separation A ⊥ B|D in the undirected KIG has an intuitive chemical interpretation and implies that A is locally independent of B given A ∪ D. ... this separation also results in global independence of the internal histories of A and B conditional on a history of the jumps in D which ... mathematical definition of a modularization of an SKM using its implied dynamics. Graphical decomposition methods are developed for the identification and efficient computation of nested modularizations. "
interacting_particle_systems
graphical_models
biochemical_networks
july 2010 by cshalizi
Homophily and Contagion Are Generically Confounded in Observational Social Network Studies (Shalizi and Thomas, 2010)
re:homophily_and_confounding blogged social_networks network_data_analysis causal_inference graphical_models contagion homophily voter_model social_influence confounding identifiability self-centered re:critique_of_diffusion
april 2010 by cshalizi
re:homophily_and_confounding blogged social_networks network_data_analysis causal_inference graphical_models contagion homophily voter_model social_influence confounding identifiability self-centered re:critique_of_diffusion
april 2010 by cshalizi
[1004.2287] An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates
april 2010 by cshalizi
"Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. Following the ideas of the LASSO procedure designed for the linear regression framework, recent developments dealing with graphical model selection have been based on $\ell_1$-penalization. In the binary case, however, exact inference is generally very slow or even intractable because of [the] log-partition function. Various approximate methods have recently been proposed in the literature ... Through an extensive simulation study, we show that a simple modification of a method relying on a Gaussian approximation achieves good performance and is very fast. We present a real application in which we search for associations among causes of death recorded on French death certificates."
graphical_models
lasso
model_selection
april 2010 by cshalizi
[1004.2304] Spatio-Temporal Graphical Model Selection
april 2010 by cshalizi
"We consider the problem of estimating the topology of spatial interactions in a discrete state, discrete time spatio-temporal graphical model where the interactions affect the temporal evolution of each agent in a network. Among other models, the susceptible, infected, recovered ($SIR$) model for interaction events fall into this framework. We pose the problem as a structure learning problem and solve it using an $\ell_1$-penalized likelihood convex program. We evaluate the solution on a simulated spread of infectious over a complex network. Our topology estimates outperform those of a standard spatial Markov random field graphical model selection using $\ell_1$-regularized logistic regression."
graphical_models
random_fields
lasso
model_selection
april 2010 by cshalizi
[1002.4802] Gaussian Process Structural Equation Models with Latent Variables
february 2010 by cshalizi
"In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well-studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a non-linear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. An efficient Markov chain Monte Carlo procedure is described. We evaluate the stability of the sampling procedure and the predictive ability of the model compared against the current practice."
statistics
graphical_models
latent_variables
nonparametrics
estimation
heard_the_talk
february 2010 by cshalizi
related tags
ahmed.amr ⊕ ai ⊕ airoldi.edo ⊕ algebraic_geometry ⊕ algorithmic_information_theory ⊕ anomaly_detection ⊕ artificial_intelligence ⊕ ay.nihat ⊕ biochemical_networks ⊕ bioinformatics ⊕ blogged ⊕ books:noted ⊕ books:recommended ⊕ book_reviews ⊕ bootstrap ⊕ buhlmann.peter ⊕ causality ⊕ causal_inference ⊕ change-point_problem ⊕ chow-liu_algorithm ⊕ chow-liu_trees ⊕ chu.tianjiao ⊕ classifiers ⊕ climatology ⊕ clustering ⊕ coleman.todd ⊕ combinatorics ⊕ community_discovery ⊕ compressed_sensing ⊕ computational_complexity ⊕ computational_statistics ⊕ conditional_random_fields ⊕ confidence_sets ⊕ confounding ⊕ contagion ⊕ contingency_tables ⊕ copulas ⊕ coveted ⊕ cross-validation ⊕ das.kaustav ⊕ data_mining ⊕ decision-making ⊕ density_estimation ⊕ design_for_a_brain ⊕ didelez.vanessa ⊕ dimension_reduction ⊕ domingos.pedro ⊕ econometrics ⊕ economics ⊕ emergence ⊕ ensemble_methods ⊕ estimation ⊕ expectation-maximization ⊕ exponential_families ⊕ factor_analysis ⊕ feature_selection ⊕ feedback ⊕ fmri ⊕ food_webs ⊕ friedman.nir ⊕ functional_connectivity ⊕ gene_expression_data_analysis ⊕ getoor.lise ⊕ glymour.clark ⊕ grammar_induction ⊕ granger_causality ⊕ graphical_models ⊖ graph_theory ⊕ haavelmo.trygve ⊕ have_read ⊕ heard_the_talk ⊕ heavy_tails ⊕ heuristics ⊕ hierarchical_models ⊕ historical_linguistics ⊕ hofling.holger ⊕ hofmann.thomas ⊕ homophily ⊕ hoyer.patrik ⊕ hsu.daniel ⊕ identifiability ⊕ independent_component_analysis ⊕ inference_to_latent_objects ⊕ information_criteria ⊕ information_geometry ⊕ information_retrieval ⊕ information_theory ⊕ interacting_particle_systems ⊕ interface_design ⊕ internet ⊕ in_NB ⊕ ising_model ⊕ janzing.dominik ⊕ jordan.michael_i. ⊕ kakade.sham ⊕ kalisch.markus ⊕ kith_and_kin ⊕ koller.daphne ⊕ lacerda.gustavo ⊕ lafferty.john ⊕ large_deviations ⊕ lasso ⊕ latent_semantic_analysis ⊕ latent_variables ⊕ lauritzen.steffen ⊕ lead ⊕ learning_theory ⊕ linear_regression ⊕ liu.han ⊕ logic ⊕ logistic_regression ⊕ machine_learning ⊕ macro_from_micro ⊕ markov_models ⊕ mccallum.andrew ⊕ meinshausen.nicolai ⊕ methodology ⊕ mixture_models ⊕ model_discovery ⊕ model_search ⊕ model_selection ⊕ multiple_testing ⊕ networks ⊕ network_data_analysis ⊕ network_formation ⊕ neural_data_analysis ⊕ neural_networks ⊕ neuroscience ⊕ nielsen.michael ⊕ non-stationarity ⊕ nonparametrics ⊕ particle_filters ⊕ pearl.judea ⊕ phase_transitions ⊕ philosophy_of_mind ⊕ philosophy_of_science ⊕ point_processes ⊕ principal_components ⊕ probability ⊕ R ⊕ raginsky.maxim ⊕ ramsey.joseph ⊕ random_fields ⊕ ravikumar.pradeep ⊕ re:AoS_project ⊕ re:critique_of_diffusion ⊕ re:functional_communities ⊕ re:g_paper ⊕ re:homophily_and_confounding ⊕ re:institutions_as_collective_degrees_of_freedom ⊕ re:LOB_project ⊕ re:what_is_the_right_null_model_for_linear_regression ⊕ re:your_favorite_dsge_sucks ⊕ rebonato.riccard ⊕ regression ⊕ reichenbach.hans ⊕ relational_learning ⊕ richardson.thomas_s. ⊕ risk_assessment ⊕ robins.james ⊕ roeder.kathryn ⊕ selection_bias ⊕ self-centered ⊕ shpister.ilya ⊕ smola.alex ⊕ social_influence ⊕ social_networks ⊕ song.le ⊕ sparsity ⊕ spirtes.peter ⊕ statistical_learning ⊕ statistical_mechanics ⊕ statistics ⊕ stochastic_processes ⊕ structural_equations ⊕ sufficiency ⊕ support_vector_machines ⊕ sutton.charles ⊕ taskar.ben ⊕ tetrad ⊕ text_mining ⊕ tibshirani.robert ⊕ time_series ⊕ to:blog ⊕ to:NB ⊕ to_read ⊕ to_teach:complexity-and-inference ⊕ to_teach:data-mining ⊕ to_teach:undergrad-ADA ⊕ track_down_references ⊕ tutorials ⊕ variable_selection ⊕ variational_methods ⊕ via:arthegall ⊕ via:ded-maxim ⊕ via:judea_pearl ⊕ via:klk ⊕ via:matthew_berryman ⊕ via:shivak ⊕ voter_model ⊕ wainwright.martin ⊕ wasserman.larry ⊕ wermuth.nanny ⊕ xing.eric ⊕ yee.danny ⊕ zhang.jiji ⊕ zhang.tong ⊕Copy this bookmark: