cshalizi + graphical_models   122

[1205.0241] Counterfactual Graphical Models for Mediation Analysis via Path-Specific Effects
"Potential outcome counterfactuals represent variation in the outcome of interest after a hypothetical treatment or intervention is performed. Causal graphical models are a concise, intuitive way of representing causal assumptions, including independence constraints among such counterfactuals. Much of modern causal inference is concerned with expressing cause effect relationships of interest in counterfactual form, showing how the resulting counterfactuals can be identified (that is expressed in terms of available data, using domain-specific causal assumptions), and subsequently estimated using statistical methods. In this paper we will use causal graphical models to analyze the identification problem of the so-called emph{path-specific effects}, that is effects of treatment on outcome along certain specified causal paths. Such effects arise in mediation analysis settings where it's important to distinguish direct and indirect effects of treatment. We review existing results on path-specific effects in the fully observable, static treatment setting, and extend them to settings with time-varying treatments, and latent variables."
to:NB  causal_inference  shpister.ilya  graphical_models 
17 days ago by cshalizi
Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies
"We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data ; this is particularly the case when the number of commonly measured variables is low.
"The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org."
to:NB  to_read  causal_inference  graphical_models  to_teach:undergrad-ADA 
25 days ago by cshalizi
"The huge Package for High-dimensional Undirected Graph Estimation in R"
"We describe an R package named huge which provides easy-to-use functions for estimating high dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. (2007), Liu et al. (2009, 2012) and Liu et al. (2010). Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortan, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high dimensional semiparametric Gaussian copula models; (3) more functions like data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a tradeoff between computational and statistical efficiency."
to:NB  to_teach:undergrad-ADA  graphical_models  statistics  kith_and_kin  wasserman.larry  roeder.kathryn  liu.han 
25 days ago by cshalizi
[1204.5540] Learning Graph Structure in Discrete Markov Random Fields
"We present a general algorithm for learning the structure of discrete Markov random fields from i.i.d. samples. Several algorithms have been proposed for structure learning algorithms earlier and each of these address the learning problem under different assumptions. Our algorithm provides a unified view in the following sense: when our algorithm is applied to each of the special cases, it results in a the same computational complexity as earlier algorithms. More importantly, our approach also provides a new low-computational complexity algorithm for the case of Ising models where the underlying graph is the Erdos-Renyi random graph G(p,c/p)."

When would you ever want to learn an Ising model on an E-R graph?
to:NB  graphical_models  machine_learning  networks 
4 weeks ago by cshalizi
[1204.3941] A Log-Linear Graphical Model for Inferring Genetic Networks from High-Throughput Sequencing Data
"We develop a novel method for estimating high-dimensional Poisson graphical models, the Log-Linear Graphical Model, allowing us to infer networks based on high-throughput sequencing data. Our model assumes that conditional on all other genes, each gene is Poisson, jointly defining a pair-wise Poisson Markov random field. We estimate our genetic networks via neighborhood selection by fitting `1-norm penalized log-linear models, an approach we call the Poisson Graphical Lasso. Additionally, we develop a fast parallel algorithm, permitting us to fit our graphical models to high-dimensional genomic data sets."
in_NB  graphical_models  gene_expression_data_analysis  lasso  network_data_analysis  statistics  regression 
5 weeks ago by cshalizi
[1204.2003] Directed Information Graphs
"We propose two graphical models to represent a concise description of the causal statistical dependence structure between a group of coupled stochastic processes. The first, minimum generative model graphs, is motivated by generative models. The second, directed information graphs, is motivated by Granger causality. We show that under mild assumptions, the graphs are identical. In fact, these are analogous to Bayesian and Markov networks respectively, in terms of Markov blankets and I-map properties. Furthermore, the underlying variable dependence structure is the unique causal Bayesian network. Lastly, we present a method using minimal-dimension statistics to identify the structure when upper bounds on the in-degrees are known. Simulations show the effectiveness of the approach."
to:NB  graphical_models  to_read  re:functional_communities  causality  information_theory  coleman.todd 
6 weeks ago by cshalizi
Colombo , Maathuis , Kalisch , Richardson : Learning high-dimensional directed acyclic graphs with latent and selection variables
"We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."

--- To complicated to actually teach, but should be mentioned in the lecture notes on causal discovery, along with FCI.
in_NB  have_read  statistics  graphical_models  causal_inference  sparsity  to_teach:undergrad-ADA 
7 weeks ago by cshalizi
[1203.3037] Expanding the Transfer Entropy to Identify Information Subgraphs in Complex Systems
"We propose a formal expansion of the transfer entropy to put in evidence irreducible sets of variables which provide information for the future state of each assigned target. Multiplets characterized by an high value will be associated to informational circuits present in the system, with an informational character (synergetic or redundant) which can be associated to the sign of the contribution. We also present preliminary results on fMRI and EEG data sets."
in_NB  graphical_models  information_theory  community_discovery  time_series  re:functional_communities 
7 weeks ago by cshalizi
[1203.0697] Learning High-Dimensional Mixtures of Graphical Models
"We consider the problem of learning mixtures of discrete graphical models in high dimensions and propose a novel method for estimating the mixture components with provable guarantees. The method proceeds mainly in three stages. In the first stage, it estimates the union of the Markov graphs of the mixture components (referred to as the union graph) via a series of rank tests. It then uses this estimated union graph to compute the mixture components via a spectral decomposition method. The spectral decomposition method was originally proposed for latent class models, and we adapt this method for learning the more general class of graphical model mixtures. In the end, the method produces tree approximations of the mixture components via the Chow-Liu algorithm. Our output is thus a tree-mixture model which serves as a good approximation to the underlying graphical model mixture. When the union graph has sparse node separators, we prove that our method has sample and computational complexities scaling as poly(p, d, r), for an r-component mixture of p-variate graphical models, where d is the cardinality of the sample space of each node variable. We also extend our results to the case when the union graph has sparse local separators, which is a weaker criterion than having sparse exact separators, and when the mixture components are in the regime of correlation decay. The computational and sample complexities of our method for this class are significantly improved, since they involve an upper bound on the cardinality of local separators (as opposed to exact separators). Our results push the realm of tractable model classes for high-dimensional learning, which includes the class of tree mixtures."
in_NB  mixture_models  ensemble_methods  graphical_models  machine_learning  to_read  chow-liu_algorithm 
7 weeks ago by cshalizi
Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso
"We consider the sparse inverse covariance regularization problem or graphical lasso with regularization parameter λ. Suppose the sample covariance graph formed by thresholding the entries of the sample covariance matrix at λ is decomposed into connected components. We show that the vertex-partition induced by the connected components of the thresholded sample covariance graph (at λ) is exactly equal to that induced by the connected components of the estimated concentration graph, obtained by solving the graphical lasso problem for the same λ. This characterizes a very interesting property of a path of graphical lasso solutions. Furthermore, this simple rule, when used as a wrapper around existing algorithms for the graphical lasso, leads to enormous performance gains. For a range of values of λ, our proposal splits a large graphical lasso problem into smaller tractable problems, making it possible to solve an otherwise infeasible large-scale problem. We illustrate the graceful scalability of our proposal via synthetic and real-life microarray examples."
--- I wonder whether this hasn't some application to the PC algorithm?
to:NB  graphical_models  lasso  sparsity  statistics  heard_the_talk 
7 weeks ago by cshalizi
[1203.6502] Quantifying causal influences
"Common methods of causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between n variables. Given the joint distribution of all these variables, the DAG contains all information about how intervening on one variable would change the distribution of the other n-1 variables. It remains, however, a non-trivial question how to quantify the causal influence of one variable on another one.
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
to:NB  to_read  causality  graphical_models  information_theory  statistics  via:ded-maxim 
8 weeks ago by cshalizi
[1203.3887] Learning Loopy Graphical Models with Latent Variables: Efficient Methods and Guarantees
"The problem of structure estimation in latent graphical models is considered, where some nodes are latent or hidden. A novel method is proposed which attempts to locally reconstruct latent trees and outputs a loopy graph structure with hidden variables. Correctness of the method is established when the underlying graph has a large girth and the model is in the regime of correlation decay, and PAC guarantees for the method are also derived. For the special case of the Ising model, the number of samples $n$ required for structural consistency scales as $n = Omega(theta_{min}^{-2delta eta(eta+1)-2}log p)$, where $theta_{min}$ is the minimum edge potential, $delta$ is the depth (i.e., distance from a hidden node to the nearest observed nodes), and $eta$ is a parameter which depends on the bounds on node and edge potentials in the Ising model. The results are further specialized for the case when the observed nodes are uniformly sampled from the model. Finally, necessary conditions for structural consistency under any algorithm are derived."
to:NB  graphical_models  learning_theory  machine_learning  statistics  ising_model  inference_to_latent_objects 
9 weeks ago by cshalizi
[0704.2551] Inferring dynamic genetic networks with low order independencies
"In this paper, we propose a novel inference method for dynamic genetic networks which makes it possible to face with a number of time measurements n much smaller than the number of genes p. The approach is based on the concept of low order conditional dependence graph that we extend here in the case of Dynamic Bayesian Networks. Most of our results are based on the theory of graphical models associated with the Directed Acyclic Graphs (DAGs). In this way, we define a minimal DAG G which describes exactly the full order conditional dependencies given the past of the process. Then, to face with the large p and small n estimation case, we propose to approximate DAG G by considering low order conditional independencies. We introduce partial qth order conditional dependence DAGs G(q) and analyze their probabilistic properties. In general, DAGs G(q) differ from DAG G but still reflect relevant dependence facts for sparse networks such as genetic networks. By using this approximation, we set out a non-bayesian inference method and demonstrate the effectiveness of this approach on both simulated and real data analysis. The inference procedure is implemented in the R package 'G1DBN' freely available from the CRAN archive."
to:NB  graphical_models  gene_expression_data_analysis  biochemical_networks  statistics 
10 weeks ago by cshalizi
[1202.3765] Learning mixed graphical models from data with p larger than n
"Structure learning of Gaussian graphical models is an extensively studied problem in the classical multivariate setting where the sample size n is larger than the number of random variables p, as well as in the more challenging setting when p>>n. However, analogous approaches for learning the structure of graphical models with mixed discrete and continuous variables when p>>n remain largely unexplored. Here we describe a statistical learning procedure for this problem based on limited-order correlations and assess its performance with synthetic and real data."
to:NB  graphical_models  statistics 
11 weeks ago by cshalizi
[1202.3733] Lipschitz Parametrization of Probabilistic Graphical Models
"We show that the log-likelihood of several probabilistic graphical models is Lipschitz continuous with respect to the lp-norm of the parameters. We discuss several implications of Lipschitz parametrization. We present an upper bound of the Kullback-Leibler divergence that allows understanding methods that penalize the lp-norm of differences of parameters as the minimization of that upper bound. The expected log-likelihood is lower bounded by the negative lp-norm, which allows understanding the generalization ability of probabilistic models. The exponential of the negative lp-norm is involved in the lower bound of the Bayes error rate, which shows that it is reasonable to use parameters as features in algorithms that rely on metric spaces (e.g. classification, dimensionality reduction, clustering). Our results do not rely on specific algorithms for learning the structure or parameters. We show preliminary results for activity recognition and temporal segmentation."
to:NB  graphical_models 
11 weeks ago by cshalizi
[1202.3731] What Cannot be Learned with Bethe Approximations
"We address the problem of learning the parameters in graphical models when inference is intractable. A common strategy in this case is to replace the partition function with its Bethe approximation. We show that there exists a regime of empirical marginals where such Bethe learning will fail. By failure we mean that the empirical marginals cannot be recovered from the approximated maximum likelihood parameters (i.e., moment matching is not achieved). We provide several conditions on empirical marginals that yield outer and inner bounds on the set of Bethe learnable marginals. An interesting implication of our results is that there exists a large class of marginals that cannot be obtained as stable fixed points of belief propagation. Taken together our results provide a novel approach to analyzing learning with Bethe approximations and highlight when it can be expected to work or fail."
to:NB  graphical_models 
11 weeks ago by cshalizi
[1202.1787] Greedy Learning of Markov Network Structure
"We propose a new yet natural algorithm for learning the graph structure of general discrete graphical models (a.k.a. Markov random fields) from samples. Our algorithm finds the neighborhood of a node by sequentially adding nodes that produce the largest reduction in empirical conditional entropy; it is greedy in the sense that the choice of addition is based only on the reduction achieved at that iteration. Its sequential nature gives it a lower computational complexity as compared to other existing comparison-based techniques, all of which involve exhaustive searches over every node set of a certain size. Our main result characterizes the sample complexity of this procedure, as a function of node degrees, graph size and girth in factor-graph representation. We subsequently specialize this result to the case of Ising models, where we provide a simple transparent characterization of sample complexity as a function of model and graph parameters.
For tree graphs, our algorithm is the same as the classical Chow-Liu algorithm, and in that sense can be considered the extension of the same to graphs with cycles."
to:NB  graphical_models  machine_learning  information_theory 
12 weeks ago by cshalizi
[1202.6590] Uniform generation of random acyclic digraphs
"We show how to sample acyclic digraphs uniformly at random through recursive enumeration. This provides an exact method which avoids the convergence issues of the alternative Markov chain methods. The limiting behaviour of the distribution of acyclic digraphs also allows us to sample arbitrarily large acyclic digraphs. Finally we discuss how to include various restrictions in the combinatorial enumeration for efficient uniform sampling of the corresponding graphs."
to:NB  graph_theory  combinatorics  graphical_models 
12 weeks ago by cshalizi
Graphical Models with R
"Graphical models in their modern form have been around since the late 1970s and appear today in many areas of the sciences.  Along with the ongoing developments of graphical models, a number of different graphical modeling software programs have been written over the years.  In recent years many of these software developments have taken place within the R community, either in the form of new packages or by providing an R interface to existing software.  This book attempts to give the reader a gentle introduction to graphical modeling using R and the main features of some of these packages.  In addition, the book provides examples of how more advanced aspects of graphical modeling can be represented and handled within R.  Topics covered in the seven chapters include graphical models for contingency tables, Gaussian and mixed graphical models, Bayesian networks and modeling high dimensional data."
to:NB  books:noted  R  statistics  graphical_models  lauritzen.steffen  computational_statistics 
12 weeks ago by cshalizi
[0810.3177] Inferring sparse Gaussian graphical models with latent structure
"Our concern is selecting the concentration matrix's nonzero coefficients for a sparse Gaussian graphical model in a high-dimensional setting. This corresponds to estimating the graph of conditional dependencies between the variables. We describe a novel framework taking into account a latent structure on the concentration matrix. This latent structure is used to drive a penalty matrix and thus to recover a graphical model with a constrained topology. Our method uses an $ell_1$ penalized likelihood criterion. Inference of the graph of conditional dependencies between the variates and of the hidden variables is performed simultaneously in an iterative textsc{em}-like algorithm. The performances of our method is illustrated on synthetic as well as real data, the latter concerning breast cancer."
to:NB  graphical_models  lasso  sparsity  statistics  inference_to_latent_objects 
february 2012 by cshalizi
"Trygve Haavelmo and the Emergence of Causal Calculus" (Judea Pearl, 2011)
"Haavelmo was the first to recognize the capacity of economic models to guide poli- cies. This paper describes some of the barriers that Haavelmo’s ideas have had (and still have) to overcome, and lays out a logical framework for capturing the relationships between theory, data and policy questions. The mathematical tools that emerge from this framework now enable investigators to answer complex policy and counterfactual questions using embarrassingly simple routines, some by mere inspection of the model’s structure. Several such problems are illustrated by examples, including misspecification tests, identification, mediation and introspection."
to:NB  causal_inference  economics  econometrics  haavelmo.trygve  pearl.judea  graphical_models  to_read 
february 2012 by cshalizi
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature−instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB  information_theory  statistics  variable_selection  model_selection  to_teach:data-mining  to:blog  machine_learning  classifiers  have_read  graphical_models 
february 2012 by cshalizi
If correlation doesn’t imply causation, then what does? | DDI
Michael preaches the Gospel According to Pearl; and very nicely too. (I would dispute however that DAGs don't give us a handle on mechanisms.)
causal_inference  graphical_models  statistics  causality  nielsen.michael  kith_and_kin 
january 2012 by cshalizi
[1201.0794] Sparse Nonparametric Graphical Models
"We present some nonparametric methods for graphical modeling. In the discrete case, where the data are binary or drawn from a finite alphabet, Markov random fields are already essentially nonparametric, since the cliques can take only a finite number of values. Continuous data are different. The Gaussian graphical model is the standard parametric model for continuous data, but it makes distributional assumptions that are often unrealistic. We discuss two approaches to building more flexible graphical models. One allows arbitrary graphs and a nonparametric extension of the Gaussian; the other uses kernel density estimation and restricts the graphs to trees and forests. Examples of both methods are presented. We also discuss possible future research directions for nonparametric graphical modeling."

(Review/good parts version of previous papers.)
in_NB  kith_and_kin  statistics  machine_learning  graphical_models  nonparametrics  density_estimation  wasserman.larry  liu.han  lafferty.john 
january 2012 by cshalizi
An Asymptotic Behavior of the Marginal Likelihood for General Markov Models
"The standard Bayesian Information Criterion (BIC) is derived under regularity conditions which are not always satisfied in the case of graphical models with hidden variables. In this paper we derive the BIC for the binary graphical tree models where all the inner nodes of a tree represent binary hidden variables. This provides an extension of a similar formula given by Rusakov and Geiger for naive Bayes models. The main tool used in this paper is the connection between the growth behavior of marginal likelihood integrals and the real log-canonical threshold."
in_NB  markov_models  graphical_models  statistics  information_criteria 
december 2011 by cshalizi
[1111.0559] Model Selection in Undirected Graphical Models with the Elastic Net
"Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on $l_1$ regularized neighborhood estimation to include the elastic net $l_1+l_2$ penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pair-wise neighborhood union estimates."
in_NB  graphical_models  model_selection  lasso  sparsity 
november 2011 by cshalizi
[1110.4821] Factor models on locally tree-like graphs
"We consider homogeneous factor models on uniformly sparse graph sequences converging locally to a (unimodular) random tree T, and study the existence of the free energy density phi, the limit of the log-partition function divided by the number of vertices n as n tends to infinity. We provide a new interpolation scheme and use it to prove existence of, and to explicitly compute, the quantity phi subject to uniqueness of a relevant Gibbs measure for the factor model on T. By way of example we compute phi for the independent set (or hard-core) model at low fugacity, for the ferromagnetic Ising model at all parameter values, and for the ferromagnetic Potts model with both weak enough and strong enough interactions. Even beyond uniqueness our interpolation provides useful explicit bounds on phi.
In the regimes in which we establish existence of the limit, we show that it coincides with the Bethe free energy functional evaluated at a suitable fixed point of the belief propagation recursions on T. In the special case that T has a Galton-Watson law, this formula coincides with the non-rigorous "Bethe prediction" obtained by statistical physicists using the "replica" or "cavity" methods. Thus our work is a rigorous generalization of these heuristic calculations to the broader class of sparse graph sequences converging locally to trees. We also provide a variational characterization for the Bethe prediction in this general setting, which is of independent interest."
in_NB  graphical_models  probability  statistical_mechanics 
october 2011 by cshalizi
[1110.3076] Efficient Latent Variable Graphical Model Selection via Split Bregman Method
We consider the problem of covariance matrix estimation in the presence of latent variables. Under suitable conditions, it is possible to learn the marginal covariance matrix of the observed variables via a tractable convex program, where the concentration matrix of the observed variables is decomposed into a sparse matrix (representing the graphical structure of the observed variables) and a low rank matrix (representing the marginalization effect of latent variables). We present an efficient first-order method based on split Bregman to solve the convex problem. The algorithm is guaranteed to converge under mild conditions. We show that our algorithm is significantly faster than the state-of-the-art algorithm on both artificial and real-world data. Applying the algorithm to a gene expression data involving thousands of genes, we show that most of the correlation between observed variables can be explained by only a few dozen latent factors.
in_NB  graphical_models  inference_to_latent_objects  statistics 
october 2011 by cshalizi
Robustification of the PC Algorithm for Directed Acyclic Graphs
"The PC-algorithm was shown to be a powerful method for estimating the equivalence class of a potentially very high-dimensional acyclic directed graph (DAG) with the corresponding Gaussian distribution. Here we propose a computationally eficient robustification of the PC-algorithm and prove its consistency. Furthermore, we compare the robustified and standard version of the PC-algorithm on simulated data using the new corresponding R package pcalg."
statistics  causal_inference  graphical_models  buhlmann.peter  in_NB  to_read  to_teach:data-mining  to_teach:undergrad-ADA  kalisch.markus 
october 2011 by cshalizi
[1110.1338] Robustness and Conditional Independence Ideals
"We study notions of robustness of Markov kernels and probability distribution of a system that is described by $n$ input random variables and one output random variable. Markov kernels can be expanded in a series of potentials that allow to describe the system's behaviour after knockouts. Robustness imposes structural constraints on these potentials. Robustness of probability distributions is defined via conditional independence statements. These statements can be studied algebraically. The corresponding conditional independence ideals are related to binary edge ideals. The set of robust probability distributions lies on an algebraic variety. We compute a Gr"obner basis of this ideal and study the irreducible decomposition of the variety. These algebraic results allow to parametrize the set of all robust probability distributions."
probability  markov_models  graphical_models  information_geometry  algebraic_geometry  ay.nihat  in_NB 
october 2011 by cshalizi
[1110.0718] Directed information and Pearl's causal calculus
"Probabilistic graphical models are a fundamental tool in statistics, machine learning, signal processing, and control. When such a model is defined on a directed acyclic graph (DAG), one can assign a partial ordering to the events occurring in the corresponding stochastic system. Based on the work of Judea Pearl and others, these DAG-based "causal factorizations" of joint probability measures have been used for characterization and inference of functional dependencies (causal links). This mostly expository paper focuses on several connections between Pearl's formalism (and in particular his notion of "intervention") and information-theoretic notions of causality and feedback (such as causal conditioning, directed stochastic kernels, and directed information). As an application, we show how conditional directed information can be used to develop an information-theoretic version of Pearl's "back-door" criterion for identifiability of causal effects from passive observations. This suggests that the back-door criterion can be thought of as a causal analog of statistical sufficiency."
graphical_models  causality  causal_inference  information_theory  statistics  raginsky.maxim  in_NB  to_read  kith_and_kin  sufficiency 
october 2011 by cshalizi
Markov Logic: An Interface Layer for Artificial Intelligence
"Most subfields of computer science have an interface layer via which applications communicate with the infrastructure, and this is key to their success (e.g., the Internet in networking, the relational model in databases, etc.). So far this interface layer has been missing in AI. First-order logic and probabilistic graphical models each have some of the necessary features, but a viable interface layer requires combining both. Markov logic is a powerful new language that accomplishes this by attaching weights to first-order formulas and treating them as templates for features of Markov random fields. Most statistical models in wide use are special cases of Markov logic, and first-order logic is its infinite-weight limit. Inference algorithms for Markov logic combine ideas from satisfiability, Markov chain Monte Carlo, belief propagation, and resolution. Learning algorithms make use of conditional likelihood, convex optimization, and inductive logic programming...."
graphical_models  ai  machine_learning  interface_design  domingos.pedro  in_NB  to_read  relational_learning 
october 2011 by cshalizi
[1104.5617] Learning high-dimensional directed acyclic graphs with latent and selection variables
"We consider the problem of learning causal information between random variables in directed acyclic graph (DAGs) when allowing arbitrarily many latent and selection variables. The FCI algorithm (Spirtes et al., 1999) has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose a new algorithm, the RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg."
have_read  to_teach:undergrad-ADA  graphical_models  causal_inference  in_NB  kalisch.markus  richardson.thomas_s. 
september 2011 by cshalizi
[1107.4966] Lifted Graphical Models: A Survey
"This article presents a survey of work on lifted graphical models. We review a general form for a lifted graphical model, a par-factor graph, and show how a number of existing statistical relational representations map to this formalism. We discuss inference algorithms, including lifted inference algorithms, that efficiently compute the answers to probabilistic queries. We also review work in learning lifted graphical models from data. It is our belief that the need for statistical relational models (whether it goes by that name or another) will grow in the coming decades, as we are inundated with data which is a mix of structured and unstructured, with entities and relations extracted in a noisy manner from text, and with the need to reason effectively with this data. We hope that this synthesis of ideas from many different research groups will provide an accessible starting point for new researchers in this expanding field."
graphical_models  machine_learning  getoor.lise  in_NB  to_read  relational_learning 
july 2011 by cshalizi
[1107.3536] Inverse Ising inference using all the data
"We show that a method based on logistic regression, using all the data, solves the inverse Ising problem far better than mean-field calculations relying only on sample pairwise correlation functions, while still computationally feasible for numbers of nodes in the range of hundreds. The largest improvement in reconstruction occurs for strong interactions. Using two examples, a diluted Sherrington-Kirkpatrick model and a two-dimensional lattice, we also show that interaction topologies can be recovered from few samples with good accuracy (even below the phase transitions in the corresponding forward Ising problems), and that using $l_1$-regularization is beneficial in this process."
graphical_models  logistic_regression  ising_model  sparsity 
july 2011 by cshalizi
Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
From a quick scan, I can't tell if this is a version of a paper I refereed for another conference, or just everyone condensing on the same idea at once.
graphical_models  community_discovery  network_data_analysis  machine_learning  in_NB  relational_learning 
february 2011 by cshalizi
[1102.0316] Partition Functions of Normal Factor Graphs
"One of the most common types of functions in mathematics, physics, and engineering is a sum of products, sometimes called a partition function. After "normalization," a sum of products has a natural graphical representation, called a normal factor graph (NFG), in which vertices represent factors, edges represent internal variables, and half-edges represent the external variables of the partition function. In physics, so-called trace diagrams share similar features. We believe that the conceptual framework of representing sums of products as partition functions of NFGs is an important and intuitive paradigm that, surprisingly, does not seem to have been introduced explicitly in the previous factor graph literature. Of particular interest are NFG modifications that leave the partition function invariant. A simple subclass of such NFG modifications offers a unifying view of the Fourier transform, tree-based reparameterization, loop calculus, and the Legendre transform."
graphical_models  in_NB  probability 
february 2011 by cshalizi
[1101.5108] Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes
"We investigate approximating joint distributions of random processes with causal dependence tree distributions. Such distributions are particularly useful in providing parsimonious representation when there exists causal dynamics among processes. By extending the results by Chow and Liu on dependence tree approximations, we show that the best causal dependence tree approximation is the one which maximizes the sum of directed informations on its edges, where best is defined in terms of minimizing the KL-divergence between the original and the approximate distribution. Moreover, we describe a low-complexity algorithm to efficiently pick this approximate distribution."
stochastic_processes  information_theory  graphical_models  chow-liu_trees  in_NB  coleman.todd  heard_the_talk 
february 2011 by cshalizi
[1012.3795] Estimating Networks With Jumps
"We study the problem of estimating a temporally varying coefficient and varying structure (VCVS) graphical model underlying nonstationary time series data, such as social states of interacting individuals or microarray expression profiles of gene networks, as opposed to i.i.d. data from an invariant model widely considered in current literature of structural estimation. In particular, we consider the scenario in which the model evolves in a piece-wise constant fashion. We propose a procedure that minimizes the so-called TESLA loss (i.e., temporally smoothed L1 regularized regression), which allows jointly estimating the partition boundaries of the VCVS model and the coefficient of the sparse precision matrix on each block of the partition. "
graphical_models  network_data_analysis  time_series  model_selection  statistics  xing.eric 
december 2010 by cshalizi
[1011.4328] Graphical Models Concepts in Compressed Sensing
"This paper surveys recent work in applying ideas from graphical models and message passing algorithms to solve large scale regularized regression problems. In particular, the focus is on compressed sensing reconstruction via $\ell_1$ penalized least-squares (known as LASSO or BPDN). We discuss how to derive fast approximate message passing algorithms to solve this problem. Surprisingly, the analysis of such algorithms allows to prove exact high-dimensional limit results for the LASSO risk. This paper will appear as a chapter in a book on ‘Compressed Sensing’ edited by Yonina Eldar and Gitta Kutyniok."
graphical_models  compressed_sensing  statistics  estimation 
november 2010 by cshalizi
[1011.4088] An Introduction to Conditional Random Fields
"Often we wish to predict a large number of variables that depend on each other as well as on other observed variables. Structured prediction methods are essentially a combination of classification and graphical modeling, combining the ability of graphical models to compactly model multivariate data with the ability of classification methods to perform prediction using large sets of input features. This tutorial describes conditional random fields, a popular probabilistic method for structured prediction. CRFs have seen wide application in natural language processing, computer vision, and bioinformatics. We describe methods for inference and parameter estimation for CRFs, including practical issues for implementing large scale CRFs. We do not assume previous knowledge of graphical modeling, so this tutorial is intended to be useful to practitioners in a wide variety of fields."
conditional_random_fields  graphical_models  machine_learning  in_NB  sutton.charles  mccallum.andrew 
november 2010 by cshalizi
[1010.5720] Information-theoretic inference of common ancestors
"A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds .... In general, there is a whole class of DAGs that represents a given set of conditional independence relations... properties of this class that can be derived from observations of a subsystem only... we prove an information theoretic inequality that allows for the inference of common ancestors of observed parts in ... some unknown larger system.. a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information... our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause... Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version."
graphical_models  information_theory  causal_inference  statistics  ay.nihat  have_read  algorithmic_information_theory 
november 2010 by cshalizi
Bowsher: Stochastic kinetic models: Dynamic independence, modularity and graphs
"The dynamic properties and independence structure of stochastic kinetic models (SKMs) .., An SKM is a highly multivariate jump process used to model chemical reaction networks.. We identify SKM subprocesses with the corresponding counting processes and propose a directed, cyclic graph (the kinetic independence graph or KIG) that encodes the local independence structure of their conditional intensities. Given a partition [A, D, B] of the vertices, the graphical separation A ⊥ B|D in the undirected KIG has an intuitive chemical interpretation and implies that A is locally independent of B given A ∪ D. ... this separation also results in global independence of the internal histories of A and B conditional on a history of the jumps in D which ... mathematical definition of a modularization of an SKM using its implied dynamics. Graphical decomposition methods are developed for the identification and efficient computation of nested modularizations. "
interacting_particle_systems  graphical_models  biochemical_networks 
july 2010 by cshalizi
[1004.2287] An empirical comparative study of approximate methods for binary graphical models; application to the search of associations among causes of death in French death certificates
"Looking for associations among multiple variables is a topical issue in statistics due to the increasing amount of data encountered in biology, medicine and many other domains involving statistical applications. Graphical models have recently gained popularity for this purpose in the statistical literature. Following the ideas of the LASSO procedure designed for the linear regression framework, recent developments dealing with graphical model selection have been based on $\ell_1$-penalization. In the binary case, however, exact inference is generally very slow or even intractable because of [the] log-partition function. Various approximate methods have recently been proposed in the literature ... Through an extensive simulation study, we show that a simple modification of a method relying on a Gaussian approximation achieves good performance and is very fast. We present a real application in which we search for associations among causes of death recorded on French death certificates."
graphical_models  lasso  model_selection 
april 2010 by cshalizi
[1004.2304] Spatio-Temporal Graphical Model Selection
"We consider the problem of estimating the topology of spatial interactions in a discrete state, discrete time spatio-temporal graphical model where the interactions affect the temporal evolution of each agent in a network. Among other models, the susceptible, infected, recovered ($SIR$) model for interaction events fall into this framework. We pose the problem as a structure learning problem and solve it using an $\ell_1$-penalized likelihood convex program. We evaluate the solution on a simulated spread of infectious over a complex network. Our topology estimates outperform those of a standard spatial Markov random field graphical model selection using $\ell_1$-regularized logistic regression."
graphical_models  random_fields  lasso  model_selection 
april 2010 by cshalizi
[1002.4802] Gaussian Process Structural Equation Models with Latent Variables
"In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well-studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a non-linear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. An efficient Markov chain Monte Carlo procedure is described. We evaluate the stability of the sampling procedure and the predictive ability of the model compared against the current practice."
statistics  graphical_models  latent_variables  nonparametrics  estimation  heard_the_talk 
february 2010 by cshalizi
« earlier      

related tags

ahmed.amr  ai  airoldi.edo  algebraic_geometry  algorithmic_information_theory  anomaly_detection  artificial_intelligence  ay.nihat  biochemical_networks  bioinformatics  blogged  books:noted  books:recommended  book_reviews  bootstrap  buhlmann.peter  causality  causal_inference  change-point_problem  chow-liu_algorithm  chow-liu_trees  chu.tianjiao  classifiers  climatology  clustering  coleman.todd  combinatorics  community_discovery  compressed_sensing  computational_complexity  computational_statistics  conditional_random_fields  confidence_sets  confounding  contagion  contingency_tables  copulas  coveted  cross-validation  das.kaustav  data_mining  decision-making  density_estimation  design_for_a_brain  didelez.vanessa  dimension_reduction  domingos.pedro  econometrics  economics  emergence  ensemble_methods  estimation  expectation-maximization  exponential_families  factor_analysis  feature_selection  feedback  fmri  food_webs  friedman.nir  functional_connectivity  gene_expression_data_analysis  getoor.lise  glymour.clark  grammar_induction  granger_causality  graphical_models  graph_theory  haavelmo.trygve  have_read  heard_the_talk  heavy_tails  heuristics  hierarchical_models  historical_linguistics  hofling.holger  hofmann.thomas  homophily  hoyer.patrik  hsu.daniel  identifiability  independent_component_analysis  inference_to_latent_objects  information_criteria  information_geometry  information_retrieval  information_theory  interacting_particle_systems  interface_design  internet  in_NB  ising_model  janzing.dominik  jordan.michael_i.  kakade.sham  kalisch.markus  kith_and_kin  koller.daphne  lacerda.gustavo  lafferty.john  large_deviations  lasso  latent_semantic_analysis  latent_variables  lauritzen.steffen  lead  learning_theory  linear_regression  liu.han  logic  logistic_regression  machine_learning  macro_from_micro  markov_models  mccallum.andrew  meinshausen.nicolai  methodology  mixture_models  model_discovery  model_search  model_selection  multiple_testing  networks  network_data_analysis  network_formation  neural_data_analysis  neural_networks  neuroscience  nielsen.michael  non-stationarity  nonparametrics  particle_filters  pearl.judea  phase_transitions  philosophy_of_mind  philosophy_of_science  point_processes  principal_components  probability  R  raginsky.maxim  ramsey.joseph  random_fields  ravikumar.pradeep  re:AoS_project  re:critique_of_diffusion  re:functional_communities  re:g_paper  re:homophily_and_confounding  re:institutions_as_collective_degrees_of_freedom  re:LOB_project  re:what_is_the_right_null_model_for_linear_regression  re:your_favorite_dsge_sucks  rebonato.riccard  regression  reichenbach.hans  relational_learning  richardson.thomas_s.  risk_assessment  robins.james  roeder.kathryn  selection_bias  self-centered  shpister.ilya  smola.alex  social_influence  social_networks  song.le  sparsity  spirtes.peter  statistical_learning  statistical_mechanics  statistics  stochastic_processes  structural_equations  sufficiency  support_vector_machines  sutton.charles  taskar.ben  tetrad  text_mining  tibshirani.robert  time_series  to:blog  to:NB  to_read  to_teach:complexity-and-inference  to_teach:data-mining  to_teach:undergrad-ADA  track_down_references  tutorials  variable_selection  variational_methods  via:arthegall  via:ded-maxim  via:judea_pearl  via:klk  via:matthew_berryman  via:shivak  voter_model  wainwright.martin  wasserman.larry  wermuth.nanny  xing.eric  yee.danny  zhang.jiji  zhang.tong 

Copy this bookmark:



description:


tags: