cshalizi + information_theory 159
Measures of mutual and causal dependence between two time series
17 days ago by cshalizi
"New measures are proposed for mutual and causal dependence between two time series, based on information theoretical ideas. The measure of mutual dependence is shown to be the sum of the measure of unidirectional causal dependence from the first time series to the second, the measure of unidirectional causal dependence from the second to the first, and the measure of instantaneous causal dependence. The measures are applicable to any kind of time series: continuous, discrete, or categorical."
to:NB
causality
information_theory
stochastic_processes
rissanen.jorma
via:coleman
17 days ago by cshalizi
A Normal Law for the Plug-in Estimator of Entropy
4 weeks ago by cshalizi
"This paper establishes a sufficient condition for the asymptotic normality of the plug-in estimator of Shannon's entropy defined on a countable alphabet. The sufficient condition covers a range of cases with countably infinite alphabets, for which no normality results were previously known."
in_NB
entropy_estimation
statistics
information_theory
4 weeks ago by cshalizi
[1204.2612] Computing bounds for entropy of stationary Z^d Markov random fields
6 weeks ago by cshalizi
"For any stationary $mZ^d$-Gibbs measure that satisfies strong spatial mixing, we obtain sequences of upper and lower approximations that converge to its entropy. In the case, $d=2$, these approximations are efficient in the sense that the approximations are accurate to within $epsilon$ and can be computed in time polynomial in $1/epsilon$."
to:NB
information_theory
markov_models
stochastic_processes
entropy
6 weeks ago by cshalizi
Phys. Rev. E 85, 031129 (2012): Entropy production and Kullback-Leibler divergence between stationary trajectories of discrete systems
6 weeks ago by cshalizi
"The irreversibility of a stationary time series can be quantified using the Kullback-Leibler divergence (KLD) between the probability of observing the series and the probability of observing the time-reversed series. Moreover, this KLD is a tool to estimate entropy production from stationary trajectories since it gives a lower bound to the entropy production of the physical process generating the series. In this paper we introduce analytical and numerical techniques to estimate the KLD between time series generated by several stochastic dynamics with a finite number of states. We examine the accuracy of our estimators for a specific example, a discrete flashing ratchet, and investigate how close the KLD is to the entropy production depending on the number of degrees of freedom of the system that are sampled in the trajectories."
to:NB
stochastic_processes
information_theory
6 weeks ago by cshalizi
[0802.4363] Estimating the entropy of binary time series: Methodology, some theory and a simulation study
6 weeks ago by cshalizi
"Partly motivated by entropy-estimation problems in neuroscience, we present a detailed and extensive comparison between some of the most popular and effective entropy estimation methods used in practice: The plug-in method, four different estimators based on the Lempel-Ziv (LZ) family of data compression algorithms, an estimator based on the Context-Tree Weighting (CTW) method, and the renewal entropy estimator.
"**Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. ** Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. ** Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency."
in_NB
to_read
entropy_estimation
information_theory
time_series
statistics
kontoyiannis.ioannis
re:stacs
"**Methodology. Three new entropy estimators are introduced. For two of the four LZ-based estimators, a bootstrap procedure is described for evaluating their standard error, and a practical rule of thumb is heuristically derived for selecting the values of their parameters. ** Theory. We prove that, unlike their earlier versions, the two new LZ-based estimators are consistent for every finite-valued, stationary and ergodic process. An effective method is derived for the accurate approximation of the entropy rate of a finite-state HMM with known distribution. Heuristic calculations are presented and approximate formulas are derived for evaluating the bias and the standard error of each estimator. ** Simulation. All estimators are applied to a wide range of data generated by numerous different processes with varying degrees of dependence and memory. Some conclusions drawn from these experiments include: (i) For all estimators considered, the main source of error is the bias. (ii) The CTW method is repeatedly and consistently seen to provide the most accurate results. (iii) The performance of the LZ-based estimators is often comparable to that of the plug-in method. (iv) The main drawback of the plug-in method is its computational inefficiency."
6 weeks ago by cshalizi
Relative Entropy and Exponential Deviation Bounds for General Markov Chains
6 weeks ago by cshalizi
"We develop explicit, general bounds for the prob- ability that the normalized partial sums of a function of a Markov chain on a general alphabet will exceed the steady-state mean of that function by a given amount. Our bounds combine simple information-theoretic ideas together with techniques from optimization and some fairly elementary tools from analysis. In one direction, we obtain a general bound for the important class of Doeblin chains; this bound is optimal, in the sense that in the special case of independent and identically distributed random variables it essentially reduces to the classical Hoeffding bound. In another direction, motivated by important problems in simulation, we develop a series of bounds in a form which is particularly suited to these problems, and which apply to the more general class of “geometrically ergodic” Markov chains."
to:NB
to_read
deviation_bounds
markov_models
stochastic_processes
via:ded-maxim
meyn.sean
kontoyiannis.ioannis
mixing
information_theory
6 weeks ago by cshalizi
[1204.2003] Directed Information Graphs
6 weeks ago by cshalizi
"We propose two graphical models to represent a concise description of the causal statistical dependence structure between a group of coupled stochastic processes. The first, minimum generative model graphs, is motivated by generative models. The second, directed information graphs, is motivated by Granger causality. We show that under mild assumptions, the graphs are identical. In fact, these are analogous to Bayesian and Markov networks respectively, in terms of Markov blankets and I-map properties. Furthermore, the underlying variable dependence structure is the unique causal Bayesian network. Lastly, we present a method using minimal-dimension statistics to identify the structure when upper bounds on the in-degrees are known. Simulations show the effectiveness of the approach."
to:NB
graphical_models
to_read
re:functional_communities
causality
information_theory
coleman.todd
6 weeks ago by cshalizi
[0704.1751] Information Theoretic Proofs of Entropy Power Inequalities
6 weeks ago by cshalizi
"While most useful information theoretic inequalities can be deduced from the basic properties of entropy or mutual information, up to now Shannon's entropy power inequality (EPI) is an exception: Existing information theoretic proofs of the EPI hinge on representations of differential entropy using either Fisher information or minimum mean-square error (MMSE), which are derived from de Bruijn's identity. In this paper, we first present an unified view of these proofs, showing that they share two essential ingredients: 1) a data processing argument applied to a covariance-preserving linear transformation; 2) an integration over a path of a continuous Gaussian perturbation. Using these ingredients, we develop a new and brief proof of the EPI through a mutual information inequality, which replaces Stam and Blachman's Fisher information inequality (FII) and an inequality for MMSE by Guo, Shamai and Verd'u used in earlier proofs. The result has the advantage of being very simple in that it relies only on the basic properties of mutual information. These ideas are then generalized to various extended versions of the EPI: Zamir and Feder's generalized EPI for linear transformations of the random variables, Takano and Johnson's EPI for dependent variables, Liu and Viswanath's covariance-constrained EPI, and Costa's concavity inequality for the entropy power."
to:NB
information_theory
fisher_information
via:ded-maxim
6 weeks ago by cshalizi
[1203.3037] Expanding the Transfer Entropy to Identify Information Subgraphs in Complex Systems
7 weeks ago by cshalizi
"We propose a formal expansion of the transfer entropy to put in evidence irreducible sets of variables which provide information for the future state of each assigned target. Multiplets characterized by an high value will be associated to informational circuits present in the system, with an informational character (synergetic or redundant) which can be associated to the sign of the contribution. We also present preliminary results on fMRI and EEG data sets."
in_NB
graphical_models
information_theory
community_discovery
time_series
re:functional_communities
7 weeks ago by cshalizi
[1203.6502] Quantifying causal influences
8 weeks ago by cshalizi
"Common methods of causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between n variables. Given the joint distribution of all these variables, the DAG contains all information about how intervening on one variable would change the distribution of the other n-1 variables. It remains, however, a non-trivial question how to quantify the causal influence of one variable on another one.
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
to:NB
to_read
causality
graphical_models
information_theory
statistics
via:ded-maxim
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
8 weeks ago by cshalizi
Entropy Estimation in Turing's Perspective
8 weeks ago by cshalizi
"A new nonparametric estimator of Shannon's entropy on a countable alphabet is proposed and analyzed against the well-known plug-in estimator. The proposed estimator is developed based on Turing's formula, which recovers distributional characteristics on the subset of the alphabet not covered by a size-n sample. The fundamental switch in perspective brings about substantial gain in estimation accuracy for every distribution with finite entropy. In general, a uniform variance upper bound is established for the entire class of distributions with finite entropy that decays at a rate of O(ln(n)/n) compared to O([ln(n)]2/n) for the plug-in. In a wide range of subclasses, the variance of the proposed estimator converges at a rate of O(1/n), and this rate of convergence carries over to the convergence rates in mean squared errors in many subclasses. Specifically, for any finite alphabet, the proposed estimator has a bias decaying exponentially in n. Several new bias-adjusted estimators are also discussed."
in_NB
entropy_estimation
statistics
information_theory
8 weeks ago by cshalizi
[1203.3271] The thermodynamics of prediction
10 weeks ago by cshalizi
"A system responding to a stochastic driving signal can be interpreted as computing, by means of its dynamics, an (implicit) model of the environmental variables. The system's state retains information about past environmental fluctuations, and a fraction of this information is predictive of future ones. The remaining nonpredictive information reflects model complexity that does not improve predictive power, and represents the ineffectiveness of the model. We expose the fundamental equivalence between this model inefficiency and thermodynamic inefficiency, measured by the energy dissipated during the interaction between system and environment. Our results hold arbitrarily far from thermodynamic equilibrium and are applicable to a wide range of systems, including biomolecular machines. They highlight a profound connection between the effective use of information and efficient thermodynamic operation: any system constructed to keep memory about its environment and to operate energetically efficiently has to be predictive."
--- Hrmph, time to send some copies of old papers.
to:NB
thermodynamics
complexity_measures
information_theory
grumble
--- Hrmph, time to send some copies of old papers.
10 weeks ago by cshalizi
[1202.1787] Greedy Learning of Markov Network Structure
12 weeks ago by cshalizi
"We propose a new yet natural algorithm for learning the graph structure of general discrete graphical models (a.k.a. Markov random fields) from samples. Our algorithm finds the neighborhood of a node by sequentially adding nodes that produce the largest reduction in empirical conditional entropy; it is greedy in the sense that the choice of addition is based only on the reduction achieved at that iteration. Its sequential nature gives it a lower computational complexity as compared to other existing comparison-based techniques, all of which involve exhaustive searches over every node set of a certain size. Our main result characterizes the sample complexity of this procedure, as a function of node degrees, graph size and girth in factor-graph representation. We subsequently specialize this result to the case of Ising models, where we provide a simple transparent characterization of sample complexity as a function of model and graph parameters.
For tree graphs, our algorithm is the same as the classical Chow-Liu algorithm, and in that sense can be considered the extension of the same to graphs with cycles."
to:NB
graphical_models
machine_learning
information_theory
For tree graphs, our algorithm is the same as the classical Chow-Liu algorithm, and in that sense can be considered the extension of the same to graphs with cycles."
12 weeks ago by cshalizi
[0808.4042] Statistical models, likelihood, penalized likelihood and hierarchical likelihood
12 weeks ago by cshalizi
"We give an overview of statistical models and likelihood, together with two of its variants: penalized and hierarchical likelihood. The Kullback-Leibler divergence is referred to repeatedly, for defining the misspecification risk of a model, for grounding the likelihood and the likelihood crossvalidation which can be used for choosing weights in penalized likelihood. Families of penalized likelihood and sieves estimators are shown to be equivalent. The similarity of these likelihood with a posteriori distributions in a Bayesian approach is considered."
in_NB
statistics
likelihood
bayesianism
information_theory
12 weeks ago by cshalizi
[0809.1017] Entropy Concentration and the Empirical Coding Game
12 weeks ago by cshalizi
"We give a characterization of Maximum Entropy/Minimum Relative Entropy inference by providing two `strong entropy concentration' theorems. These theorems unify and generalize Jaynes' `concentration phenomenon' and Van Campenhout and Cover's `conditional limit theorem'. The theorems characterize exactly in what sense a prior distribution Q conditioned on a given constraint, and the distribution P, minimizing the relative entropy D(P ||Q) over all distributions satisfying the constraint, are `close' to each other. We then apply our theorems to establish the relationship between entropy concentration and a game-theoretic characterization of Maximum Entropy Inference due to Topsoe and others."
to:NB
statistics
information_theory
maximum_entropy
grunwald.peter
12 weeks ago by cshalizi
[0810.5302] A class of R'{e}nyi information estimators for multidimensional densities
12 weeks ago by cshalizi
"A class of estimators of the R'{e}nyi and Tsallis entropies of an unknown distribution $f$ in $mathbb{R}^m$ is presented. These estimators are based on the $k$th nearest-neighbor distances computed from a sample of $N$ i.i.d. vectors with distribution $f$. We show that entropies of any order $q$, including Shannon's entropy, can be estimated consistently with minimal assumptions on $f$. Moreover, we show that it is straightforward to extend the nearest-neighbor method to estimate the statistical distance between two distributions using one i.i.d. sample from each."
in_NB
statistics
entropy_estimation
information_theory
renyi_entropy
12 weeks ago by cshalizi
[1202.1523] Information Forests
february 2012 by cshalizi
"We describe Information Forests, an approach to classification that generalizes Random Forests by replacing the splitting criterion of non-leaf nodes from a discriminative one -- based on the entropy of the label distribution -- to a generative one -- based on maximizing the information divergence between the class-conditional distributions in the resulting partitions. The basic idea consists of deferring classification until a measure of "classification confidence" is sufficiently high, and instead breaking down the data so as to maximize this measure. In an alternative interpretation, Information Forests attempt to partition the data into subsets that are "as informative as possible" for the purpose of the task, which is to classify the data. Classification confidence, or informative content of the subsets, is quantified by the Information Divergence. Our approach relates to active learning, semi-supervised learning, mixed generative/discriminative learning."
After reading: meh.
have_read
decision_trees
information_theory
classifiers
machine_learning
to_teach:data-mining
re:AoS_project
After reading: meh.
february 2012 by cshalizi
Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
february 2012 by cshalizi
"We present a unifying framework for information theoretic feature selection, bringing almost two decades of research on heuristic filter criteria under a single theoretical interpretation. This is in response to the question: "what are the implicit statistical assumptions of feature selection criteria based on mutual information?". To answer this, we adopt a different strategy than is usual in the feature selection literature−instead of trying to define a criterion, we derive one, directly from a clearly specified objective function: the conditional likelihood of the training labels. While many hand-designed heuristic criteria try to optimize a definition of feature 'relevancy' and 'redundancy', our approach leads to a probabilistic framework which naturally incorporates these concepts. As a result we can unify the numerous criteria published over the last two decades, and show them to be low-order approximations to the exact (but intractable) optimisation problem. The primary contribution is to show that common heuristics for information based feature selection (including Markov Blanket algorithms as a special case) are approximate iterative maximisers of the conditional likelihood. A large empirical study provides strong evidence to favour certain classes of criteria, in particular those that balance the relative size of the relevancy/redundancy terms. Overall we conclude that the JMI criterion (Yang and Moody, 1999; Meyer et al., 2008) provides the best tradeoff in terms of accuracy, stability, and flexibility with small data samples."
in_NB
information_theory
statistics
variable_selection
model_selection
to_teach:data-mining
to:blog
machine_learning
classifiers
have_read
graphical_models
february 2012 by cshalizi
Simon and Tibshirani: COMMENT ON “DETECTING NOVEL ASSOCIATIONS IN LARGE DATA SETS” BY RESHEF ET AL, SCIENCE DEC 16, 2011
february 2012 by cshalizi
Since this is publicly online now, I guess I can write that post.
information_theory
statistics
hypothesis_testing
tibshirani.robert
to:blog
independence_testing
february 2012 by cshalizi
f-Divergence Estimation and Two-Sample Homogeneity Test Under Semiparametric Density-Ratio Models
february 2012 by cshalizi
"A density ratio is defined by the ratio of two probability densities. We study the inference problem of density ratios and apply a semiparametric density-ratio estimator to the two-sample homogeneity test. In the proposed test procedure, the $f$-divergence between two probability densities is estimated using a density-ratio estimator. The $f$ -divergence estimator is then exploited for the two-sample homogeneity test. We derive an optimal estimator of $f$-divergence in the sense of the asymptotic variance in a semiparametric setting, and provide a statistic for two-sample homogeneity test based on the optimal estimator. We prove that the proposed test dominates the existing empirical likelihood score test. Through numerical studies, we illustrate the adequacy of the asymptotic theory for finite-sample inference."
to:NB
statistics
density_estimation
information_theory
hypothesis_testing
two-sample_tests
february 2012 by cshalizi
[1201.2334] Universal Estimation of Directed Information
january 2012 by cshalizi
"We propose four approaches to estimating the directed information rate between a pair of jointly stationary ergodic processes with the help of universal probability assignments. The four approaches yield estimators with different merits such as nonnegativity and boundedness. We establish consistency of these estimators in various senses and derive near-optimal rates of convergence in the minimax sense under mild conditions. The estimators carry over directly to estimating other information measures of stationary ergodic processes, such as entropy rate and mutual information rate, and provide alternatives to classical approaches in the existing literature. Guided by the theoretical results, we use context tree weighting as the vehicle for the implementations of the proposed estimators. Experiments on synthetic and real data are presented, demonstrating the potential of the proposed schemes in practice and the efficacy of directed information estimation as a tool for detecting and measuring causality and delay."
in_NB
to_read
information_theory
entropy_estimation
directed_information
stochastic_processes
nonparametrics
statistics
re:AoS_project
january 2012 by cshalizi
[1201.2056] Adaptive Context Tree Weighting
january 2012 by cshalizi
"We describe an adaptive context tree weighting (ACTW) algorithm, as an extension to the standard context tree weighting (CTW) algorithm. Unlike the standard CTW algorithm, which weights all observations equally regardless of the depth, ACTW gives increasing weight to more recent observations, aiming to improve performance in cases where the input sequence is from a non-stationary distribution. Data compression results show ACTW variants improving over CTW on merged files from standard compression benchmark tests while never being significantly worse on any individual file."
to:NB
information_theory
non-stationarity
markov_models
stochastic_processes
re:AoS_project
january 2012 by cshalizi
[1112.3257] Exact Computation of Kullback-Leibler Distance for Hidden Markov Trees and Models
december 2011 by cshalizi
"We suggest new recursive formulas to compute the exact value of the Kullback-Leibler distance (KLD) between two general Hidden Markov Trees (HMTs). For homogeneous HMTs with regular topology, such as homogeneous Hidden Markov Models (HMMs), we obtain a closed-form expression for the KLD when no evidence is given. We generalize our recursive formulas to the case of HMMs conditioned on the observable variables. Our proposed formulas are validated through several numerical examples in which we compare the exact KLD value with Monte Carlo estimations."
to:NB
to_read
re:AoS_project
markov_models
stochastic_processes
information_theory
december 2011 by cshalizi
[1111.5648] Falsification and future performance
december 2011 by cshalizi
"We information-theoretically reformulate two measures of capacity from statistical learning theory: empirical VC-entropy and empirical Rademacher complexity. We show these capacity measures count the number of hypotheses about a dataset that a learning algorithm falsifies when it finds the classifier in its repertoire minimizing empirical risk. It then follows from that the future performance of predictors on unseen data is controlled in part by how many hypotheses the learner falsifies. As a corollary we show that empirical VC-entropy quantifies the message length of the true hypothesis in the optimal code of a particular probability distribution, the so-called actual repertoire."
to:NB
to_read
information_theory
learning_theory
falsification
balduzzi.david
december 2011 by cshalizi
[1111.0483] Optimally approximating exponential families
november 2011 by cshalizi
"This article studies exponential families $mathcal{E}$ on finite sets such that the information divergence $D(P|mathcal{E})$ of an arbitrary probability distribution from $mathcal{E}$ is bounded by some constant $D>0$. A particular class of low-dimensional exponential families that have low values of $D$ can be obtained from partitions of the state space. The main results concern optimality properties of these partition exponential families. Exponential families where $D=log(2)$ are studied in detail. This case is special, because if $D<log(2)$, then $mathcal{E}$ contains all probability measures with full support."
to:NB
exponential_families
probability
information_theory
approximation
november 2011 by cshalizi
Banff blog « An Ergodic Walk
october 2011 by cshalizi
Sounds delightful (and Banff is beautiful).
conferences
statistics
learning_theory
statistical_inference_for_stochastic_processes
track_down_references
markov_models
network_data_analysis
estimation
van_handel.ramon
nonparametrics
re:smoothing_adjacency_matrices
information_theory
october 2011 by cshalizi
Information Processing and Learning 10-704
october 2011 by cshalizi
I'd take this, if I could.
information_theory
machine_learning
singh.aarti
october 2011 by cshalizi
[1110.3592] Information, learning and falsification
october 2011 by cshalizi
"Broadly speaking, there are two approaches to quantifying information. The first, Shannon information, takes events as belonging to ensembles and quantifies the information resulting from observing the given event in terms of the number of alternate events that have been ruled out. The second, algorithmic information or Kolmogorov complexity, takes events as strings and, given a universal Turing machine, quantifies the information content of a string as the length of the shortest program producing it. This note describes a new method of quantifying information, effective information, that links algorithmic information to Shannon information, and also links both to capacities arising in statistical learning theory. After introducing the measure, we show that it provides a non-universal analog of algorithmic information. We then apply it to derive basic capacities in statistical learning theory: empirical VC-entropy and empirical Rademacher complexity. A nice byproduct of our approach is an interpretation of the explanatory power of a learning algorithm in terms of the number of hypotheses it falsifies (counted in two different ways for the two different capacities). We also discuss how effective information relates to information gain, Shannon and mutual information."
to:NB
to_read
learning_theory
information_theory
balduzzi.david
october 2011 by cshalizi
[1110.2515] Normalized Mutual Information to evaluate overlapping community finding algorithms
october 2011 by cshalizi
"Given the increasing popularity of algorithms for overlapping clustering, in particular in social network analysis, quantitative measures are needed to measure the accuracy of a method. Given a set of true clusters, and the set of clusters found by an algorithm, these sets of clusters must be compared to see how similar or different the sets are. A normalized measure is desirable in many contexts, for example assigning a value of 0 where the two sets are totally dissimilar, and 1 where they are identical. A measure based on normalized mutual information, [1], has recently become popular. We demonstrate unintuitive behaviour of this measure, and show how this can be corrected by using a more conventional normalization. We compare the results to that of other measures, such as the Omega index [2]."
in_NB
community_discovery
information_theory
clustering
data_mining
october 2011 by cshalizi
[1110.2724] Information Transfer in Social Media
october 2011 by cshalizi
"Recent research has explored the increasingly important role of social media by examining the dynamics of individual and group behavior, characterizing patterns of information diffusion, and identifying influential individuals. In this paper we suggest a measure of causal relationships between nodes based on the information-theoretic notion of transfer entropy, or information transfer. This theoretically grounded measure is based on dynamic information, captures fine-grain notions of influence, and admits a natural, predictive interpretation. Causal networks inferred by transfer entropy can differ significantly from static friendship networks because most friendship links are not useful for predicting future dynamics. We demonstrate through analysis of synthetic and real-world data that transfer entropy reveals meaningful hidden network structures. In addition to altering our notion of who is influential, transfer entropy allows us to differentiate between weak influence over large groups and strong influence over small groups."
to:NB
to_read
re:functional_communities
re:social-networks-as-sensor-networks
information_theory
galstyan.aram
social_media
networks
october 2011 by cshalizi
The Effect of Noise Correlations in Populations of Diversely Tuned Neurons
october 2011 by cshalizi
"The amount of information encoded by networks of neurons critically depends on the correlation structure of their activity. Neurons with similar stimulus preferences tend to have higher noise correlations than others. In homogeneous populations of neurons, this limited range correlation structure is highly detrimental to the accuracy of a population code. Therefore, reduced spike count correlations under attention, after adaptation, or after learning have been interpreted as evidence for a more efficient population code. Here, we analyze the role of limited range correlations in more realistic, heterogeneous population models. We use Fisher information and maximum-likelihood decoding to show that reduced correlations do not necessarily improve encoding accuracy. In fact, in populations with more than a few hundred neurons, increasing the level of limited range correlations can substantially improve encoding accuracy. We found that this improvement results from a decrease in noise entropy that is associated with increasing correlations if the marginal distributions are unchanged. Surprisingly, for constant noise entropy and in the limit of large populations, the encoding accuracy is independent of both structure and magnitude of noise correlations."
information_theory
to:NB
neuroscience
neural_coding_and_decoding
october 2011 by cshalizi
[1110.0718] Directed information and Pearl's causal calculus
october 2011 by cshalizi
"Probabilistic graphical models are a fundamental tool in statistics, machine learning, signal processing, and control. When such a model is defined on a directed acyclic graph (DAG), one can assign a partial ordering to the events occurring in the corresponding stochastic system. Based on the work of Judea Pearl and others, these DAG-based "causal factorizations" of joint probability measures have been used for characterization and inference of functional dependencies (causal links). This mostly expository paper focuses on several connections between Pearl's formalism (and in particular his notion of "intervention") and information-theoretic notions of causality and feedback (such as causal conditioning, directed stochastic kernels, and directed information). As an application, we show how conditional directed information can be used to develop an information-theoretic version of Pearl's "back-door" criterion for identifiability of causal effects from passive observations. This suggests that the back-door criterion can be thought of as a causal analog of statistical sufficiency."
graphical_models
causality
causal_inference
information_theory
statistics
raginsky.maxim
in_NB
to_read
kith_and_kin
sufficiency
october 2011 by cshalizi
Divergence in everything: Cramér-Rao from data processing « The Information Structuralist
july 2011 by cshalizi
Very nice. Reminds me of Pitman's proof in _Basic Concepts_, but I need to go back and look at how that went.
fisher_information
statistics
information_theory
estimation
raginsky.maxim
to:blog
july 2011 by cshalizi
[0812.0567] The ensemble of random Markov matrices
july 2011 by cshalizi
"The ensemble of random Markov matrices is introduced as a set of Markov or stochastic matrices with the maximal Shannon entropy. The statistical properties of the stationary distribution pi, the average entropy growth rate $h$ and the second largest eigenvalue nu across the ensemble are studied. It is shown and heuristically proven that the entropy growth-rate and second largest eigenvalue of Markov matrices scale in average with dimension of matrices d as h ~ log(O(d)) and nu ~ d^(-1/2), respectively, yielding the asymptotic relation h tau_c ~ 1/2 between entropy h and correlation decay time tau_c = -1/log|nu| . Additionally, the correlation between h and and tau_c is analysed and is decreasing with increasing dimension d."
markov_models
stochastic_processes
spectral_gap
information_theory
in_NB
july 2011 by cshalizi
Waiting for Landauer - PhilSci-Archive
may 2011 by cshalizi
"Landauer's Principle asserts that there is an unavoidable cost in thermodynamic entropy creation when data is erased. It is usually derived from incorrect assumptions, most notably, that erasure must compress the phase space of a memory device or that thermodynamic entropy arises from the probabilistic uncertainty of random data. Recent work seeks to prove Landauer’s Principle without using these assumptions. I show that the processes assumed in the proof, and in the thermodynamics of computation more generally, can be combined to produce devices that both violate the second law and erase data without entropy cost, indicating an inconsistency in the theoretical system. Worse, the standard repertoire of processes selectively neglects thermal fluctuations..."
landauers_principle
norton.john
thermodynamics
statistical_mechanics
information_theory
physics_of_information
may 2011 by cshalizi
Information Theoretic Econometric Models - Academic and Professional Books - Cambridge University Press
may 2011 by cshalizi
The title is intriguing, but I can't find anything more about this...
books:noted
information_theory
statistics
econometrics
may 2011 by cshalizi
Fourdrinier , Marchand , Righi , Strawderman : On improved predictive density estimation with parametric constraints
april 2011 by cshalizi
Too Gaussian to be applicable, but perhaps useful to think through.
prediction
information_theory
statistics
to:NB
april 2011 by cshalizi
The Neural Costs of Optimal Control
february 2011 by cshalizi
Clever, but their key result is a special case of the variational representation of relative entropy. In fact, that might even amount to a new, decision-theoretic interpretation of relative entropy...
decision_theory
information_theory
to:NB
have_read
february 2011 by cshalizi
[1102.1507] Generalized Measures of Information Transfer
february 2011 by cshalizi
"Transfer entropy provides a general tool for analyzing the magnitudes and directions---but not the \emph{kinds}---of information transfer in a system. We extend transfer entropy in two complementary ways. First, we distinguish state-dependent from state-independent transfer, based on whether a source's influence depends on the state of the target. Second, for multiple sources, we distinguish between unique, redundant, and synergistic transfer. The new measures are demonstrated on several systems that extend examples from previous literature." --- I'm inclined to think there's not much too this from a superficial scan; but that's superficial.
information_theory
to:NB
february 2011 by cshalizi
Focused review: Estimating the amount of information conveyed by a population of neurons
february 2011 by cshalizi
Hey, they cite us positively! By induction, the rest of the paper must be equally intelligent, learned and insightful.
to_read
neuroscience
information_theory
neural_data_analysis
entropy_estimation
neural_coding_and_decoding
february 2011 by cshalizi
[1102.0365] Limit Theorems for the Sample Entropy of Hidden Markov Chains
february 2011 by cshalizi
"The Shannon-McMillan-Breiman theorem asserts that the sample entropy of a stationary and ergodic stochastic process converges to the entropy rate of the same process almost surely. In this paper, we focus our attention on the convergence behavior of the sample entropy of a hidden Markov chain. Under certain positivity assumption, we prove that a central limit theorem (CLT) with some Berry-Esseen bound for the sample entropy of a hidden Markov chain, and we use this CLT to establish a law of iterated logarithm (LIL) for the sample entropy."
information_theory
markov_models
central_limit_theorem
february 2011 by cshalizi
[1101.5108] Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes
february 2011 by cshalizi
"We investigate approximating joint distributions of random processes with causal dependence tree distributions. Such distributions are particularly useful in providing parsimonious representation when there exists causal dynamics among processes. By extending the results by Chow and Liu on dependence tree approximations, we show that the best causal dependence tree approximation is the one which maximizes the sum of directed informations on its edges, where best is defined in terms of minimizing the KL-divergence between the original and the approximate distribution. Moreover, we describe a low-complexity algorithm to efficiently pick this approximate distribution."
stochastic_processes
information_theory
graphical_models
chow-liu_trees
in_NB
coleman.todd
heard_the_talk
february 2011 by cshalizi
[1101.0255] Conditional information and definition of neighbor in categorical random fields
january 2011 by cshalizi
"Who then is my neighbor?" (Not an actual quote from the paper.)
random_fields
information_theory
stochastic_processes
spatial_statistics
january 2011 by cshalizi
[1012.5457] Concentration of the information in data with log-concave distributions
january 2011 by cshalizi
"A concentration property of the functional $-\log f(X)$ is demonstrated, when a random vector $X$ has a log-concave density $f$ on $\R^n$. This concentration property implies in particular an extension of the Shannon-McMillan-Breiman strong ergodic theorem to the class of discrete-time stochastic processes with log-concave marginals."
information_theory
concentration_of_measure
ergodic_theory
in_NB
have_read
january 2011 by cshalizi
[1012.4401] A Note on a Characterization of R'enyi Measures and its Relation to Composite Hypothesis Testing
december 2010 by cshalizi
"The R'enyi information measures are characterized in terms of their Shannon counterparts, and properties of the former are recovered from first principle via the associated properties of the latter." I think if I stared at theorem 1 for a bit longer it would give me a new intuitive sense of what Renyi entropy is, but that's low priority right now...
information_theory
renyi_entropy
via:ded-maxim
in_NB
hypothesis_testing
december 2010 by cshalizi
The Magical Number Seven, , Plus or Minus Two: Some Limits on Our Capacity for Processing Information
december 2010 by cshalizi
Miller's classic paper, transcribed (rather ugly).
cognitive_science
psychology
information_theory
memory
to:NB
december 2010 by cshalizi
[0806.2552] Complexity Measures from Interaction Structures
november 2010 by cshalizi
"We evaluate new complexity measures on the symbolic dynamics of coupled tent maps and cellular automata. These measures quantify complexity in terms of $k$-th order statistical dependencies that cannot be reduced to interactions between $k-1$ units. We demonstrate that these measures are able to identify complex dynamical regimes."
complexity_measures
information_theory
information_geometry
ay.nihat
jost.jurgen
november 2010 by cshalizi
[1010.5720] Information-theoretic inference of common ancestors
november 2010 by cshalizi
"A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds .... In general, there is a whole class of DAGs that represents a given set of conditional independence relations... properties of this class that can be derived from observations of a subsystem only... we prove an information theoretic inequality that allows for the inference of common ancestors of observed parts in ... some unknown larger system.. a large amount of dependence in terms of mutual information among the observations implies the existence of a common ancestor that distributes this information... our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause... Our conclusions are valid also for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version."
graphical_models
information_theory
causal_inference
statistics
ay.nihat
have_read
algorithmic_information_theory
november 2010 by cshalizi
[1009.2302] The Predictive Lasso
september 2010 by cshalizi
"We propose a shrinkage procedure for simultaneous variable selection and estimation in generalized linear models (GLMs) with an explicit predictive motivation. The procedure estimates the coefficients by minimizing the Kullback-Leibler divergence of a set of predictive distributions to the corresponding predictive distributions for the full model, subject to an $l_1$ constraint on the coefficient vector. This results in selection of a parsimonious model with similar predictive performance to the full model. Thanks to its similar form to the original lasso problem for GLMs, our procedure can benefit from available $l_1$-regularization path algorithms. Simulation studies and real-data examples confirm the efficiency of our method in terms of predictive performance on future observations."
regression
lasso
variable_selection
sparsity
information_theory
statistics
september 2010 by cshalizi
"Partial Generalized Additive Models: An Information-Theoretic Approach for Dealing With Concurvity and Selecting Variables" (Gu, Kenny, Zhu, 2010)
september 2010 by cshalizi
"Scientists [want to know] which covariates are important, and how [they] affect the response variable, rather than just making predictions. ... Generalized additive models (GAMs) are a class of interpretable, multivariate nonparametric regression models which are very useful ... for these purposes, but concurvity among covariates (the nonlinear analogue of collinearity for linear regression) can ... produce unstable or even wrong estimates of the covariates’ functional effects. We develop a new procedure called partial generalized additive models (pGAM), based on mutual information ... Our procedure is similar in spirit to the Gram–Schmidt method for linear least squares. By building a GAM on a selected set of transformed variables, pGAM produces more stable models, selects variables parsimoniously, and provides insight into the nature of concurvity between the covariates by calculating functional dependencies among them. ... R code for fitting pGAMs is available online"
regression
additive_models
information_theory
variable_selection
statistics
september 2010 by cshalizi
Phys. Rev. E 79, 026201 (2009): Complexity measures from interaction structures
september 2010 by cshalizi
"We evaluate information-theoretic quantities that quantify complexity in terms of kth-order statistical dependences that cannot be reduced to interactions among k−1 random variables. Using symbolic dynamics of coupled maps and cellular automata as model systems, we demonstrate that these measures are able to identify complex dynamical regimes."
complexity_measures
information_theory
kith_and_kin
ay.nihat
jost.jurgen
to_read
re:stacs
september 2010 by cshalizi
related tags
additive_models ⊕ algorithmic_information_theory ⊕ anomaly_detection ⊕ approximation ⊕ arrow_of_time ⊕ artificial_life ⊕ automata_theory ⊕ autonomous_agents ⊕ autonomy ⊕ ay.nihat ⊕ badii.remo ⊕ bad_data_analysis ⊕ bad_science_journalism ⊕ balduzzi.david ⊕ bartlett.m.s. ⊕ bayesianism ⊕ bayesian_consistency ⊕ bayesian_nonparametrics ⊕ bergstrom.carl ⊕ bialek.william ⊕ binder.p.-m. ⊕ biochemical_networks ⊕ biological_computation ⊕ biological_organization ⊕ biology ⊕ blogged ⊕ books:noted ⊕ books:recommended ⊕ calls_for_papers ⊕ causality ⊕ causal_inference ⊕ cellular_automata ⊕ central_limit_theorem ⊕ cesa-bianchi.nicolo ⊕ chaos ⊕ chow-liu_trees ⊕ chuang.isaac ⊕ classifiers ⊕ clustering ⊕ cognitive_science ⊕ coleman.todd ⊕ collective_cognition ⊕ communication ⊕ community_discovery ⊕ complexity ⊕ complexity_measures ⊕ computational_complexity ⊕ concentration_of_measure ⊕ conferences ⊕ confidence_sets ⊕ control ⊕ control_theory ⊕ cybernetics ⊕ data_mining ⊕ debowski.lukasz ⊕ decision_theory ⊕ decision_trees ⊕ density_estimation ⊕ deviation_bounds ⊕ deviation_inequalities ⊕ differential_geometry ⊕ directed_information ⊕ dynamical_systems ⊕ eames.charles ⊕ eames.ray ⊕ econometrics ⊕ edge_of_chaos ⊕ entropy ⊕ entropy_estimation ⊕ entropy_rates ⊕ epistemology ⊕ ergodic_theory ⊕ estimation ⊕ evolutionary_biology ⊕ exponential_families ⊕ fallacies ⊕ falsification ⊕ feature_selection ⊕ feedback ⊕ filtering ⊕ fisher_information ⊕ fluid_mechanics ⊕ formal_languages ⊕ freedom_as_self-control ⊕ functional_connectivity ⊕ funny:geeky ⊕ galstyan.aram ⊕ gene_expression_data_analysis ⊕ gene_regulation ⊕ gibbs_distributions ⊕ gigs ⊕ granger_causality ⊕ graphical_models ⊕ gray.robert_m ⊕ grumble ⊕ grunwald.peter ⊕ haldane.j.b.s. ⊕ harrapan_civilization ⊕ have_read ⊕ heard_the_talk ⊕ hebbian_learning ⊕ hierarchical_structure ⊕ hinton.geoffrey ⊕ homeostasis ⊕ hypothesis_testing ⊕ identifiability ⊕ independence_testing ⊕ individual_sequence_prediction ⊕ induction ⊕ information_criteria ⊕ information_geometry ⊕ information_theory ⊖ in_NB ⊕ iterative_approximation ⊕ jost.jurgen ⊕ kantz.holger ⊕ kass.rob ⊕ kelly.kevin_t. ⊕ kith_and_kin ⊕ kontoyiannis.ioannis ⊕ landauers_principle ⊕ large_deviations ⊕ lasso ⊕ learning_in_games ⊕ learning_theory ⊕ liberman.mark ⊕ likelihood ⊕ linguistics ⊕ low-regret-learning ⊕ lugosi.gabor ⊕ lyapunov_exponents ⊕ machine_learning ⊕ marcus.leslie_f. ⊕ markov_models ⊕ martingales ⊕ massart.pascal ⊕ maximum_entropy ⊕ memory ⊕ method_of_types ⊕ meyn.sean ⊕ minimax ⊕ mixing ⊕ modeling ⊕ model_selection ⊕ movies ⊕ natural_selection ⊕ networks ⊕ network_data_analysis ⊕ neural_coding_and_decoding ⊕ neural_data_analysis ⊕ neural_networks ⊕ neuroscience ⊕ nielsen.michael ⊕ non-stationarity ⊕ nonparametrics ⊕ norton.john ⊕ number_theory ⊕ occams_razor ⊕ online_learning ⊕ optimization ⊕ path_dependence ⊕ pattern_formation ⊕ philosophy_of_science ⊕ physics_of_information ⊕ pictish ⊕ pinsker.m.s. ⊕ politi.antonio ⊕ prediction ⊕ prelov.v._v. ⊕ probability ⊕ psychology ⊕ quantum_computing ⊕ quantum_mechanics ⊕ raginsky.maxim ⊕ random_fields ⊕ re:almost_none ⊕ re:AoS_project ⊕ re:bayes_as_evol ⊕ re:functional_communities ⊕ re:smoothing_adjacency_matrices ⊕ re:social-networks-as-sensor-networks ⊕ re:spike_train_complexity ⊕ re:stacs ⊕ re:XV_for_mixing ⊕ regression ⊕ reichenbach.hans ⊕ renyi.alfred ⊕ renyi_entropy ⊕ replicator_dynamics ⊕ rissanen.jorma ⊕ ryabko.b._ya. ⊕ sejnowski.terrence ⊕ self-centered ⊕ self-organization ⊕ semantics ⊕ signal_transduction ⊕ simulation ⊕ singh.aarti ⊕ social_media ⊕ sparsity ⊕ spatial_statistics ⊕ spectral_gap ⊕ sporns.olaf ⊕ stability_of_learning ⊕ state_estimation ⊕ statistical_inference_for_stochastic_processes ⊕ statistical_mechanics ⊕ statistics ⊕ stochastic_differential_equations ⊕ stochastic_processes ⊕ sufficiency ⊕ superefficiency ⊕ symbolic_dynamics ⊕ systems_identification ⊕ teleology ⊕ teleonomy ⊕ theoretical_biology ⊕ thermodynamics ⊕ thermodynamic_formalism ⊕ the_continuing_crises ⊕ the_organism_as_an_adaptive_control_system ⊕ tibshirani.robert ⊕ time_series ⊕ to:blog ⊕ to:NB ⊕ to_be_shot_after_a_fair_trial ⊕ to_read ⊕ to_teach:complexity-and-inference ⊕ to_teach:data-mining ⊕ track_down_references ⊕ two-sample_tests ⊕ universal_prediction ⊕ van_der_meulen.e._c. ⊕ van_handel.ramon ⊕ variable_selection ⊕ via:arsyed ⊕ via:britta ⊕ via:coleman ⊕ via:ded-maxim ⊕ via:matthew_berryman ⊕ via:shivak ⊕ via:tozier ⊕ visual_display_of_quantitative_information ⊕ vu.vincent ⊕ why_oh_why_cant_we_have_a_better_academic_publishing_system ⊕ wiener.norbert ⊕ willett.rebecca ⊕ yu.bin ⊕Copy this bookmark: