mraginsky + information-theory 181
A Behavioral Defense of Rational Expectations
11 days ago by mraginsky
This paper studies decision making by agents who value optimism, but are unsure of their environment. As in Brunnermeier and Parker (2005), an agent’s optimism is assumed to be tempered by the decision costs it imposes. As in Hansen and Sargent (2008), an agent’s uncertainty about his environment leads him to formulate ‘robust’ decision rules. It is shown that when combined, these two considerations can lead agents to adhere to the Rational Expectations Hypothesis. Rather than being the outcome of the sophisticated statistical calculations of an impassive expected utility maximizer, Rational Expectations can instead be viewed as a useful approximation in environments where agents struggle to strike a balance between doubt and hope.
papers
to-read
rationality
economics
robust_control
decision-making
information-theory
decision-theory
11 days ago by mraginsky
[1002.1300] Architecture for communication with a fidelity criterion in unknown networks
23 days ago by mraginsky
We prove that in order to communicate independent sources (this is the unicast problem) between various users over an unknown medium to within various distortion levels, it is sufficient to consider source-channel separation based architectures: architectures which first compress the sources to within the corresponding distortion levels followed by reliable communication over the unknown medium. We are reducing the problem of universal rate-distortion communication of independent sources over a network to the universal reliable communication problem over networks. This is a reductionist view. We are not solving the reliable communication problem in networks.
papers
to-read
networks
information-theory
information-structures
23 days ago by mraginsky
[1205.0858] Controlled Sensing for Multihypothesis Testing
24 days ago by mraginsky
"The problem of multiple hypothesis testing with observation control is considered in both fixed sample size and sequential settings. In the fixed sample size setting, for binary hypothesis testing, it is shown that the optimal exponent for the maximal error probability corresponds to the maximum Chernoff information over the choice of controls. It is also shown that a pure stationary open-loop control policy is asymptotically optimal within the larger class of all causal control policies. For multihypothesis testing in the fixed sample size setting, lower and upper bounds on the optimal error exponent are derived. It is also shown through an example with three hypotheses that the optimal causal control policy can be strictly better than the optimal open-loop control policy. In the sequential setting, a test based on earlier work by Chernoff for binary hypothesis testing, is shown to be first-order asymptotically optimal for multihypothesis testing for vanishing error probabilities. The role of past information and randomization in designing optimal control policies is discussed."
papers
to-read
information-theory
decision-making
feedback
24 days ago by mraginsky
Fluctuations and response out of equilibrium (C. Maes)
4 weeks ago by mraginsky
"We discuss some recently visited positions towards dealing with nonequilibria from the
mathematical point of view of Markov networks."
papers
to-read
statistical-physics
thermodynamics
probability
information-theory
mathematical point of view of Markov networks."
4 weeks ago by mraginsky
[1107.1222] On the information-theoretic structure of distributed measurements
5 weeks ago by mraginsky
The internal structure of a measuring device, which depends on what its components are and how they are organized, determines how it categorizes its inputs. This paper presents a geometric approach to studying the internal structure of measurements performed by distributed systems such as probabilistic cellular automata. It constructs the quale, a family of sections of a suitably defined presheaf, whose elements correspond to the measurements performed by all subsystems of a distributed system. Using the quale we quantify (i) the information generated by a measurement; (ii) the extent to which a measurement is context-dependent; and (iii) whether a measurement is decomposable into independent submeasurements, which turns out to be equivalent to context-dependence. Finally, we show that only indecomposable measurements are more informative than the sum of their submeasurements.
papers
to-read
dynamical-systems
information-theory
complex-systems
distributed-systems
5 weeks ago by mraginsky
[0704.0704] Entropic Measure and Wasserstein Diffusion
5 weeks ago by mraginsky
We construct a new random probability measure on the sphere and on the unit interval which in both cases has a Gibbs structure with the relative entropy functional as Hamiltonian. It satisfies a quasi-invariance formula with respect to the action of smooth diffeomorphism of the sphere and the interval respectively. The associated integration by parts formula is used to construct two classes of diffusion processes on probability measures (on the sphere or the unit interval) by Dirichlet form methods. The first one is closely related to Malliavin's Brownian motion on the homeomorphism group. The second one is a probability valued stochastic perturbation of the heat flow, whose intrinsic metric is the quadratic Wasserstein distance. It may be regarded as the canonical diffusion process on the Wasserstein space.
papers
to-read
optimal-transportation
information-theory
probability
re:FPF_project
5 weeks ago by mraginsky
[1204.2980] Realizable Rate Distortion Function and Bayesian FIltering Theory
6 weeks ago by mraginsky
The relation between rate distortion function (RDF) and Bayesian filtering theory is discussed. The relation is established by imposing a causal or realizability constraint on the reconstruction conditional distribution of the RDF, leading to the definition of a causal RDF. Existence of the optimal reconstruction distribution of the causal RDF is shown using the topology of weak convergence of probability measures. The optimal non-stationary causal reproduction conditional distribution of the causal RDF is derived in closed form; it is given by a set of recursive equations which are computed backward in time. The realization of causal RDF is described via the source-channel matching approach, while an example is briefly discussed to illustrate the concepts.
papers
to-read
information-theory
rate-distortion
filtering
bayesian
estimation
6 weeks ago by mraginsky
[1109.5417] Finite block length converse results for channel coding and assistance by non-signalling correlations
6 weeks ago by mraginsky
"A non-signalling assisted code is a block code in which the sender and receiver can be correlated in any way that would not by itself allow signalling between them. The maximum rate that can be achieved by such a code for a given block length and error probability is formulated as a linear program. This provides a finite block length converse for classical codes which we show is identical to one derived by Polyanskiy, Poor and Verd'{u}. This shows that the many classical asymptotic and finite converse results which can be derived from this converse apply equally to entanglement assisted codes which have been recently studied. It also gives an explicit linear programming formulation of the converse which has some useful features."
papers
to-read
information-theory
channel-coding
6 weeks ago by mraginsky
[1203.6502] Quantifying causal influences
9 weeks ago by mraginsky
"Common methods of causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between n variables. Given the joint distribution of all these variables, the DAG contains all information about how intervening on one variable would change the distribution of the other n-1 variables. It remains, however, a non-trivial question how to quantify the causal influence of one variable on another one.
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
There is no mention of directed information, and yet their measure of causal strength seems to be closely related to it.
papers
to-read
causality
information-theory
feedback-information-theory
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."
There is no mention of directed information, and yet their measure of causal strength seems to be closely related to it.
9 weeks ago by mraginsky
Majorizing codes and measures (Andreas Maurer)
9 weeks ago by mraginsky
An information theoretical interpretation of majorizing and minorizing
measures is given. The expression logarithmic in the reciprocal of the
measure of a ball is replaced by the number of bits needed to achieve
desired precision in some convergent code. We also give a local version of
the majorizing bound.
papers
to-read
measure-concentration
probability
information-theory
measures is given. The expression logarithmic in the reciprocal of the
measure of a ball is replaced by the number of bits needed to achieve
desired precision in some convergent code. We also give a local version of
the majorizing bound.
9 weeks ago by mraginsky
[0907.4491] Bounding relative entropy by the relative entropy of local specifications in product spaces
9 weeks ago by mraginsky
For a class of density functions $q^n(x^n)$ on $Bbb R^n$ we prove an inequality between relative entropy and the sum of average conditional relative entropies of the following form: For any density function $p^n(x^n)$ on $Bbb R^n$, $D(p^n||q^n)leq Const. sum_{i=1}^n Bbb E D(p_i(cdot|Y_1,..., Y_{i-1},Y_{i+1},..., Y_n) || Q_i(cdot|Y_1,..., Y_{i-1},Y_{i+1},..., Y_n)),$ where $p_i(cdot|y_1,..., y_{i-1},y_{i+1},..., y_n)$ and $Q_i(cdot|x_1,..., x_{i-1},x_{i+1},..., x_n)$ denote the local specifications for $p^n$ resp. $q^n$, i.e., the conditional density functions of the $i$'th coordinate, given the other coordinates. The constant depends on the properties of the local specifications of $q^n$. The above inequality implies a logarithmic Sobolev inequality for $q^n$. We get an explicit lower bound for the logarithmic Sobolev constant of $q^n$ under the assumptions that: (i) the local specifications of $q^n$ satisfy logarithmic Sobolev inequalities with constants $rho_i$, and (ii) they also satisfy some condition expressing that the mixed partial derivatives of the Hamiltonian of $q^n$ are not too large relative to the logarithmic Sobolev constants $rho_i$. Condition (ii) may be weaker than that used in Otto and Reznikoff's recent paper on the estimation of logarithmic Sobolev constants of spin systems.
papers
to-read
measure-concentration
information-theory
probability
re:erasures_and_concentration_project
9 weeks ago by mraginsky
[1203.4626] Active Sequential Hypothesis Testing
10 weeks ago by mraginsky
Consider a decision maker who is responsible to dynamically collect observations so as to enhance his information in a speedy manner about an underlying phenomena of interest while accounting for the penalty of wrong declaration. The special cases of the problem are shown to be that of variable-length coding with feedback and noisy dynamic search. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively select the most "informative" sensing action among the available ones.
In this paper, using results in dynamic programming, a lower bound for the optimal total cost is established. Moreover, upper bounds are obtained via an analysis of heuristic policies for dynamic selection of actions. It is shown that the proposed heuristics achieve asymptotic optimality in many practically relevant problems including the problems of variable-length coding with feedback and noisy dynamic search; where asymptotic optimality implies that the relative difference between the total cost achieved by the proposed policies and the optimal total cost approaches zero as the penalty of wrong declaration or the number of hypotheses (hence the number of collected samples) increases. Furthermore, using the obtained bounds, the gain of adaptive selection of sensing actions is shown to be at least logarithmic in the penalty associated with wrong declarations.
papers
to-read
control-theory
information-theory
adaptive-systems
heard-the-talk
In this paper, using results in dynamic programming, a lower bound for the optimal total cost is established. Moreover, upper bounds are obtained via an analysis of heuristic policies for dynamic selection of actions. It is shown that the proposed heuristics achieve asymptotic optimality in many practically relevant problems including the problems of variable-length coding with feedback and noisy dynamic search; where asymptotic optimality implies that the relative difference between the total cost achieved by the proposed policies and the optimal total cost approaches zero as the penalty of wrong declaration or the number of hypotheses (hence the number of collected samples) increases. Furthermore, using the obtained bounds, the gain of adaptive selection of sensing actions is shown to be at least logarithmic in the penalty associated with wrong declarations.
10 weeks ago by mraginsky
[1201.2999] Spatially Coupled Ensembles Universally Achieve Capacity under Belief Propagation
february 2012 by mraginsky
We investigate spatially coupled code ensembles. For transmission over the binary erasure channel, it was recently shown that spatial coupling increases the belief propagation threshold of the ensemble to essentially the maximum a-priori threshold of the underlying component ensemble. This explains why convolutional LDPC ensembles, originally introduced by Felstrom and Zigangirov, perform so well over this channel. We show that the equivalent result holds true for transmission over general binary-input memoryless output-symmetric channels. More precisely, given a desired error probability and a gap to capacity, we can construct a spatially coupled ensemble which fulfills these constraints universally on this class of channels under belief propagation decoding. In fact, most codes in that ensemble have that property. The quantifier universal refers to the single ensemble/code which is good for all channels but we assume that the channel is known at the receiver. The key technical result is a proof that under belief propagation decoding spatially coupled ensembles achieve essentially the area threshold of the underlying uncoupled ensemble. We conclude by discussing some interesting open problems.
papers
to-read
information-theory
channel-coding
february 2012 by mraginsky
[1007.1033] A Theory of Network Equivalence, Parts I and II
february 2012 by mraginsky
A family of equivalence tools for bounding network capacities is introduced. Part I treats networks of point-to-point channels. The main result is roughly as follows. Given a network of noisy, independent, memoryless point-to-point channels, a collection of communication demands can be met on the given network if and only if it can be met on another network where each noisy channel is replaced by a noiseless bit pipe with throughput equal to the noisy channel capacity. This result was known previously for the case of a single-source multicast demand. The result given here treats general demands -- including, for example, multiple unicast demands -- and applies even when the achievable rate region for the corresponding demands is unknown in the noiseless network. In part II, definitions of upper and lower bounding channel models for general channels are introduced. By these definitions, a collection of communication demands can be met on a network of independent channels if it can be met on a network where each channel is replaced by its lower bounding model andonly if it can be met on a network where each channel is replaced by its upper bounding model. This work derives general conditions under which a network of noiseless bit pipes is an upper or lower bounding model for a multiterminal channel. Example upper and lower bounding models for broadcast, multiple access, and interference channels are given. It is then shown that bounding the difference between the upper and lower bounding models for a given channel yields bounds on the accuracy of network capacity bounds derived using those models. By bounding the capacity of a network of independent noisy channels by the network coding capacity of a network of noiseless bit pipes, this approach represents one step towards the goal of building computational tools for bounding network capacities.
papers
to-read
information-theory
networks
multiagent-systems
heard-the-talk
february 2012 by mraginsky
[1111.1977] On Refined Versions of the Azuma-Hoeffding Inequality with Applications in Information Theory
january 2012 by mraginsky
This paper derives some refined versions of the Azuma-Hoeffding inequality for discrete-parameter martingales with uniformly bounded jumps, and it considers some of their potential applications in information theory and related topics. The first part of this paper derives these refined inequalities, followed by a discussion on their relations to some classical results in probability theory. It also considers a geometric interpretation of some of these inequalities, providing an insight on the inter-connections between them. The second part exemplifies the use of these refined inequalities in the context of hypothesis testing, information theory, and communication. The paper is concluded with a discussion on some directions for further research. This work is meant to stimulate the use of some refined versions of the Azuma-Hoeffding inequality in information-theoretic aspects.
papers
to-read
information-theory
measure-concentration
martingales
markov-chains
probability
january 2012 by mraginsky
Concentration Inequalities for the Missing Mass and for Histogram Rule Error (McAllester and Ortiz)
december 2011 by mraginsky
This paper gives distribution-free concentration inequalities for the missing mass and the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although the sums are independent, they are highly heterogeneous. Such highly heterogeneous independent sums cannot be analyzed using standard concentration inequalities such as Hoeffding's inequality, the Angluin-Valiant bound, Bernstein's inequality, Bennett's inequality, or McDiarmid's theorem. The concentration inequality for histogram rule error is motivated by the desire to construct a new class of bounds on the generalization error of decision trees.
papers
to-read
learning-theory
measure-concentration
information-theory
probability
re:erasures_and_concentration_project
december 2011 by mraginsky
[1112.0708] Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing
december 2011 by mraginsky
We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala et al. cite{KrzakalaEtAl}, message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with undersampling rates close to the fraction of non-zero coordinates.
We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach is successful as soon as the undersampling rate $delta$ exceeds the (upper) R'enyi information dimension of the signal, $uRenyi(p_X)$. More precisely, for a sequence of signals of diverging dimension $n$ whose empirical distribution converges to $p_X$, reconstruction is with high probability successful from $uRenyi(p_X), n+o(n)$ measurements taken according to a band diagonal matrix.
For sparse signals, i.e. sequences of dimension $n$ and $k(n)$ non-zero entries, this implies reconstruction from $k(n)+o(n)$ measurements. For `discrete' signals, i.e. signals whose coordinates take a fixed finite set of values, this implies reconstruction from $o(n)$ measurements. The result is robust with respect to noise, does not apply uniquely to random signals, but requires the knowledge of the empirical distribution of the signal $p_X$.
papers
to-read
compressed-sensing
information-theory
message-passing
sparsity
analog-information-theory
Renyi-dimension
We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach is successful as soon as the undersampling rate $delta$ exceeds the (upper) R'enyi information dimension of the signal, $uRenyi(p_X)$. More precisely, for a sequence of signals of diverging dimension $n$ whose empirical distribution converges to $p_X$, reconstruction is with high probability successful from $uRenyi(p_X), n+o(n)$ measurements taken according to a band diagonal matrix.
For sparse signals, i.e. sequences of dimension $n$ and $k(n)$ non-zero entries, this implies reconstruction from $k(n)+o(n)$ measurements. For `discrete' signals, i.e. signals whose coordinates take a fixed finite set of values, this implies reconstruction from $o(n)$ measurements. The result is robust with respect to noise, does not apply uniquely to random signals, but requires the knowledge of the empirical distribution of the signal $p_X$.
december 2011 by mraginsky
[1111.6822] Optimal Phase Transitions in Compressed Sensing
november 2011 by mraginsky
Compressed sensing deals with efficient recovery of analog signals from linear measurements. This paper presents a statistical study of compressed sensing by modeling the input signal as random processes. Three classes of encoders are considered, namely optimal nonlinear, optimal linear and random linear encoders. Focusing on optimal decoders, we investigate the fundamental tradeoff between measurement rate and reconstruction fidelity gauged by error probability and noise sensitivity in the absence and presence of measurement noise, respectively. Optimal phase transition thresholds are determined as a functional of the input distribution and compared to suboptimal thresholds achieved by various popular reconstruction algorithms. In particular, we show that Gaussian sensing matrices incur no penalty on the phase transition threshold with respect to optimal nonlinear encoding. Our results also provide a rigorous justification of previous results based on replica heuristics in the weak-noise regime.
papers
to-read
information-theory
compressed-sensing
signal-processing
sparsity
november 2011 by mraginsky
[1110.6654] Pointwise Relations between Information and Estimation
november 2011 by mraginsky
Many of the classical and recent relations between information and estimation in the presence of Gaussian noise can be viewed as identities between expectations of random quantities. These include the I-MMSE relationship of Guo et al.; the relative entropy and mismatched estimation relationship of Verd\'{u}; the relationship between causal estimation and mutual information of Duncan, and its extension to the presence of feedback by Kadota et al.; the relationship between causal and non-casual estimation of Guo et al., and its mismatched version of Weissman. We dispense with the expectations and explore the nature of the pointwise relations between the respective random quantities.
papers
to-read
information-theory
estimation
filtering
november 2011 by mraginsky
WRITE OR RADIATE? Inscribed Matter vs. Electromagnetic Communication
august 2011 by mraginsky
"We consider information flow via physical transport of inscribed media through space and compare it to information flow via electromagnetic radiation. Somewhat counterintuitively for point to point links, physical transport of inscribed mass is often energetically more efficient by many orders of magnitude than electromagnetic broadcast. And perhaps more surprising, even in a broadcast setting (depending on the receiver density) inscribed mass transport is still energetically more efficient. We discuss the implications of these results for terrestrial telecommunications networks as well as point to point and broadcast communication over great distances with loose delay constraints."
papers
to-read
communication
physics
speculation
information-theory
filetype:pdf
media:document
august 2011 by mraginsky
[1104.5246] How well can we estimate a sparse vector?
april 2011 by mraginsky
"The estimation of a sparse vector in the linear model is a fundamental problem in signal processing, statistics, and compressive sensing. This paper establishes a lower bound on the mean-squared error, which holds regardless of the sensing/design matrix being used and regardless of the estimation procedure. This lower bound very nearly matches the known upper bound one gets by taking a random projection of the sparse vector followed by an $\ell_1$ estimation procedure such as the Dantzig selector. In this sense, compressive sensing techniques cannot essentially be improved."
papers
compressed-sensing
estimation
information-theory
statistics
have-read
april 2011 by mraginsky
Thermodynamics and Concentration
march 2011 by mraginsky
"We show that the thermal subadditivity of entropy provides a common basis to derive a strong form of the bounded dierence inequality and related results as well as more recent inequalities applicable to convex Lipschitz functions, random symmetric matrices, shortest travelling sales-men paths and weakly self-bounding functions. We also give two new concentration inequalities."
papers
to-read
measure-concentration
information-theory
probability
statistical-physics
statistical-learning
via:shivak
filetype:pdf
media:document
march 2011 by mraginsky
[1103.1861] Distinguishing and integrating aleatoric and epistemic variation in uncertainty quantification
march 2011 by mraginsky
"Much of uncertainty quantification to date has focused on determining the effect of variables modeled probabilistically, and with a known distribution, on some physical or engineering system. We develop methods to obtain information on the system when the distributions of some variables are known exactly, others are known only approximately, and perhaps others are not modeled as random variables at all. The main tool used is the duality between risk-sensitive integrals and relative entropy, and we obtain explicit bounds on standard performance measures (variances, exceedance probabilities) over families of distributions whose distance from a nominal distribution is measured by relative entropy. The evaluation of the risk-sensitive expectations is based on polynomial chaos expansions, which help keep the computational aspects tractable."
papers
probability
information-theory
re:knightian-uncertainty
have-read
has:re
march 2011 by mraginsky
[1102.3944] Fixed-length lossy compression in the finite blocklength regime
february 2011 by mraginsky
"This paper studies the minimum achievable source coding rate as a function of blocklength $n$ and tolerable distortion level $d$. Tight general achievability and converse bounds are derived that hold at arbitrary fixed blocklength. For stationary memoryless sources with separable distortion, the minimum rate achievable is shown to be closely approximated by $R(d) + \sqrt{\frac{V(d)}{n}} \Qinv{\epsilon}$, where $R(d)$ is the rate-distortion function, $V(d)$ is the {\it rate dispersion}, a characteristic of the source which measures its stochastic variability, $\Qinv{\cdot}$ is the inverse of the standard Gaussian complementary cdf, and $\epsilon$ is the probability that the distortion exceeds $d$."
papers
information-theory
rate-distortion
source-coding
have-read
february 2011 by mraginsky
[1102.0710] Universal Prior Prediction for Communication
february 2011 by mraginsky
"We consider the problem of communicating over an unknown and arbitrarily varying channel, using feedback. This paper focuses on the problem of determining the input behavior, or more specifically, a prior which is used to randomly generate a codebook. We pose the problem of setting the prior as a universal sequential prediction problem using information theoretic abstractions of the communication channel. For the case where the channel is block-wise constant, we show it is possible to asymptotically approach the best rate that can be attained by any system using a fixed prior. For the case where the channel may change on each symbol, we combine a rateless coding scheme with a prior predictor and asymptotically approach the capacity of the average channel universally for every sequence of channels."
papers
to-read
information-theory
feedback-information-theory
prediction
online-learning
february 2011 by mraginsky
[1012.5457] Concentration of the information in data with log-concave distributions
january 2011 by mraginsky
A concentration property of the functional $-\log f(X)$ is demonstrated, when a random vector $X$ has a log-concave density $f$ on $\R^n$. This concentration property implies in particular an extension of the Shannon-McMillan-Breiman strong ergodic theorem to the class of discrete-time stochastic processes with log-concave marginals.
papers
to-read
information-theory
probability
measure-concentration
january 2011 by mraginsky
[1011.6451] Informational derivation of Quantum Theory
december 2010 by mraginsky
"Quantum theory can be derived from purely informational principles. Five elementary axioms-causality, perfect distinguishability, ideal compression, local distinguishability, and pure conditioning-define a broad class of theories of information-processing that can be regarded as a standard. One postulate-purification-singles out quantum theory within this class. The main structures of quantum theory, such as the representation of mixed states as convex combinations of perfectly distinguishable pure states, are derived directly from the principles without using the Hilbert space framework."
papers
to-read
quantum-mechanics
information-theory
probability
philosophy-of-science
december 2010 by mraginsky
[1012.0602] LDPC Codes for Compressed Sensing
december 2010 by mraginsky
"We present a mathematical connection between channel coding and compressed sensing. In particular, we link, on the one hand, \emph{channel coding linear programming decoding (CC-LPD)}, which is a well-known relaxation of maximum-likelihood channel decoding for binary linear codes, and, on the other hand, \emph{compressed sensing linear programming decoding (CS-LPD)}, also known as basis pursuit, which is a widely used linear programming relaxation for the problem of finding the sparsest solution of an under-determined system of linear equations."
papers
to-read
compressed-sensing
channel-coding
information-theory
december 2010 by mraginsky
Algorithmic Thermodynamics « Azimuth
october 2010 by mraginsky
John Baez blogs about his paper with Mike Stay on algorithmic thermodynamics
have-read
blogs
information-theory
complexity
computation
thermodynamics
october 2010 by mraginsky
[1009.3958] Approximate Inference and Stochastic Optimal Control
september 2010 by mraginsky
"We propose a novel reformulation of the stochastic optimal control problem as an approximate inference problem, demonstrating, that such a interpretation leads to new practical methods for the original problem. In particular we characterise a novel class of iterative solutions to the stochastic optimal control problem based on a natural relaxation of the exact dual formulation. These theoretical insights are applied to the Reinforcement Learning problem where they lead to new model free, off policy methods for discrete and continuous problems."
papers
to-read
control-theory
reinforcement-learning
optimization
information-theory
graphical-models
september 2010 by mraginsky
[1009.0679] Optimal Uncertainty Quantification
september 2010 by mraginsky
Hmm: "We propose a rigorous framework for Uncertainty Quantification (UQ) in which the UQ objectives and the assumptions/information set are brought to the forefront. This framework, which we call \emph{Optimal Uncertainty Quantification} (OUQ), is based on the observation that, given a set of assumptions and information about the problem, there exist optimal bounds on uncertainties: these are obtained as extreme values of well-defined optimization problems corresponding to extremizing probabilities of failure, or of deviations, subject to the constraints imposed by the scenarios compatible with the assumptions and information. In particular, this framework does not implicitly impose inappropriate assumptions, nor does it repudiate relevant information. ..."
papers
to-read
information-theory
decision-making
optimization
september 2010 by mraginsky
The Compression Lemma ← Inductio Ex Machina
august 2010 by mraginsky
ETA: The lemma discussed in the blog entry is quite well-known in information theory (cf. Theorem 5.2.1 in Gray's Entropy and Information Theory).
blogs
information-theory
bayesian
divergence
august 2010 by mraginsky
[1008.2471] Projection Pursuit through Relative Entropy Minimization
august 2010 by mraginsky
"Projection Pursuit methodology permits to solve the difficult problem of finding an estimate of a density defined on a set of very large dimension. In his seminal article, Huber (see "Projection pursuit", Annals of Statistics, 1985) evidences the interest of the Projection Pursuit method thanks to the factorisation of a density into a Gaussian component and some residual density in a context of Kullback-Leibler divergence maximisation. In the present article, we introduce a new algorithm, and in particular a test for the factorisation of a density estimated from an iid sample."
papers
to-read
statistics
information-theory
divergence
august 2010 by mraginsky
Carl T. Bergstrom - University of Washington
august 2010 by mraginsky
"Information in biological systems. How do living organisms acquire, store, and make use of information? How and why does communication evolve? How does information flow through biological or social networks?"
people
homepages
research
information-theory
biology
theoretical-biology
communication
social-networks
august 2010 by mraginsky
[1008.2008] Rate-Constrained Simulation and Source Coding IID Sources
august 2010 by mraginsky
"Necessary conditions for asymptotically optimal sliding-block or stationary codes for source coding and rate-constrained simulation of memoryless sources are presented and used to motivate a design technique for trellis-encoded source coding and rate-constrained simulation. The code structure has intuitive similarities to classic random coding arguments as well as to ``fake process'' methods and alphabet-constrained methods. Experimental evidence shows that the approach provides comparable or superior performance in comparison with previously published methods on common examples, sometimes by significant margins."
papers
to-read
information-theory
rate-distortion
source-coding
simulation
august 2010 by mraginsky
[1007.5354] Synchronization and Control in Intrinsic and Designed Computation: An Information-Theoretic Analysis of Competing Models of Stochastic Computation
august 2010 by mraginsky
"We adapt tools from information theory to analyze how an observer comes to synchronize with the hidden states of a finitary, stationary stochastic process. We show that synchronization is determined by both the process's internal organization and by an observer's model of it. We analyze these components using the convergence of state-block and block-state entropies, comparing them to the previously known convergence properties of the Shannon block entropy. Along the way, we introduce a hierarchy of information quantifiers as derivatives and integrals of these entropies, which parallels a similar hierarchy introduced for block entropy. We also draw out the duality between synchronization properties and a process's controllability. The tools lead to a new classification of a process's alternative representations in terms of minimality, synchronizability, and unifilarity."
papers
to-read
information-theory
control-theory
complexity
computation
august 2010 by mraginsky
[1006.3780] Least Squares Superposition Codes of Moderate Dictionary Size, Reliable at Rates up to Capacity
june 2010 by mraginsky
Nice: "For the additive white Gaussian noise channel with average codeword power constraint, new coding methods are devised in which the codewords are sparse superpositions, that is, linear combinations of subsets of vectors from a given design, with the possible messages indexed by the choice of subset. Decoding is by least squares, tailored to the assumed form of linear combination. Communication is shown to be reliable with error probability exponentially small for all rates up to the Shannon capacity."
papers
to-read
information-theory
communication
sparsity
statistics
june 2010 by mraginsky
[1003.1377] Entropy: The Markov Ordering Approach
june 2010 by mraginsky
"The focus of this article is on entropy and Markov processes. We study the properties of functionals which are invariant with respect to monotonic transformations and analyze two invariant "additivity" properties: (i) existence of a monotonic transformation which makes the functional additive with respect to the joining of independent systems and (ii) existence of a monotonic transformation which makes the functional additive with respect to the partitioning of the space of states. All Lyapunov functionals for Markov chains which have properties (i) and (ii) are derived. We describe the most general ordering of the distribution space, with respect to which all continuous-time Markov processes are monotonic (the {\em Markov order}). The solution differs significantly from the ordering given by the inequality of entropy growth. For inference, this approach results in a convex compact set of conditionally "most random" distributions."
papers
have-read
statistics
probability
majorization
information-theory
june 2010 by mraginsky
[1001.4448] R'enyi Divergence and Majorization
june 2010 by mraginsky
"R'enyi divergence is related to R'enyi entropy much like information divergence (also called Kullback-Leibler divergence or relative entropy) is related to Shannon's entropy, and comes up in many settings. It was introduced by R'enyi as a measure of information that satisfies almost the same axioms as information divergence. We review the most important properties of R'enyi divergence, including its relation to some other distances. We show how R'enyi divergence appears when the theory of majorization is generalized from the finite to the continuous setting. Finally, R'enyi divergence plays a role in analyzing the number of binary questions required to guess the values of a sequence of random variables."
papers
to-read
information-theory
statistics
majorization
heard-the-talk
june 2010 by mraginsky
[1002.0042] Lower bounds for the minimax risk using $f$-divergences and applications
june 2010 by mraginsky
Very nice: "A new lower bound involving $f$-divergences between the underlying probability measures is proved for the minimax risk in estimation problems. The proof just uses the convexity of the function $f$ and is extremely simple. Special cases and straightforward corollaries of our bound include well known inequalities for establishing minimax lower bounds such as Fano's inequality, Pinsker's inequality and inequalities based on global entropy conditions. Two applications are provided: a new minimax lower bound for the reconstruction of convex bodies from noisy support function measurements and a different proof of a recent minimax lower bound for the estimation of a covariance matrix."
papers
have-read
information-theory
statistics
heard-the-talk
june 2010 by mraginsky
[1004.0557] Applications of Lindeberg Principle in Communications and Statistical Learning
april 2010 by mraginsky
"We use a generalization of the Lindeberg principle developed by Sourav Chatterjee to prove universality properties for various problems in communications, statistical learning and random matrix theory. We also show that these systems can be viewed as the limiting case of a properly defined sparse system. The latter result is useful when the sparse systems are easier to analyze than their dense counterparts. The list of problems we consider is by no means exhaustive. We believe that the ideas can be used in many other problems relevant for information theory."
paper
to-read
random-matrices
probability
machine-learning
signal-processing
information-theory
geometric-functional-analysis
april 2010 by mraginsky
"Pictish writing?" (Language Log)
april 2010 by mraginsky
Yet another EPIC FAIL: "The "new research" is described in Rob Lee, Philip Jonathan, and Pauline Ziman, "Pictish symbols revealed as a written language through application of Shannon entropy", Proceedings of the Royal Society A, in press." Make it stop, make it stop! Shannon fhtagn!
linguistics
information-theory
via:cshalizi
april 2010 by mraginsky
related tags
active-learning ⊕ adaptive-systems ⊕ ai ⊕ algorithms ⊕ analog-information-theory ⊕ analysis ⊕ bayesian ⊕ biology ⊕ blogs ⊕ books ⊕ causality ⊕ channel-coding ⊕ coding-theory ⊕ cognition ⊕ combinatorics ⊕ communication ⊕ communications ⊕ complex-systems ⊕ complexity ⊕ compressed-sensing ⊕ computation ⊕ computer-science ⊕ conferences ⊕ control-theory ⊕ convex-programming ⊕ crypto ⊕ cybernetics ⊕ data-mining ⊕ decentralized-control ⊕ decision-making ⊕ decision-theory ⊕ distributed-computing ⊕ distributed-systems ⊕ divergence ⊕ dynamic-programming ⊕ dynamical-systems ⊕ economics ⊕ entropy-estimation ⊕ epsilon-entropy ⊕ ergodic-theory ⊕ estimation ⊕ evolution ⊕ experimental-design ⊕ feature-selection ⊕ feedback ⊕ feedback-information-theory ⊕ filetype:pdf ⊕ film ⊕ filtering ⊕ frequentist ⊕ game-theory ⊕ geekery ⊕ geometric-functional-analysis ⊕ graph-theory ⊕ graphical-models ⊕ group-testing ⊕ has:re ⊕ have-read ⊕ heard-the-talk ⊕ homepages ⊕ information-structures ⊕ information-theory ⊖ interesting ⊕ journals ⊕ learning-theory ⊕ lecture-notes ⊕ linguistics ⊕ machine-learning ⊕ majorization ⊕ markov-chains ⊕ martingales ⊕ mathematics ⊕ measure-concentration ⊕ media:document ⊕ message-passing ⊕ model-selection ⊕ multiagent-systems ⊕ networks ⊕ neuroscience ⊕ online-learning ⊕ optimal-search ⊕ optimal-transportation ⊕ optimization ⊕ paper ⊕ papers ⊕ people ⊕ perception ⊕ philosophy-of-science ⊕ physics ⊕ prediction ⊕ probability ⊕ psychology ⊕ quantization ⊕ quantum-mechanics ⊕ random-matrices ⊕ rate-distortion ⊕ rationality ⊕ re:active_feature_selection_project ⊕ re:causality_project ⊕ re:erasures_and_concentration_project ⊕ re:FPF_project ⊕ re:knightian-uncertainty ⊕ re:posterior_matching_project ⊕ re:typical_sequences_project ⊕ reference ⊕ reinforcement-learning ⊕ Renyi-dimension ⊕ research ⊕ reviews ⊕ robust_control ⊕ russian ⊕ security ⊕ sensor-networks ⊕ Shannon ⊕ signal-processing ⊕ simulation ⊕ slides ⊕ social-networks ⊕ software ⊕ source-coding ⊕ sparsity ⊕ speculation ⊕ statistical-learning ⊕ statistical-physics ⊕ statistics ⊕ stochastic-search ⊕ technology ⊕ theoretical-biology ⊕ thermodynamics ⊕ to-read ⊕ via:cshalizi ⊕ via:shivak ⊕Copy this bookmark: