mraginsky + to-read   372

JSTOR: The American Economic Review, Vol. 70, No. 3 (Jun., 1980), pp. 393-408
"On the impossibility of informationally efficient markets" (Grossman and Stiglitz)
papers  to-read  economics  via:cshalizi 
yesterday by mraginsky
The Complexity of Exchange - Axtell - 2005 - The Economic Journal - Wiley Online Library
"The computational complexity of two classes of market mechanisms is compared. First the Walrasian interpretation in which prices are centrally computed by an auctioneer. Recent results on the computational complexity are reviewed. The non-polynomial complexity of these algorithms makes Walrasian general equilibrium an implausible conception. Second, a decentralised picture of market processes is described, involving concurrent exchange within transient coalitions of agents. These processes feature price dispersion, yield allocations that are not in the core, modify the distribution of wealth, are always stable, but path-dependent. Replacing the Walrasian framing of markets requires substantial revision of conventional wisdom concerning markets."
papers  to-read  economics  decision-making  via:cshalizi 
yesterday by mraginsky
[1106.1401] Volatility of Power Grids under Real-Time Pricing
The paper proposes a framework for modeling and analysis of the dynamics of supply, demand, and clearing prices in power system with real-time retail pricing and information asymmetry. Real-time retail pricing is characterized by passing on the real-time wholesale electricity prices to the end consumers, and is shown to create a closed-loop feedback system between the physical layer and the market layer of the power system. In the absence of a carefully designed control law, such direct feedback between the two layers could increase volatility and lower the system's robustness to uncertainty in demand and generation. A new notion of generalized price-elasticity is introduced, and it is shown that price volatility can be characterized in terms of the system's maximal relative price elasticity, defined as the maximal ratio of the generalized price-elasticity of consumers to that of the producers. As this ratio increases, the system becomes more volatile, and eventually, unstable. As new demand response technologies and distributed storage increase the price-elasticity of demand, the architecture under examination is likely to lead to increased volatility and possibly instability. This highlights the need for assessing architecture systematically and in advance, in order to optimally strike the trade-offs between volatility, economic efficiency, and system reliability.
papers  to-read  economics  markets  electricity-markets 
3 days ago by mraginsky
Cognitive Democracy — Crooked Timber
"Over the last couple of years, Cosma Shalizi and I have been working together on various things, including, inter alia, the relationship between complex systems, democracy and the Internet. These are big unwieldy topics, and trying to think about them systematically is hard. Even so, we’ve gotten to the point where we at least feel ready to start throwing stuff at a wider audience, to get feedback on what works and what doesn’t. Here’s a paper we’re working on, which argues that we should (for some purposes at least), think of markets, hierarchy and democracy in terms of their capacity to solve complex collective problems, makes the case that democracy will on average do the job a lot better than the other two ways, and then looks at different forms of collective information processing on the Internet as experiments that democracies can learn from."
to-read  blogs  decision-making  institutions  markets  democracy  policy 
4 days ago by mraginsky
A Behavioral Defense of Rational Expectations
This paper studies decision making by agents who value optimism, but are unsure of their environment. As in Brunnermeier and Parker (2005), an agent’s optimism is assumed to be tempered by the decision costs it imposes. As in Hansen and Sargent (2008), an agent’s uncertainty about his environment leads him to formulate ‘robust’ decision rules. It is shown that when combined, these two considerations can lead agents to adhere to the Rational Expectations Hypothesis. Rather than being the outcome of the sophisticated statistical calculations of an impassive expected utility maximizer, Rational Expectations can instead be viewed as a useful approximation in environments where agents struggle to strike a balance between doubt and hope.
papers  to-read  rationality  economics  robust_control  decision-making  information-theory  decision-theory 
11 days ago by mraginsky
[1205.1099] From Knothe's rearrangement to Brenier's optimal transport map
"The Brenier optimal map and Knothe-Rosenblatt rearrangement are two instances of a transport map, that is to say a map sending one measure onto another. The main interest of the former is that it solves the Monge-Kantorovich optimal transport problem, while the latter is very easy to compute, being given by an explicit formula. A few years ago, Carlier, Galichon, and Santambrogio showed that the Knothe rearrangement could be seen as the limit of the Brenier map when the quadratic cost degenerates. In this paper, we prove that on the torus (to avoid boundary issues), when all the data are smooth, the evolution is also smooth, and is entirely determined by a PDE for the Kantorovich potential (which determines the map), with a subtle initial condition. The proof requires the use of the Nash-Moser inverse function theorem. This result generalizes the ode discovered by Carlier, Galichon, and Santambrogio when one measure is uniform and the other is discrete, and could pave to way to new numerical methods for optimal transportation."
papers  to-read  optimal-transportation  probability  PDEs 
23 days ago by mraginsky
[1002.1300] Architecture for communication with a fidelity criterion in unknown networks
We prove that in order to communicate independent sources (this is the unicast problem) between various users over an unknown medium to within various distortion levels, it is sufficient to consider source-channel separation based architectures: architectures which first compress the sources to within the corresponding distortion levels followed by reliable communication over the unknown medium. We are reducing the problem of universal rate-distortion communication of independent sources over a network to the universal reliable communication problem over networks. This is a reductionist view. We are not solving the reliable communication problem in networks.
papers  to-read  networks  information-theory  information-structures 
23 days ago by mraginsky
[1205.1005] Some Refinements of Large Deviation Tail Probabilities
"We study tail probabilities via some Gaussian approximations. Our results make refinements to large deviation theory. The proof builds on classical results by Bahadur and Rao. Binomial distributions and their tail probabilities are discussed in more detail."
papers  to-read  statistics  probability  large-deviations  measure-concentration  statistical-physics 
24 days ago by mraginsky
[1205.0858] Controlled Sensing for Multihypothesis Testing
"The problem of multiple hypothesis testing with observation control is considered in both fixed sample size and sequential settings. In the fixed sample size setting, for binary hypothesis testing, it is shown that the optimal exponent for the maximal error probability corresponds to the maximum Chernoff information over the choice of controls. It is also shown that a pure stationary open-loop control policy is asymptotically optimal within the larger class of all causal control policies. For multihypothesis testing in the fixed sample size setting, lower and upper bounds on the optimal error exponent are derived. It is also shown through an example with three hypotheses that the optimal causal control policy can be strictly better than the optimal open-loop control policy. In the sequential setting, a test based on earlier work by Chernoff for binary hypothesis testing, is shown to be first-order asymptotically optimal for multihypothesis testing for vanishing error probabilities. The role of past information and randomization in designing optimal control policies is discussed."
papers  to-read  information-theory  decision-making  feedback 
24 days ago by mraginsky
[1204.6265] Statistical inference for dynamical systems: a review
The topic of statistical inference for dynamical systems has been studied extensively across several fields. In this survey we focus on the problem of parameter estimation for non-linear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research.
papers  to-read  dynamical-systems  machine-learning  system-identification  ergodic-theory 
4 weeks ago by mraginsky
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
papers  to-read  bandit-problems  online-learning  control-theory  dynamic-programming  decision-making 
4 weeks ago by mraginsky
Fluctuations and response out of equilibrium (C. Maes)
"We discuss some recently visited positions towards dealing with nonequilibria from the
mathematical point of view of Markov networks."
papers  to-read  statistical-physics  thermodynamics  probability  information-theory 
4 weeks ago by mraginsky
[1107.1222] On the information-theoretic structure of distributed measurements
The internal structure of a measuring device, which depends on what its components are and how they are organized, determines how it categorizes its inputs. This paper presents a geometric approach to studying the internal structure of measurements performed by distributed systems such as probabilistic cellular automata. It constructs the quale, a family of sections of a suitably defined presheaf, whose elements correspond to the measurements performed by all subsystems of a distributed system. Using the quale we quantify (i) the information generated by a measurement; (ii) the extent to which a measurement is context-dependent; and (iii) whether a measurement is decomposable into independent submeasurements, which turns out to be equivalent to context-dependence. Finally, we show that only indecomposable measurements are more informative than the sum of their submeasurements.
papers  to-read  dynamical-systems  information-theory  complex-systems  distributed-systems 
5 weeks ago by mraginsky
[1111.2687] Ricci curvature of finite Markov chains via convexity of the entropy
We define and study a new notion of Ricci curvature that applies to Markov chains on discrete spaces. This notion relies on geodesic convexity of the entropy and is analogous to the one introduced by Lott, Sturm, and Villani for geodesic measure spaces. In order to apply to the discrete setting, the role of the Wasserstein metric is taken over by a different metric, having the property that continuous time Markov chains are gradient flows of the entropy.
Using this notion of Ricci curvature we prove discrete analogues of fundamental results by Bakry--Emery and Otto--Villani. Furthermore we show that Ricci curvature bounds are preserved under tensorisation. As a special case we obtain the sharp Ricci curvature lower bound for the discrete hypercube.
papers  to-read  markov-chains  probability  measure-concentration 
5 weeks ago by mraginsky
[0704.0704] Entropic Measure and Wasserstein Diffusion
We construct a new random probability measure on the sphere and on the unit interval which in both cases has a Gibbs structure with the relative entropy functional as Hamiltonian. It satisfies a quasi-invariance formula with respect to the action of smooth diffeomorphism of the sphere and the interval respectively. The associated integration by parts formula is used to construct two classes of diffusion processes on probability measures (on the sphere or the unit interval) by Dirichlet form methods. The first one is closely related to Malliavin's Brownian motion on the homeomorphism group. The second one is a probability valued stochastic perturbation of the heat flow, whose intrinsic metric is the quadratic Wasserstein distance. It may be regarded as the canonical diffusion process on the Wasserstein space.
papers  to-read  optimal-transportation  information-theory  probability  re:FPF_project 
5 weeks ago by mraginsky
[1204.2980] Realizable Rate Distortion Function and Bayesian FIltering Theory
The relation between rate distortion function (RDF) and Bayesian filtering theory is discussed. The relation is established by imposing a causal or realizability constraint on the reconstruction conditional distribution of the RDF, leading to the definition of a causal RDF. Existence of the optimal reconstruction distribution of the causal RDF is shown using the topology of weak convergence of probability measures. The optimal non-stationary causal reproduction conditional distribution of the causal RDF is derived in closed form; it is given by a set of recursive equations which are computed backward in time. The realization of causal RDF is described via the source-channel matching approach, while an example is briefly discussed to illustrate the concepts.
papers  to-read  information-theory  rate-distortion  filtering  bayesian  estimation 
6 weeks ago by mraginsky
"Approximate dynamic programming via Bellman inequalities" (Wang and Boyd)
"In this paper we introduce new methods for finding functions that lower bound
the value function of a stochastic control problem, using an iterated form of the Bellman inequality. Our method is based on solving linear or semidefinite programs, and
produces both a bound on the optimal objective, as well as a suboptimal policy that
appears to works very well. These results extend and improve bounds obtained by
authors in a previous paper using a single Bellman inequality condition. We describe
the methods in a general setting, and show how they can be applied in specific cases
including the finite state case, constrained linear quadratic control, switched affine
control, and multi-period portfolio investment."
papers  to-read  control-theory  dynamic-programming  optimization 
6 weeks ago by mraginsky
[1204.2975] Multiple Objects: Error Exponents in Hypotheses Testing and Identification
"We servey [sic!] a series of investigations of optimal testing of multiple hypotheses conserning various multiobject models. These studies are a bright instance of application of methods and technics developed in Shannon information theory to solution of typical statistical problems."
papers  to-read  feedback-information-theory  statistics  active-learning  decision-making 
6 weeks ago by mraginsky
[1109.5417] Finite block length converse results for channel coding and assistance by non-signalling correlations
"A non-signalling assisted code is a block code in which the sender and receiver can be correlated in any way that would not by itself allow signalling between them. The maximum rate that can be achieved by such a code for a given block length and error probability is formulated as a linear program. This provides a finite block length converse for classical codes which we show is identical to one derived by Polyanskiy, Poor and Verd'{u}. This shows that the many classical asymptotic and finite converse results which can be derived from this converse apply equally to entanglement assisted codes which have been recently studied. It also gives an explicit linear programming formulation of the converse which has some useful features."
papers  to-read  information-theory  channel-coding 
6 weeks ago by mraginsky
Entropy and the value of information for investors (Cabrales et al.)
Consider any investor who fears ruin facing any set of invest-
ments that satisfy no-arbitrage. Before investing, he can purchase informa-
tion about the state of nature in the form of an information structure. Given
his prior, information structure is more informative than information struc-
ture if whenever he rejects at some price, he also rejects at that price.
We show that this complete informativeness ordering is represented by the
decrease in entropy of his beliefs, regardless of his preferences, initial wealth
or investment problem. It is also shown that no prior-independent informa-
tiveness ordering based on similar premises exists.
papers  to-read  economics  decision-making  comparison-of-experiments  re:knightian-uncertainty 
8 weeks ago by mraginsky
Information Order in Decision and Agency Problems (Ian Jewitt)
This paper establishes information order in a number of special settings. We
discuss Lehmann’s condition, relate it to Blackwell’s and discuss its application in
statistical decision problems, certain Bayesian games and in principal agent problems. We distinguish between KR (Karlin Rubin) monotone preferences and single
crossing preferences and between Lehmann’s and Persico’s theorems. A generalisation of Lehmann’s condition is given that is applicable to situations where monotone
likelihood ratio does not apply. The principal agent problem is treated both via
the first-order approach and for the general finite action model. Points of contact
with Lehmann’s condition are identified in both cases.
papers  to-read  economics  decision-making  comparison-of-experiments  re:knightian-uncertainty 
8 weeks ago by mraginsky
[1203.6502] Quantifying causal influences
"Common methods of causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between n variables. Given the joint distribution of all these variables, the DAG contains all information about how intervening on one variable would change the distribution of the other n-1 variables. It remains, however, a non-trivial question how to quantify the causal influence of one variable on another one.
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."

There is no mention of directed information, and yet their measure of causal strength seems to be closely related to it.
papers  to-read  causality  information-theory  feedback-information-theory 
9 weeks ago by mraginsky
Majorizing codes and measures (Andreas Maurer)
An information theoretical interpretation of majorizing and minorizing
measures is given. The expression logarithmic in the reciprocal of the
measure of a ball is replaced by the number of bits needed to achieve
desired precision in some convergent code. We also give a local version of
the majorizing bound.
papers  to-read  measure-concentration  probability  information-theory 
9 weeks ago by mraginsky
[0907.4491] Bounding relative entropy by the relative entropy of local specifications in product spaces
For a class of density functions $q^n(x^n)$ on $Bbb R^n$ we prove an inequality between relative entropy and the sum of average conditional relative entropies of the following form: For any density function $p^n(x^n)$ on $Bbb R^n$, $D(p^n||q^n)leq Const. sum_{i=1}^n Bbb E D(p_i(cdot|Y_1,..., Y_{i-1},Y_{i+1},..., Y_n) || Q_i(cdot|Y_1,..., Y_{i-1},Y_{i+1},..., Y_n)),$ where $p_i(cdot|y_1,..., y_{i-1},y_{i+1},..., y_n)$ and $Q_i(cdot|x_1,..., x_{i-1},x_{i+1},..., x_n)$ denote the local specifications for $p^n$ resp. $q^n$, i.e., the conditional density functions of the $i$'th coordinate, given the other coordinates. The constant depends on the properties of the local specifications of $q^n$. The above inequality implies a logarithmic Sobolev inequality for $q^n$. We get an explicit lower bound for the logarithmic Sobolev constant of $q^n$ under the assumptions that: (i) the local specifications of $q^n$ satisfy logarithmic Sobolev inequalities with constants $rho_i$, and (ii) they also satisfy some condition expressing that the mixed partial derivatives of the Hamiltonian of $q^n$ are not too large relative to the logarithmic Sobolev constants $rho_i$. Condition (ii) may be weaker than that used in Otto and Reznikoff's recent paper on the estimation of logarithmic Sobolev constants of spin systems.
papers  to-read  measure-concentration  information-theory  probability  re:erasures_and_concentration_project 
9 weeks ago by mraginsky
[1203.4626] Active Sequential Hypothesis Testing
Consider a decision maker who is responsible to dynamically collect observations so as to enhance his information in a speedy manner about an underlying phenomena of interest while accounting for the penalty of wrong declaration. The special cases of the problem are shown to be that of variable-length coding with feedback and noisy dynamic search. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively select the most "informative" sensing action among the available ones.
In this paper, using results in dynamic programming, a lower bound for the optimal total cost is established. Moreover, upper bounds are obtained via an analysis of heuristic policies for dynamic selection of actions. It is shown that the proposed heuristics achieve asymptotic optimality in many practically relevant problems including the problems of variable-length coding with feedback and noisy dynamic search; where asymptotic optimality implies that the relative difference between the total cost achieved by the proposed policies and the optimal total cost approaches zero as the penalty of wrong declaration or the number of hypotheses (hence the number of collected samples) increases. Furthermore, using the obtained bounds, the gain of adaptive selection of sensing actions is shown to be at least logarithmic in the penalty associated with wrong declarations.
papers  to-read  control-theory  information-theory  adaptive-systems  heard-the-talk 
10 weeks ago by mraginsky
[1104.1303] A new characterization of Talagrand's transport-entropy inequalities and applications
Wow! "We show that Talagrand's transport inequality is equivalent to a restricted logarithmic Sobolev inequality. This result clarifies the links between these two important functional inequalities. As an application, we give the first proof of the fact that Talagrand's inequality is stable under bounded perturbations."
papers  to-read  measure-concentration  optimal-transportation  probability  re:erasures_and_concentration_project 
12 weeks ago by mraginsky
[0911.0280] Causal Inference on Discrete Data using Additive Noise Models
Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. Recently, methods using additive noise models have been suggested to approach the case of continuous variables. In many situations, however, the variables of interest are discrete or even have only finitely many states. In this work we extend the notion of additive noise models to these cases. We prove that whenever the joint distribution $prob^{(X,Y)}$ admits such a model in one direction, e.g. $Y=f(X)+N, N independent X$, it does not admit the reversed model $X=g(Y)+tilde N, tilde N independent Y$ as long as the model is chosen in a generic way. Based on these deliberations we propose an efficient new algorithm that is able to distinguish between cause and effect for a finite sample of discrete variables. In an extensive experimental study we show that this algorithm works both on synthetic and real data sets.
papers  to-read  machine-learning  causality 
march 2012 by mraginsky
[1202.6258] A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. Numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms.
papers  to-read  optimization  convex-programming 
march 2012 by mraginsky
Certainty equivalence and model uncertainty
Simon’s and Theil’s certainty equivalence property justifies a convenient algorithm for solving dynamic programming problems with quadratic objectives and linear transition laws: first, optimize under perfect foresight, then substitute optimal forecasts for unknown future values. A similar decomposition into separate optimization and forecasting steps prevails when a decision maker wants a decision rule that is robust to model misspecification. Concerns about model misspecification leave the first step of the algorithm intact and affect only the second step of forecasting the future. The decision maker attains robustness by making forecasts with a distorted model that twists probabilities relative to his approximating model. The appropriate twisting emerges from a two-player zero-sum dynamic game.
papers  to-read  adaptive-systems  control-theory  dynamic-programming  estimation  game-theory  uncertainty 
february 2012 by mraginsky
[1201.2999] Spatially Coupled Ensembles Universally Achieve Capacity under Belief Propagation
We investigate spatially coupled code ensembles. For transmission over the binary erasure channel, it was recently shown that spatial coupling increases the belief propagation threshold of the ensemble to essentially the maximum a-priori threshold of the underlying component ensemble. This explains why convolutional LDPC ensembles, originally introduced by Felstrom and Zigangirov, perform so well over this channel. We show that the equivalent result holds true for transmission over general binary-input memoryless output-symmetric channels. More precisely, given a desired error probability and a gap to capacity, we can construct a spatially coupled ensemble which fulfills these constraints universally on this class of channels under belief propagation decoding. In fact, most codes in that ensemble have that property. The quantifier universal refers to the single ensemble/code which is good for all channels but we assume that the channel is known at the receiver. The key technical result is a proof that under belief propagation decoding spatially coupled ensembles achieve essentially the area threshold of the underlying uncoupled ensemble. We conclude by discussing some interesting open problems.
papers  to-read  information-theory  channel-coding 
february 2012 by mraginsky
"Online sequential optimization with biased gradients"
We study a class of stochastic optimization problems where though the objective functions
may not be convex, they satisfy a generalization of convexity, called the sequentially convex
property. We focus on a setting where the distribution of the underlying uncertainty is unknown and the manager must make a decision in real-time based on historical data. Since sequentially convex functions are not necessarily convex, they pose difficulties in applying standard adaptive methods for convex optimization. We propose a nonparametric algorithm based on a gradient descent method, and show that the T-season average expected cost di ffers from the minimum cost by at most O(1/sqrt(T)). Our analysis is based on a careful quanti cation of the bias that is inherent in gradient estimation due to the adaptive nature of the problem. We demonstrate the usefulness of the concept of sequential convexity by applying it to three canonical problems in inventory control, capacity allocation, and the lifetime buy decision, under the assumption that the manager does not know the demand distributions and has access only to historical sales (censored demand) data.
papers  to-read  online-learning  optimization  convex-programming 
february 2012 by mraginsky
[1105.2274] Data-Distributed Weighted Majority and Online Mirror Descent
In this paper, we focus on the question of the extent to which online learning can benefit from distributed computing. We focus on the setting in which $N$ agents online-learn cooperatively, where each agent only has access to its own data. We propose a generic data-distributed online learning meta-algorithm. We then introduce the Distributed Weighted Majority and Distributed Online Mirror Descent algorithms, as special cases. We show, using both theoretical analysis and experiments, that compared to a single agent: given the same computation time, these distributed algorithms achieve smaller generalization errors; and given the same generalization errors, they can be $N$ times faster.
papers  to-read  machine-learning  online-learning  optimization  distributed-systems 
february 2012 by mraginsky
[1007.1033] A Theory of Network Equivalence, Parts I and II
A family of equivalence tools for bounding network capacities is introduced. Part I treats networks of point-to-point channels. The main result is roughly as follows. Given a network of noisy, independent, memoryless point-to-point channels, a collection of communication demands can be met on the given network if and only if it can be met on another network where each noisy channel is replaced by a noiseless bit pipe with throughput equal to the noisy channel capacity. This result was known previously for the case of a single-source multicast demand. The result given here treats general demands -- including, for example, multiple unicast demands -- and applies even when the achievable rate region for the corresponding demands is unknown in the noiseless network. In part II, definitions of upper and lower bounding channel models for general channels are introduced. By these definitions, a collection of communication demands can be met on a network of independent channels if it can be met on a network where each channel is replaced by its lower bounding model andonly if it can be met on a network where each channel is replaced by its upper bounding model. This work derives general conditions under which a network of noiseless bit pipes is an upper or lower bounding model for a multiterminal channel. Example upper and lower bounding models for broadcast, multiple access, and interference channels are given. It is then shown that bounding the difference between the upper and lower bounding models for a given channel yields bounds on the accuracy of network capacity bounds derived using those models. By bounding the capacity of a network of independent noisy channels by the network coding capacity of a network of noiseless bit pipes, this approach represents one step towards the goal of building computational tools for bounding network capacities.
papers  to-read  information-theory  networks  multiagent-systems  heard-the-talk 
february 2012 by mraginsky
Natural selection and veridical perceptions 10.1016/j.jtbi.2010.07.020 : Journal of Theoretical Biology | ScienceDirect.com
Does natural selection favor veridical perceptions, those that more accurately depict the objective environment? Students of perception often claim that it does. But this claim, though influential, has not been adequately tested. Here we formalize the claim and a few alternatives. To test them, we introduce “interface games,” a class of evolutionary games in which perceptual strategies compete. We explore, in closed-form solutions and Monte Carlo simulations, some simpler games that assume frequency-dependent selection and complete mixing in infinite populations. We find that veridical perceptions can be driven to extinction by non-veridical strategies that are tuned to utility rather than objective reality. This suggests that natural selection need not favor veridical perceptions, and that the effects of selection on sensory perception deserve further study.
papers  to-read  perception  evolution  game-theory  re:active_feature_selection_project 
january 2012 by mraginsky
Concentration inequalities for functions of independent variables - Maurer - 2005 - Random Structures & Algorithms - Wiley Online Library
Following the entropy method this paper presents general concentration inequalities, which can be applied to combinatorial optimization and empirical processes. The inequalities give improved concentration results for optimal traveling salesmen tours, Steiner trees, and the eigenvalues of random symmetric matrices.
papers  to-read  measure-concentration  probability 
january 2012 by mraginsky
Dominated concentration : Statistics & Probability Letters | ScienceDirect.com
The concentration properties of one random variable may be governed by the values of another random variable which is concentrated and more easily analyzed. We present a general concentration inequality to handle such cases and apply it to the eigenvalues of the Gram matrix for a sample of independent vectors distributed in the unit ball of a Hilbert space. For large samples the deviation of the eigenvalues from their mean is shown to scale with the largest eigenvalue.
papers  to-read  measure-concentration  probability 
january 2012 by mraginsky
[1201.2256] Empirical Processes of Markov Chains and Dynamical Systems Indexed by Classes of Functions
We study weak convergence of empirical processes of dependent data, indexed by classes of functions. We obtain results that are especially suitable for data arising from dynamical systems and Markov chains, where the Central Limit Theorem for partial sums is commonly derived via the spectral gap technique. Our results apply, e.g. to the empirical process of ergodic torus automorphisms.
papers  to-read  empirical-processes  dynamical-systems  markov-chains  re:adaptive_control_project 
january 2012 by mraginsky
[1111.1977] On Refined Versions of the Azuma-Hoeffding Inequality with Applications in Information Theory
This paper derives some refined versions of the Azuma-Hoeffding inequality for discrete-parameter martingales with uniformly bounded jumps, and it considers some of their potential applications in information theory and related topics. The first part of this paper derives these refined inequalities, followed by a discussion on their relations to some classical results in probability theory. It also considers a geometric interpretation of some of these inequalities, providing an insight on the inter-connections between them. The second part exemplifies the use of these refined inequalities in the context of hypothesis testing, information theory, and communication. The paper is concluded with a discussion on some directions for further research. This work is meant to stimulate the use of some refined versions of the Azuma-Hoeffding inequality in information-theoretic aspects.
papers  to-read  information-theory  measure-concentration  martingales  markov-chains  probability 
january 2012 by mraginsky
[1201.0559] Chernoff-Hoeffding Bounds for Markov Chains: Generalized and Simplified
We prove the first Chernoff-Hoeffding bounds for general nonreversible finite-state Markov chains based on the standard L_1 (variation distance) mixing-time of the chain. Specifically, consider an ergodic Markov chain M and a weight function f: [n] -> [0,1] on the state space [n] of M with mean mu = E_{v <- pi}[f(v)], where pi is the stationary distribution of M. A t-step random walk (v_1,...,v_t) on M starting from the stationary distribution pi has expected total weight E[X] = mu t, where X = sum_{i=1}^t f(v_i). Let T be the L_1 mixing-time of M. We show that the probability of X deviating from its mean by a multiplicative factor of delta, i.e., Pr [ |X - mu t| >= delta mu t ], is at most exp(-Omega(delta^2 mu t / T)) for 0 <= delta <= 1, and exp(-Omega(delta mu t / T)) for delta > 1. In fact, the bounds hold even if the weight functions f_i's for i in [t] are distinct, provided that all of them have the same mean mu.
We also obtain a simplified proof for the Chernoff-Hoeffding bounds based on the spectral expansion lambda of M, which is the square root of the second largest eigenvalue (in absolute value) of M tilde{M}, where tilde{M} is the time-reversal Markov chain of M. We show that the probability Pr [ |X - mu t| >= delta mu t ] is at most exp(-Omega(delta^2 (1-lambda) mu t)) for 0 <= delta <= 1, and exp(-Omega(delta (1-lambda) mu t)) for delta > 1.
Both of our results extend to continuous-time Markov chains, and to the case where the walk starts from an arbitrary distribution x, at a price of a multiplicative factor depending on the distribution x in the concentration bounds
to-read  papers  markov-chains  measure-concentration  probability 
january 2012 by mraginsky
Concentration Inequalities for the Missing Mass and for Histogram Rule Error (McAllester and Ortiz)
This paper gives distribution-free concentration inequalities for the missing mass and the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although the sums are independent, they are highly heterogeneous. Such highly heterogeneous independent sums cannot be analyzed using standard concentration inequalities such as Hoeffding's inequality, the Angluin-Valiant bound, Bernstein's inequality, Bennett's inequality, or McDiarmid's theorem. The concentration inequality for histogram rule error is motivated by the desire to construct a new class of bounds on the generalization error of decision trees.
papers  to-read  learning-theory  measure-concentration  information-theory  probability  re:erasures_and_concentration_project 
december 2011 by mraginsky
[1112.0708] Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing
We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala et al. cite{KrzakalaEtAl}, message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with undersampling rates close to the fraction of non-zero coordinates.
We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach is successful as soon as the undersampling rate $delta$ exceeds the (upper) R'enyi information dimension of the signal, $uRenyi(p_X)$. More precisely, for a sequence of signals of diverging dimension $n$ whose empirical distribution converges to $p_X$, reconstruction is with high probability successful from $uRenyi(p_X), n+o(n)$ measurements taken according to a band diagonal matrix.
For sparse signals, i.e. sequences of dimension $n$ and $k(n)$ non-zero entries, this implies reconstruction from $k(n)+o(n)$ measurements. For `discrete' signals, i.e. signals whose coordinates take a fixed finite set of values, this implies reconstruction from $o(n)$ measurements. The result is robust with respect to noise, does not apply uniquely to random signals, but requires the knowledge of the empirical distribution of the signal $p_X$.
papers  to-read  compressed-sensing  information-theory  message-passing  sparsity  analog-information-theory  Renyi-dimension 
december 2011 by mraginsky
[1111.6822] Optimal Phase Transitions in Compressed Sensing
Compressed sensing deals with efficient recovery of analog signals from linear measurements. This paper presents a statistical study of compressed sensing by modeling the input signal as random processes. Three classes of encoders are considered, namely optimal nonlinear, optimal linear and random linear encoders. Focusing on optimal decoders, we investigate the fundamental tradeoff between measurement rate and reconstruction fidelity gauged by error probability and noise sensitivity in the absence and presence of measurement noise, respectively. Optimal phase transition thresholds are determined as a functional of the input distribution and compared to suboptimal thresholds achieved by various popular reconstruction algorithms. In particular, we show that Gaussian sensing matrices incur no penalty on the phase transition threshold with respect to optimal nonlinear encoding. Our results also provide a rigorous justification of previous results based on replica heuristics in the weak-noise regime.
papers  to-read  information-theory  compressed-sensing  signal-processing  sparsity 
november 2011 by mraginsky
[1111.2622] Optimal re-centering bounds, with applications to Rosenthal-type concentration of measure inequalities
For any nonnegative Borel-measurable function f such that f(x)=0 if and only if x=0, the best constant c_f in the inequality E f(X-E X) leq c_f E f(X) for all random variables X with a finite mean is obtained. Properties of the constant c_f in the case when f=|.|^p are studied. Applications to concentration of measure in the form of Rosenthal-type bounds on the moments of separately Lipschitz functions on product spaces are given.
papers  to-read  probability  measure-concentration 
november 2011 by mraginsky
[1111.3029] Parametric estimation. Finite sample theory
The paper aims at reconsidering the famous Le Cam LAN theory. The main features of the approach which make it different from the classical one are: (1) the study is non-asymptotic, that is, the sample size is fixed and does not tend to infinity; (2) the parametric assumption is possibly misspecified and the underlying data distribution can lie beyond the given parametric family.
The main results include a large deviation bounds for the (quasi) maximum likelihood and the local quadratic majorization of the log-likelihood process. The latter yields a number of important corollaries for statistical inference: concentration, confidence and risk bounds, expansion of the maximum likelihood estimate, etc. All these corollaries are stated in a non-classical way admitting a model misspecification and finite samples. However, the classical asymptotic results including the efficiency bounds can be easily derived as corollaries of the obtained non-asymptotic statements. The general results are illustrated for the i.i.d. set-up as well as for generalized linear and median estimation. The results apply for any dimension of the parameter space and provide a quantitative lower bound on the sample size yielding the root-n accuracy.
papers  to-read  statistics  Le_Cam-theory  asymptotic-normality 
november 2011 by mraginsky
[1111.3054] Consistency under Sampling of Exponential Random Graph Models
The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focussing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general ones about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.
papers  to-read  statistics  exponential-families  graphical-models  networks 
november 2011 by mraginsky
[1111.3486] New Concentration Inequalities for Suprema of Empirical Processes
While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration.
papers  to-read  probability  empirical-processes  measure-concentration 
november 2011 by mraginsky
[1110.6654] Pointwise Relations between Information and Estimation
Many of the classical and recent relations between information and estimation in the presence of Gaussian noise can be viewed as identities between expectations of random quantities. These include the I-MMSE relationship of Guo et al.; the relative entropy and mismatched estimation relationship of Verd\'{u}; the relationship between causal estimation and mutual information of Duncan, and its extension to the presence of feedback by Kadota et al.; the relationship between causal and non-casual estimation of Guo et al., and its mismatched version of Weissman. We dispense with the expectations and explore the nature of the pointwise relations between the respective random quantities.
papers  to-read  information-theory  estimation  filtering 
november 2011 by mraginsky
[1110.5429] Causal modeling and inference for electricity markets
How does dynamic price information flow among Northern European electricity spot prices and prices of major electricity generation fuel sources? We use time series models combined with new advances in causal inference to answer these questions. Applying our methods to weekly Nordic and German electricity prices, and oil, gas and coal prices, with German wind power and Nordic water reservoir levels as exogenous variables, we estimate a causal model for the price dynamics, both for contemporaneous and lagged relationships. In contemporaneous time, Nordic and German electricity prices are interlinked through gas prices. In the long run, electricity prices and British gas prices adjust themselves to establish the equlibrium price level, since oil, coal, continental gas and EUR/USD are found to be weakly exogenous.
papers  to-read  causality  economics  markets 
october 2011 by mraginsky
[1110.5447] Optimal discovery with probabilistic expert advice
We consider an original problem that arises from the issue of security analysis of a power system and that we name optimal discovery with probabilistic expert advice. We address it with an algorithm based on the optimistic paradigm and the Good-Turing missing mass estimator. We show that this strategy uniformly attains the optimal discovery rate in a macroscopic limit sense, under some assumptions on the probabilistic experts. We also provide numerical experiments suggesting that this optimal behavior may still hold under weaker assumptions.
papers  to-read  prediction 
october 2011 by mraginsky
[1109.1516] Bayesian Inference with Optimal Maps
"We present a new approach to Bayesian inference that entirely avoids Markov chain simulation, by constructing a map that pushes forward the prior measure to the posterior measure. Existence and uniqueness of a suitable measure-preserving map is established by formulating the problem in the context of optimal transport theory."
to-read  papers  bayesian-learning  optimal-transportation 
september 2011 by mraginsky
« earlier      

related tags

active-learning  adaptive-systems  AI  algorithms  analog-information-theory  analysis  anomaly-detection  anthropology  art  assholes  asymptotic-normality  atheism  banach-spaces  bandit-problems  bayesian  bayesian-learning  behavioral-control  biology  biophysics  biostatistics  blogs  book-reviews  books  books-noted  bounded-rationality  business  causality  cells  channel-coding  Christianity  city-states  climate  clustering  coding-theory  cognition  cognitive-science  combinatorics  communication  communications  comparison-of-experiments  complex-systems  complexity  compressed-sensing  computation  computer-science  consensus-algorithms  continuing-crises  control-theory  convex-programming  cooking  copyright  criticism  crypto  cthulhu  culture  current-events  cybernetics  data-analysis  data-mining  decentralized-control  decision-making  decision-theory  democracy  dependent-data  distributed-computing  distributed-systems  divergence  dynamic-programming  dynamical-systems  economics  electricity-markets  empirical-processes  entropy-estimation  epistemology  epsilon-entropy  ergodic-theory  essays  estimation  evolution  experimental-design  exponential-families  fantasy  feature-selection  feedback  feedback-information-theory  fiction  filetype:pdf  film  filtering  finance  frequentist  game-theory  geekery  genetics  genomics  geometric-functional-analysis  graph-theory  graphical-models  group-testing  heard-the-talk  historiography  history  history-of-probability  history_of_statistics  impending_doom  information-geometry  information-structures  information-theory  innovation  institutions  interesting  internet  interviews  inverse-problems  Judaism  large-deviations  learning-theory  lecture-notes  Le_Cam-theory  linear-programming  linear-systems  literary-estates  literature  lovecraft  lower-bounds  machine-learning  majorization  management  markets  markov-chains  Markov-decision-processes  martingales  mathematics  MCMC  measure-concentration  media:document  message-passing  model-selection  multiagent-systems  music  mythology  nearest-neighbors  network-data-analysis  networks  neuroscience  nonparametrics  online-learning  optimal-search  optimal-transportation  optimization  oracle-inequalities  organizations  paper  papers  papes  particle-filters  PDEs  people  perception  philosophy  philosophy-of-science  physics  point-processes  polemics  policy  politics  pop-culture  prediction  privacy  probability  psychology  quantization  quantum-mechanics  random-matrices  rate-distortion  rationality  re:active_feature_selection_project  re:adaptive_control_project  re:causality_project  re:distributed_decisions_project  re:erasures_and_concentration_project  re:FPF_project  re:knightian-uncertainty  re:network_inference_project  re:posterior_matching_project  reference  reinforcement-learning  religion  Renyi-dimension  research  robust_control  robust_statistics  sci-fi  science  sensor-networks  signal-processing  simulation  social-networks  sociology  source-coding  sparsity  special-relativity  spectral-graph-theory  speculation  state-space-models  statistical-learning  statistical-physics  statistics  submodular-functions  system-identification  systems-biology  technology  thermodynamics  to-read  trivia  uncertainty  universal-prediction  via:arsyed  via:arthegall  via:bookslut  via:cshalizi  via:mreid  via:nielsen  via:shivak  vision  weird 

Copy this bookmark:



description:


tags: