JSTOR: The American Economic Review, Vol. 70, No. 3 (Jun., 1980), pp. 393-408
"On the impossibility of informationally efficient markets" (Grossman and Stiglitz)
papers  to-read  economics  via:cshalizi 
yesterday
The Complexity of Exchange - Axtell - 2005 - The Economic Journal - Wiley Online Library
"The computational complexity of two classes of market mechanisms is compared. First the Walrasian interpretation in which prices are centrally computed by an auctioneer. Recent results on the computational complexity are reviewed. The non-polynomial complexity of these algorithms makes Walrasian general equilibrium an implausible conception. Second, a decentralised picture of market processes is described, involving concurrent exchange within transient coalitions of agents. These processes feature price dispersion, yield allocations that are not in the core, modify the distribution of wealth, are always stable, but path-dependent. Replacing the Walrasian framing of markets requires substantial revision of conventional wisdom concerning markets."
papers  to-read  economics  decision-making  via:cshalizi 
yesterday
[1106.1401] Volatility of Power Grids under Real-Time Pricing
The paper proposes a framework for modeling and analysis of the dynamics of supply, demand, and clearing prices in power system with real-time retail pricing and information asymmetry. Real-time retail pricing is characterized by passing on the real-time wholesale electricity prices to the end consumers, and is shown to create a closed-loop feedback system between the physical layer and the market layer of the power system. In the absence of a carefully designed control law, such direct feedback between the two layers could increase volatility and lower the system's robustness to uncertainty in demand and generation. A new notion of generalized price-elasticity is introduced, and it is shown that price volatility can be characterized in terms of the system's maximal relative price elasticity, defined as the maximal ratio of the generalized price-elasticity of consumers to that of the producers. As this ratio increases, the system becomes more volatile, and eventually, unstable. As new demand response technologies and distributed storage increase the price-elasticity of demand, the architecture under examination is likely to lead to increased volatility and possibly instability. This highlights the need for assessing architecture systematically and in advance, in order to optimally strike the trade-offs between volatility, economic efficiency, and system reliability.
papers  to-read  economics  markets  electricity-markets 
3 days ago
Cognitive Democracy — Crooked Timber
"Over the last couple of years, Cosma Shalizi and I have been working together on various things, including, inter alia, the relationship between complex systems, democracy and the Internet. These are big unwieldy topics, and trying to think about them systematically is hard. Even so, we’ve gotten to the point where we at least feel ready to start throwing stuff at a wider audience, to get feedback on what works and what doesn’t. Here’s a paper we’re working on, which argues that we should (for some purposes at least), think of markets, hierarchy and democracy in terms of their capacity to solve complex collective problems, makes the case that democracy will on average do the job a lot better than the other two ways, and then looks at different forms of collective information processing on the Internet as experiments that democracies can learn from."
to-read  blogs  decision-making  institutions  markets  democracy  policy 
4 days ago
A Behavioral Defense of Rational Expectations
This paper studies decision making by agents who value optimism, but are unsure of their environment. As in Brunnermeier and Parker (2005), an agent’s optimism is assumed to be tempered by the decision costs it imposes. As in Hansen and Sargent (2008), an agent’s uncertainty about his environment leads him to formulate ‘robust’ decision rules. It is shown that when combined, these two considerations can lead agents to adhere to the Rational Expectations Hypothesis. Rather than being the outcome of the sophisticated statistical calculations of an impassive expected utility maximizer, Rational Expectations can instead be viewed as a useful approximation in environments where agents struggle to strike a balance between doubt and hope.
papers  to-read  rationality  economics  robust_control  decision-making  information-theory  decision-theory 
11 days ago
[1205.1099] From Knothe's rearrangement to Brenier's optimal transport map
"The Brenier optimal map and Knothe-Rosenblatt rearrangement are two instances of a transport map, that is to say a map sending one measure onto another. The main interest of the former is that it solves the Monge-Kantorovich optimal transport problem, while the latter is very easy to compute, being given by an explicit formula. A few years ago, Carlier, Galichon, and Santambrogio showed that the Knothe rearrangement could be seen as the limit of the Brenier map when the quadratic cost degenerates. In this paper, we prove that on the torus (to avoid boundary issues), when all the data are smooth, the evolution is also smooth, and is entirely determined by a PDE for the Kantorovich potential (which determines the map), with a subtle initial condition. The proof requires the use of the Nash-Moser inverse function theorem. This result generalizes the ode discovered by Carlier, Galichon, and Santambrogio when one measure is uniform and the other is discrete, and could pave to way to new numerical methods for optimal transportation."
papers  to-read  optimal-transportation  probability  PDEs 
23 days ago
[1002.1300] Architecture for communication with a fidelity criterion in unknown networks
We prove that in order to communicate independent sources (this is the unicast problem) between various users over an unknown medium to within various distortion levels, it is sufficient to consider source-channel separation based architectures: architectures which first compress the sources to within the corresponding distortion levels followed by reliable communication over the unknown medium. We are reducing the problem of universal rate-distortion communication of independent sources over a network to the universal reliable communication problem over networks. This is a reductionist view. We are not solving the reliable communication problem in networks.
papers  to-read  networks  information-theory  information-structures 
23 days ago
[1205.1005] Some Refinements of Large Deviation Tail Probabilities
"We study tail probabilities via some Gaussian approximations. Our results make refinements to large deviation theory. The proof builds on classical results by Bahadur and Rao. Binomial distributions and their tail probabilities are discussed in more detail."
papers  to-read  statistics  probability  large-deviations  measure-concentration  statistical-physics 
24 days ago
[1205.0858] Controlled Sensing for Multihypothesis Testing
"The problem of multiple hypothesis testing with observation control is considered in both fixed sample size and sequential settings. In the fixed sample size setting, for binary hypothesis testing, it is shown that the optimal exponent for the maximal error probability corresponds to the maximum Chernoff information over the choice of controls. It is also shown that a pure stationary open-loop control policy is asymptotically optimal within the larger class of all causal control policies. For multihypothesis testing in the fixed sample size setting, lower and upper bounds on the optimal error exponent are derived. It is also shown through an example with three hypotheses that the optimal causal control policy can be strictly better than the optimal open-loop control policy. In the sequential setting, a test based on earlier work by Chernoff for binary hypothesis testing, is shown to be first-order asymptotically optimal for multihypothesis testing for vanishing error probabilities. The role of past information and randomization in designing optimal control policies is discussed."
papers  to-read  information-theory  decision-making  feedback 
24 days ago
[1204.6265] Statistical inference for dynamical systems: a review
The topic of statistical inference for dynamical systems has been studied extensively across several fields. In this survey we focus on the problem of parameter estimation for non-linear dynamical systems. Our objective is to place results across distinct disciplines in a common setting and highlight opportunities for further research.
papers  to-read  dynamical-systems  machine-learning  system-identification  ergodic-theory 
4 weeks ago
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
papers  to-read  bandit-problems  online-learning  control-theory  dynamic-programming  decision-making 
4 weeks ago
Fluctuations and response out of equilibrium (C. Maes)
"We discuss some recently visited positions towards dealing with nonequilibria from the
mathematical point of view of Markov networks."
papers  to-read  statistical-physics  thermodynamics  probability  information-theory 
4 weeks ago
[1107.1222] On the information-theoretic structure of distributed measurements
The internal structure of a measuring device, which depends on what its components are and how they are organized, determines how it categorizes its inputs. This paper presents a geometric approach to studying the internal structure of measurements performed by distributed systems such as probabilistic cellular automata. It constructs the quale, a family of sections of a suitably defined presheaf, whose elements correspond to the measurements performed by all subsystems of a distributed system. Using the quale we quantify (i) the information generated by a measurement; (ii) the extent to which a measurement is context-dependent; and (iii) whether a measurement is decomposable into independent submeasurements, which turns out to be equivalent to context-dependence. Finally, we show that only indecomposable measurements are more informative than the sum of their submeasurements.
papers  to-read  dynamical-systems  information-theory  complex-systems  distributed-systems 
5 weeks ago
Game of Thrones, US Politics Edition
I disagree on the Daenerys analogy, though -- should be Cersei instead.
politics  satire  via:cshalizi 
5 weeks ago
[1111.2687] Ricci curvature of finite Markov chains via convexity of the entropy
We define and study a new notion of Ricci curvature that applies to Markov chains on discrete spaces. This notion relies on geodesic convexity of the entropy and is analogous to the one introduced by Lott, Sturm, and Villani for geodesic measure spaces. In order to apply to the discrete setting, the role of the Wasserstein metric is taken over by a different metric, having the property that continuous time Markov chains are gradient flows of the entropy.
Using this notion of Ricci curvature we prove discrete analogues of fundamental results by Bakry--Emery and Otto--Villani. Furthermore we show that Ricci curvature bounds are preserved under tensorisation. As a special case we obtain the sharp Ricci curvature lower bound for the discrete hypercube.
papers  to-read  markov-chains  probability  measure-concentration 
5 weeks ago
[0704.0704] Entropic Measure and Wasserstein Diffusion
We construct a new random probability measure on the sphere and on the unit interval which in both cases has a Gibbs structure with the relative entropy functional as Hamiltonian. It satisfies a quasi-invariance formula with respect to the action of smooth diffeomorphism of the sphere and the interval respectively. The associated integration by parts formula is used to construct two classes of diffusion processes on probability measures (on the sphere or the unit interval) by Dirichlet form methods. The first one is closely related to Malliavin's Brownian motion on the homeomorphism group. The second one is a probability valued stochastic perturbation of the heat flow, whose intrinsic metric is the quadratic Wasserstein distance. It may be regarded as the canonical diffusion process on the Wasserstein space.
papers  to-read  optimal-transportation  information-theory  probability  re:FPF_project 
5 weeks ago
[1204.2980] Realizable Rate Distortion Function and Bayesian FIltering Theory
The relation between rate distortion function (RDF) and Bayesian filtering theory is discussed. The relation is established by imposing a causal or realizability constraint on the reconstruction conditional distribution of the RDF, leading to the definition of a causal RDF. Existence of the optimal reconstruction distribution of the causal RDF is shown using the topology of weak convergence of probability measures. The optimal non-stationary causal reproduction conditional distribution of the causal RDF is derived in closed form; it is given by a set of recursive equations which are computed backward in time. The realization of causal RDF is described via the source-channel matching approach, while an example is briefly discussed to illustrate the concepts.
papers  to-read  information-theory  rate-distortion  filtering  bayesian  estimation 
6 weeks ago
"Approximate dynamic programming via Bellman inequalities" (Wang and Boyd)
"In this paper we introduce new methods for finding functions that lower bound
the value function of a stochastic control problem, using an iterated form of the Bellman inequality. Our method is based on solving linear or semidefinite programs, and
produces both a bound on the optimal objective, as well as a suboptimal policy that
appears to works very well. These results extend and improve bounds obtained by
authors in a previous paper using a single Bellman inequality condition. We describe
the methods in a general setting, and show how they can be applied in specific cases
including the finite state case, constrained linear quadratic control, switched affine
control, and multi-period portfolio investment."
papers  to-read  control-theory  dynamic-programming  optimization 
6 weeks ago
[1204.2975] Multiple Objects: Error Exponents in Hypotheses Testing and Identification
"We servey [sic!] a series of investigations of optimal testing of multiple hypotheses conserning various multiobject models. These studies are a bright instance of application of methods and technics developed in Shannon information theory to solution of typical statistical problems."
papers  to-read  feedback-information-theory  statistics  active-learning  decision-making 
6 weeks ago
[1109.5417] Finite block length converse results for channel coding and assistance by non-signalling correlations
"A non-signalling assisted code is a block code in which the sender and receiver can be correlated in any way that would not by itself allow signalling between them. The maximum rate that can be achieved by such a code for a given block length and error probability is formulated as a linear program. This provides a finite block length converse for classical codes which we show is identical to one derived by Polyanskiy, Poor and Verd'{u}. This shows that the many classical asymptotic and finite converse results which can be derived from this converse apply equally to entanglement assisted codes which have been recently studied. It also gives an explicit linear programming formulation of the converse which has some useful features."
papers  to-read  information-theory  channel-coding 
6 weeks ago
Entropy and the value of information for investors (Cabrales et al.)
Consider any investor who fears ruin facing any set of invest-
ments that satisfy no-arbitrage. Before investing, he can purchase informa-
tion about the state of nature in the form of an information structure. Given
his prior, information structure is more informative than information struc-
ture if whenever he rejects at some price, he also rejects at that price.
We show that this complete informativeness ordering is represented by the
decrease in entropy of his beliefs, regardless of his preferences, initial wealth
or investment problem. It is also shown that no prior-independent informa-
tiveness ordering based on similar premises exists.
papers  to-read  economics  decision-making  comparison-of-experiments  re:knightian-uncertainty 
8 weeks ago
Information Order in Decision and Agency Problems (Ian Jewitt)
This paper establishes information order in a number of special settings. We
discuss Lehmann’s condition, relate it to Blackwell’s and discuss its application in
statistical decision problems, certain Bayesian games and in principal agent problems. We distinguish between KR (Karlin Rubin) monotone preferences and single
crossing preferences and between Lehmann’s and Persico’s theorems. A generalisation of Lehmann’s condition is given that is applicable to situations where monotone
likelihood ratio does not apply. The principal agent problem is treated both via
the first-order approach and for the general finite action model. Points of contact
with Lehmann’s condition are identified in both cases.
papers  to-read  economics  decision-making  comparison-of-experiments  re:knightian-uncertainty 
8 weeks ago
[1203.6502] Quantifying causal influences
"Common methods of causal inference generate directed acyclic graphs (DAGs) that formalize causal relations between n variables. Given the joint distribution of all these variables, the DAG contains all information about how intervening on one variable would change the distribution of the other n-1 variables. It remains, however, a non-trivial question how to quantify the causal influence of one variable on another one.
Here we propose a measure for causal strength that refers to direct effects and measure the "strength of an arrow" or a set of arrows. It is based on a hypothetical intervention that modifies the joint distribution by cutting the corresponding edge. The causal strength is then the relative entropy distance between the old and the new distribution.
We discuss other measures of causal strength like the average causal effect, transfer entropy and information flow and describe their limitations. We argue that our measure is also more appropriate for time series than the known ones.
Finally, we discuss conceptual problems in defining the strength of indirect effects."

There is no mention of directed information, and yet their measure of causal strength seems to be closely related to it.
papers  to-read  causality  information-theory  feedback-information-theory 
9 weeks ago
Majorizing codes and measures (Andreas Maurer)
An information theoretical interpretation of majorizing and minorizing
measures is given. The expression logarithmic in the reciprocal of the
measure of a ball is replaced by the number of bits needed to achieve
desired precision in some convergent code. We also give a local version of
the majorizing bound.
papers  to-read  measure-concentration  probability  information-theory 
9 weeks ago
Social Democracy for the 21st Century: A Post Keynesian Perspective: Hayek the Evil Socialist
"He can be seen through a reading of his book Law, Legislation and Liberty: A New Statement of the Liberal Principles of Justice and Political Economy (vol. 3) (London, 1979), pp. 41–64.

In particular, this striking passage could have been written by a Keynesian:
“On the other hand, it is merely common sense that government, as the biggest spender and investor whose activities cannot be guided wholly by profitability, and which for finance is in a great measure independent of the state of the capital market, should so far as practicable distribute its expenditure over time in such a manner that it will step in when private investment flags, and thereby employ resources for public investment at the least cost and with the greatest benefit to society.” (Hayek 1979: 59)."
blogs  economics  libertarianism 
9 weeks ago
[0907.4491] Bounding relative entropy by the relative entropy of local specifications in product spaces
For a class of density functions $q^n(x^n)$ on $Bbb R^n$ we prove an inequality between relative entropy and the sum of average conditional relative entropies of the following form: For any density function $p^n(x^n)$ on $Bbb R^n$, $D(p^n||q^n)leq Const. sum_{i=1}^n Bbb E D(p_i(cdot|Y_1,..., Y_{i-1},Y_{i+1},..., Y_n) || Q_i(cdot|Y_1,..., Y_{i-1},Y_{i+1},..., Y_n)),$ where $p_i(cdot|y_1,..., y_{i-1},y_{i+1},..., y_n)$ and $Q_i(cdot|x_1,..., x_{i-1},x_{i+1},..., x_n)$ denote the local specifications for $p^n$ resp. $q^n$, i.e., the conditional density functions of the $i$'th coordinate, given the other coordinates. The constant depends on the properties of the local specifications of $q^n$. The above inequality implies a logarithmic Sobolev inequality for $q^n$. We get an explicit lower bound for the logarithmic Sobolev constant of $q^n$ under the assumptions that: (i) the local specifications of $q^n$ satisfy logarithmic Sobolev inequalities with constants $rho_i$, and (ii) they also satisfy some condition expressing that the mixed partial derivatives of the Hamiltonian of $q^n$ are not too large relative to the logarithmic Sobolev constants $rho_i$. Condition (ii) may be weaker than that used in Otto and Reznikoff's recent paper on the estimation of logarithmic Sobolev constants of spin systems.
papers  to-read  measure-concentration  information-theory  probability  re:erasures_and_concentration_project 
9 weeks ago
[1203.4626] Active Sequential Hypothesis Testing
Consider a decision maker who is responsible to dynamically collect observations so as to enhance his information in a speedy manner about an underlying phenomena of interest while accounting for the penalty of wrong declaration. The special cases of the problem are shown to be that of variable-length coding with feedback and noisy dynamic search. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively select the most "informative" sensing action among the available ones.
In this paper, using results in dynamic programming, a lower bound for the optimal total cost is established. Moreover, upper bounds are obtained via an analysis of heuristic policies for dynamic selection of actions. It is shown that the proposed heuristics achieve asymptotic optimality in many practically relevant problems including the problems of variable-length coding with feedback and noisy dynamic search; where asymptotic optimality implies that the relative difference between the total cost achieved by the proposed policies and the optimal total cost approaches zero as the penalty of wrong declaration or the number of hypotheses (hence the number of collected samples) increases. Furthermore, using the obtained bounds, the gain of adaptive selection of sensing actions is shown to be at least logarithmic in the penalty associated with wrong declarations.
papers  to-read  control-theory  information-theory  adaptive-systems  heard-the-talk 
10 weeks ago
[1104.1303] A new characterization of Talagrand's transport-entropy inequalities and applications
Wow! "We show that Talagrand's transport inequality is equivalent to a restricted logarithmic Sobolev inequality. This result clarifies the links between these two important functional inequalities. As an application, we give the first proof of the fact that Talagrand's inequality is stable under bounded perturbations."
papers  to-read  measure-concentration  optimal-transportation  probability  re:erasures_and_concentration_project 
12 weeks ago
[0911.0280] Causal Inference on Discrete Data using Additive Noise Models
Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. Recently, methods using additive noise models have been suggested to approach the case of continuous variables. In many situations, however, the variables of interest are discrete or even have only finitely many states. In this work we extend the notion of additive noise models to these cases. We prove that whenever the joint distribution $prob^{(X,Y)}$ admits such a model in one direction, e.g. $Y=f(X)+N, N independent X$, it does not admit the reversed model $X=g(Y)+tilde N, tilde N independent Y$ as long as the model is chosen in a generic way. Based on these deliberations we propose an efficient new algorithm that is able to distinguish between cause and effect for a finite sample of discrete variables. In an extensive experimental study we show that this algorithm works both on synthetic and real data sets.
papers  to-read  machine-learning  causality 
march 2012
[1202.6258] A Stochastic Gradient Method with an Exponential Convergence Rate for Strongly-Convex Optimization with Finite Training Sets
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. Numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms.
papers  to-read  optimization  convex-programming 
march 2012
Certainty equivalence and model uncertainty
Simon’s and Theil’s certainty equivalence property justifies a convenient algorithm for solving dynamic programming problems with quadratic objectives and linear transition laws: first, optimize under perfect foresight, then substitute optimal forecasts for unknown future values. A similar decomposition into separate optimization and forecasting steps prevails when a decision maker wants a decision rule that is robust to model misspecification. Concerns about model misspecification leave the first step of the algorithm intact and affect only the second step of forecasting the future. The decision maker attains robustness by making forecasts with a distorted model that twists probabilities relative to his approximating model. The appropriate twisting emerges from a two-player zero-sum dynamic game.
papers  to-read  adaptive-systems  control-theory  dynamic-programming  estimation  game-theory  uncertainty 
february 2012
Hansen, L.P. and Sargent, T.J.: Robustness.
The standard theory of decision making under uncertainty advises the decision maker to form a statistical model linking outcomes to decisions and then to choose the optimal distribution of outcomes. This assumes that the decision maker trusts the model completely. But what should a decision maker do if the model cannot be trusted?

Lars Hansen and Thomas Sargent, two leading macroeconomists, push the field forward as they set about answering this question. They adapt robust control techniques and apply them to economics. By using this theory to let decision makers acknowledge misspecification in economic modeling, the authors develop applications to a variety of problems in dynamic macroeconomics.

Technical, rigorous, and self-contained, this book will be useful for macroeconomists who seek to improve the robustness of decision-making processes.
books-noted  economics  robust_control  game-theory  multiagent-systems  uncertainty 
february 2012
[1201.2999] Spatially Coupled Ensembles Universally Achieve Capacity under Belief Propagation
We investigate spatially coupled code ensembles. For transmission over the binary erasure channel, it was recently shown that spatial coupling increases the belief propagation threshold of the ensemble to essentially the maximum a-priori threshold of the underlying component ensemble. This explains why convolutional LDPC ensembles, originally introduced by Felstrom and Zigangirov, perform so well over this channel. We show that the equivalent result holds true for transmission over general binary-input memoryless output-symmetric channels. More precisely, given a desired error probability and a gap to capacity, we can construct a spatially coupled ensemble which fulfills these constraints universally on this class of channels under belief propagation decoding. In fact, most codes in that ensemble have that property. The quantifier universal refers to the single ensemble/code which is good for all channels but we assume that the channel is known at the receiver. The key technical result is a proof that under belief propagation decoding spatially coupled ensembles achieve essentially the area threshold of the underlying uncoupled ensemble. We conclude by discussing some interesting open problems.
papers  to-read  information-theory  channel-coding 
february 2012
"Online sequential optimization with biased gradients"
We study a class of stochastic optimization problems where though the objective functions
may not be convex, they satisfy a generalization of convexity, called the sequentially convex
property. We focus on a setting where the distribution of the underlying uncertainty is unknown and the manager must make a decision in real-time based on historical data. Since sequentially convex functions are not necessarily convex, they pose difficulties in applying standard adaptive methods for convex optimization. We propose a nonparametric algorithm based on a gradient descent method, and show that the T-season average expected cost di ffers from the minimum cost by at most O(1/sqrt(T)). Our analysis is based on a careful quanti cation of the bias that is inherent in gradient estimation due to the adaptive nature of the problem. We demonstrate the usefulness of the concept of sequential convexity by applying it to three canonical problems in inventory control, capacity allocation, and the lifetime buy decision, under the assumption that the manager does not know the demand distributions and has access only to historical sales (censored demand) data.
papers  to-read  online-learning  optimization  convex-programming 
february 2012
[1105.2274] Data-Distributed Weighted Majority and Online Mirror Descent
In this paper, we focus on the question of the extent to which online learning can benefit from distributed computing. We focus on the setting in which $N$ agents online-learn cooperatively, where each agent only has access to its own data. We propose a generic data-distributed online learning meta-algorithm. We then introduce the Distributed Weighted Majority and Distributed Online Mirror Descent algorithms, as special cases. We show, using both theoretical analysis and experiments, that compared to a single agent: given the same computation time, these distributed algorithms achieve smaller generalization errors; and given the same generalization errors, they can be $N$ times faster.
papers  to-read  machine-learning  online-learning  optimization  distributed-systems 
february 2012
[1007.1033] A Theory of Network Equivalence, Parts I and II
A family of equivalence tools for bounding network capacities is introduced. Part I treats networks of point-to-point channels. The main result is roughly as follows. Given a network of noisy, independent, memoryless point-to-point channels, a collection of communication demands can be met on the given network if and only if it can be met on another network where each noisy channel is replaced by a noiseless bit pipe with throughput equal to the noisy channel capacity. This result was known previously for the case of a single-source multicast demand. The result given here treats general demands -- including, for example, multiple unicast demands -- and applies even when the achievable rate region for the corresponding demands is unknown in the noiseless network. In part II, definitions of upper and lower bounding channel models for general channels are introduced. By these definitions, a collection of communication demands can be met on a network of independent channels if it can be met on a network where each channel is replaced by its lower bounding model andonly if it can be met on a network where each channel is replaced by its upper bounding model. This work derives general conditions under which a network of noiseless bit pipes is an upper or lower bounding model for a multiterminal channel. Example upper and lower bounding models for broadcast, multiple access, and interference channels are given. It is then shown that bounding the difference between the upper and lower bounding models for a given channel yields bounds on the accuracy of network capacity bounds derived using those models. By bounding the capacity of a network of independent noisy channels by the network coding capacity of a network of noiseless bit pipes, this approach represents one step towards the goal of building computational tools for bounding network capacities.
papers  to-read  information-theory  networks  multiagent-systems  heard-the-talk 
february 2012
Natural selection and veridical perceptions 10.1016/j.jtbi.2010.07.020 : Journal of Theoretical Biology | ScienceDirect.com
Does natural selection favor veridical perceptions, those that more accurately depict the objective environment? Students of perception often claim that it does. But this claim, though influential, has not been adequately tested. Here we formalize the claim and a few alternatives. To test them, we introduce “interface games,” a class of evolutionary games in which perceptual strategies compete. We explore, in closed-form solutions and Monte Carlo simulations, some simpler games that assume frequency-dependent selection and complete mixing in infinite populations. We find that veridical perceptions can be driven to extinction by non-veridical strategies that are tuned to utility rather than objective reality. This suggests that natural selection need not favor veridical perceptions, and that the effects of selection on sensory perception deserve further study.
papers  to-read  perception  evolution  game-theory  re:active_feature_selection_project 
january 2012
Concentration inequalities for functions of independent variables - Maurer - 2005 - Random Structures & Algorithms - Wiley Online Library
Following the entropy method this paper presents general concentration inequalities, which can be applied to combinatorial optimization and empirical processes. The inequalities give improved concentration results for optimal traveling salesmen tours, Steiner trees, and the eigenvalues of random symmetric matrices.
papers  to-read  measure-concentration  probability 
january 2012
Dominated concentration : Statistics & Probability Letters | ScienceDirect.com
The concentration properties of one random variable may be governed by the values of another random variable which is concentrated and more easily analyzed. We present a general concentration inequality to handle such cases and apply it to the eigenvalues of the Gram matrix for a sample of independent vectors distributed in the unit ball of a Hilbert space. For large samples the deviation of the eigenvalues from their mean is shown to scale with the largest eigenvalue.
papers  to-read  measure-concentration  probability 
january 2012
[1201.2256] Empirical Processes of Markov Chains and Dynamical Systems Indexed by Classes of Functions
We study weak convergence of empirical processes of dependent data, indexed by classes of functions. We obtain results that are especially suitable for data arising from dynamical systems and Markov chains, where the Central Limit Theorem for partial sums is commonly derived via the spectral gap technique. Our results apply, e.g. to the empirical process of ergodic torus automorphisms.
papers  to-read  empirical-processes  dynamical-systems  markov-chains  re:adaptive_control_project 
january 2012
Gronwall's inequality (J. W. Robin)
The _right_ way of proving Gronwall's inequality.
mathematics  analysis 
january 2012
[1111.1977] On Refined Versions of the Azuma-Hoeffding Inequality with Applications in Information Theory
This paper derives some refined versions of the Azuma-Hoeffding inequality for discrete-parameter martingales with uniformly bounded jumps, and it considers some of their potential applications in information theory and related topics. The first part of this paper derives these refined inequalities, followed by a discussion on their relations to some classical results in probability theory. It also considers a geometric interpretation of some of these inequalities, providing an insight on the inter-connections between them. The second part exemplifies the use of these refined inequalities in the context of hypothesis testing, information theory, and communication. The paper is concluded with a discussion on some directions for further research. This work is meant to stimulate the use of some refined versions of the Azuma-Hoeffding inequality in information-theoretic aspects.
papers  to-read  information-theory  measure-concentration  martingales  markov-chains  probability 
january 2012
[1201.0559] Chernoff-Hoeffding Bounds for Markov Chains: Generalized and Simplified
We prove the first Chernoff-Hoeffding bounds for general nonreversible finite-state Markov chains based on the standard L_1 (variation distance) mixing-time of the chain. Specifically, consider an ergodic Markov chain M and a weight function f: [n] -> [0,1] on the state space [n] of M with mean mu = E_{v <- pi}[f(v)], where pi is the stationary distribution of M. A t-step random walk (v_1,...,v_t) on M starting from the stationary distribution pi has expected total weight E[X] = mu t, where X = sum_{i=1}^t f(v_i). Let T be the L_1 mixing-time of M. We show that the probability of X deviating from its mean by a multiplicative factor of delta, i.e., Pr [ |X - mu t| >= delta mu t ], is at most exp(-Omega(delta^2 mu t / T)) for 0 <= delta <= 1, and exp(-Omega(delta mu t / T)) for delta > 1. In fact, the bounds hold even if the weight functions f_i's for i in [t] are distinct, provided that all of them have the same mean mu.
We also obtain a simplified proof for the Chernoff-Hoeffding bounds based on the spectral expansion lambda of M, which is the square root of the second largest eigenvalue (in absolute value) of M tilde{M}, where tilde{M} is the time-reversal Markov chain of M. We show that the probability Pr [ |X - mu t| >= delta mu t ] is at most exp(-Omega(delta^2 (1-lambda) mu t)) for 0 <= delta <= 1, and exp(-Omega(delta (1-lambda) mu t)) for delta > 1.
Both of our results extend to continuous-time Markov chains, and to the case where the walk starts from an arbitrary distribution x, at a price of a multiplicative factor depending on the distribution x in the concentration bounds
to-read  papers  markov-chains  measure-concentration  probability 
january 2012
Concentration Inequalities for the Missing Mass and for Histogram Rule Error (McAllester and Ortiz)
This paper gives distribution-free concentration inequalities for the missing mass and the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although the sums are independent, they are highly heterogeneous. Such highly heterogeneous independent sums cannot be analyzed using standard concentration inequalities such as Hoeffding's inequality, the Angluin-Valiant bound, Bernstein's inequality, Bennett's inequality, or McDiarmid's theorem. The concentration inequality for histogram rule error is motivated by the desire to construct a new class of bounds on the generalization error of decision trees.
papers  to-read  learning-theory  measure-concentration  information-theory  probability  re:erasures_and_concentration_project 
december 2011
Sebastien Bubeck, "Introduction to Online Optimization"
lecture notes for ORF570: Special Topics in Statistics and Operations Research, Prediction Games (Princeton)
lecture-notes  online-learning  optimization  prediction 
december 2011
[1112.0708] Information-Theoretically Optimal Compressed Sensing via Spatial Coupling and Approximate Message Passing
We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala et al. cite{KrzakalaEtAl}, message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with undersampling rates close to the fraction of non-zero coordinates.
We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach is successful as soon as the undersampling rate $delta$ exceeds the (upper) R'enyi information dimension of the signal, $uRenyi(p_X)$. More precisely, for a sequence of signals of diverging dimension $n$ whose empirical distribution converges to $p_X$, reconstruction is with high probability successful from $uRenyi(p_X), n+o(n)$ measurements taken according to a band diagonal matrix.
For sparse signals, i.e. sequences of dimension $n$ and $k(n)$ non-zero entries, this implies reconstruction from $k(n)+o(n)$ measurements. For `discrete' signals, i.e. signals whose coordinates take a fixed finite set of values, this implies reconstruction from $o(n)$ measurements. The result is robust with respect to noise, does not apply uniquely to random signals, but requires the knowledge of the empirical distribution of the signal $p_X$.
papers  to-read  compressed-sensing  information-theory  message-passing  sparsity  analog-information-theory  Renyi-dimension 
december 2011
[1111.6822] Optimal Phase Transitions in Compressed Sensing
Compressed sensing deals with efficient recovery of analog signals from linear measurements. This paper presents a statistical study of compressed sensing by modeling the input signal as random processes. Three classes of encoders are considered, namely optimal nonlinear, optimal linear and random linear encoders. Focusing on optimal decoders, we investigate the fundamental tradeoff between measurement rate and reconstruction fidelity gauged by error probability and noise sensitivity in the absence and presence of measurement noise, respectively. Optimal phase transition thresholds are determined as a functional of the input distribution and compared to suboptimal thresholds achieved by various popular reconstruction algorithms. In particular, we show that Gaussian sensing matrices incur no penalty on the phase transition threshold with respect to optimal nonlinear encoding. Our results also provide a rigorous justification of previous results based on replica heuristics in the weak-noise regime.
papers  to-read  information-theory  compressed-sensing  signal-processing  sparsity 
november 2011
Le Cam Made Simple: No-N Asymptotics
"If the log likelihood is approximately quadratic with constant Hessian, then the maximum likelihood estimator (MLE) is approximately normally distributed. No other assumptions are required. We do not need independent and identically distributed data. We do not need the law of large numbers (LLN) or the central limit theorem (CLT). We do not need sample size going to infinity or anything going to infinity.

The theory presented here is a combination of Le Cam style involving local asymptotic normality (LAN) and local asymptotic mixed normality (LAMN) and Cramér style involving derivatives and Fisher information. The main tool is convergence in law of the log likelihood function and its derivatives considered as random elements of a Polish space of continuous functions with the metric of uniform convergence on compact sets. We obtain results for both one-step-Newton estimators and Newton-iterated-to-convergence estimators."
have-read  statistics  estimation  Le_Cam-theory  asymptotic-normality  information-geometry  via:cshalizi 
november 2011
Edsger W. Dijkstra Prize in Distributed Computing
The Edsger W. Dijkstra Prize in Distributed Computing is named for Edsger Wybe Dijkstra (1930-2002), a pioneer in the area of distributed computing. His foundational work on concurrency primitives (such as the semaphore), concurrency problems (such as mutual exclusion and deadlock), reasoning about concurrent systems, and self-stabilization comprises one of the most important supports upon which the field of distributed computing is built. No other individual has had a larger influence on research in principles of distributed computing.
computation  distributed-computing  distributed-systems  reference 
november 2011
[1111.2622] Optimal re-centering bounds, with applications to Rosenthal-type concentration of measure inequalities
For any nonnegative Borel-measurable function f such that f(x)=0 if and only if x=0, the best constant c_f in the inequality E f(X-E X) leq c_f E f(X) for all random variables X with a finite mean is obtained. Properties of the constant c_f in the case when f=|.|^p are studied. Applications to concentration of measure in the form of Rosenthal-type bounds on the moments of separately Lipschitz functions on product spaces are given.
papers  to-read  probability  measure-concentration 
november 2011
[1111.3029] Parametric estimation. Finite sample theory
The paper aims at reconsidering the famous Le Cam LAN theory. The main features of the approach which make it different from the classical one are: (1) the study is non-asymptotic, that is, the sample size is fixed and does not tend to infinity; (2) the parametric assumption is possibly misspecified and the underlying data distribution can lie beyond the given parametric family.
The main results include a large deviation bounds for the (quasi) maximum likelihood and the local quadratic majorization of the log-likelihood process. The latter yields a number of important corollaries for statistical inference: concentration, confidence and risk bounds, expansion of the maximum likelihood estimate, etc. All these corollaries are stated in a non-classical way admitting a model misspecification and finite samples. However, the classical asymptotic results including the efficiency bounds can be easily derived as corollaries of the obtained non-asymptotic statements. The general results are illustrated for the i.i.d. set-up as well as for generalized linear and median estimation. The results apply for any dimension of the parameter space and provide a quantitative lower bound on the sample size yielding the root-n accuracy.
papers  to-read  statistics  Le_Cam-theory  asymptotic-normality 
november 2011
[1111.3054] Consistency under Sampling of Exponential Random Graph Models
The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focussing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general ones about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.
papers  to-read  statistics  exponential-families  graphical-models  networks 
november 2011
[1111.3486] New Concentration Inequalities for Suprema of Empirical Processes
While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration.
papers  to-read  probability  empirical-processes  measure-concentration 
november 2011
« earlier      
academia active-learning adaptive-systems ai algorithms alternative-fuels analysis anarchy anthropology art Asia assholes asymptotic-normality audio avant-garde bandit-problems bands banking bayesian bayesian-learning biology biophysics biostatistics blogs book-reviews books books-noted bounded-rationality camp cartoons causality cells channel-coding chapel-hill-durham children-lit chinese Christianity climate clothing coding-theory coffee cognition cognitive-science cold-war combinatorics comics commentary communication communications comparison-of-experiments complex-systems complexity compressed-sensing computation computer-science computer-vision conferences consensus-algorithms constitution continuing-crises control-theory convex-programming cooking copyright creative-commons creeping-authoritarianism criticism crypto cthulhu culture current-events cybernetics data-analysis data-mining data-sets decentralized-control decision-making decision-theory democracy diet distributed-computing distributed-systems divergence drinking duke dynamic-programming dynamical-systems economics empirical-processes energy english Enlightenment environment epistemology epsilon-entropy ergodic-theory espresso-machine essays estimation evisceration evolution experimental-design exponential-families fantasy feedback feedback-information-theory filetype:pdf film films filtering finance financial-regulation-reform food forensics free-culture French frequentist fun-stuff gadgets game-theory geekery genomics geometric-functional-analysis graph-theory graphical-models green guitar guitar-tabs have-read health health-care heard-the-talk history history_of_cybernetics history_of_science history_of_statistics homepages humor impending_doom industrial information-geometry information-structures information-theory innovation institutions interesting internet interviews Japanese journals Judaism labor-unions language LaTeX law Le_Cam-theory learning-theory lecture-notes libertarian libertarianism liberty linear-programming linguistics literature lolcats lovecraft lyrics mac machine-learning majorization management markets markov-chains martingales mathematics matlab measure-concentration media media:document memes message-passing methodology model-selection monetary_policy monsters mp3 multiagent-systems music network-data-analysis networks neuroscience nonparametrics nutrition occult omics online-learning optimal-search optimal-transportation optimization organizations papers particle-filters people perception philosophy philosophy-of-science physics poetry polemics policy politics polymaths pop-culture prediction prediction-markets privacy probability productivity profiles progrock psychoceramics psychology quantization quantum-mechanics random-matrices rate-distortion rationality re:active_feature_selection_project re:causality_project re:distributed_decisions_project re:erasures_and_concentration_project re:knightian-uncertainty re:posterior_matching_project recipes record-labels record-stores reference reinforcement-learning religion Renaissance research restaurants retro reviews robotics robust_control robust_statistics russia russian satire sci-fi science security sensor-networks Shannon signal-processing simpsons simulation slides social-networks sociology software source-coding sparsity spectral-graph-theory speculation spirits statistical-learning statistical-physics statistics steampunk submodular-functions subversive system-identification teaching technology tentacles theremin thermodynamics to-read toulmin travel trivia uncertainty universal-prediction useful-tips via:anand via:arsyed via:arthegall via:bookslut via:cshalizi via:mreid via:nielsen via:shivak via:vaguery via:warren_ellis want-this web2.0 weird Wikipedia wisdom workflow workshops Yiddish

Copy this bookmark:



description:


tags: