mraginsky + probability   90

[1205.1099] From Knothe's rearrangement to Brenier's optimal transport map
"The Brenier optimal map and Knothe-Rosenblatt rearrangement are two instances of a transport map, that is to say a map sending one measure onto another. The main interest of the former is that it solves the Monge-Kantorovich optimal transport problem, while the latter is very easy to compute, being given by an explicit formula. A few years ago, Carlier, Galichon, and Santambrogio showed that the Knothe rearrangement could be seen as the limit of the Brenier map when the quadratic cost degenerates. In this paper, we prove that on the torus (to avoid boundary issues), when all the data are smooth, the evolution is also smooth, and is entirely determined by a PDE for the Kantorovich potential (which determines the map), with a subtle initial condition. The proof requires the use of the Nash-Moser inverse function theorem. This result generalizes the ode discovered by Carlier, Galichon, and Santambrogio when one measure is uniform and the other is discrete, and could pave to way to new numerical methods for optimal transportation."
papers  to-read  optimal-transportation  probability  PDEs 
23 days ago by mraginsky
[1205.1005] Some Refinements of Large Deviation Tail Probabilities
"We study tail probabilities via some Gaussian approximations. Our results make refinements to large deviation theory. The proof builds on classical results by Bahadur and Rao. Binomial distributions and their tail probabilities are discussed in more detail."
papers  to-read  statistics  probability  large-deviations  measure-concentration  statistical-physics 
24 days ago by mraginsky
Fluctuations and response out of equilibrium (C. Maes)
"We discuss some recently visited positions towards dealing with nonequilibria from the
mathematical point of view of Markov networks."
papers  to-read  statistical-physics  thermodynamics  probability  information-theory 
4 weeks ago by mraginsky
[1111.2687] Ricci curvature of finite Markov chains via convexity of the entropy
We define and study a new notion of Ricci curvature that applies to Markov chains on discrete spaces. This notion relies on geodesic convexity of the entropy and is analogous to the one introduced by Lott, Sturm, and Villani for geodesic measure spaces. In order to apply to the discrete setting, the role of the Wasserstein metric is taken over by a different metric, having the property that continuous time Markov chains are gradient flows of the entropy.
Using this notion of Ricci curvature we prove discrete analogues of fundamental results by Bakry--Emery and Otto--Villani. Furthermore we show that Ricci curvature bounds are preserved under tensorisation. As a special case we obtain the sharp Ricci curvature lower bound for the discrete hypercube.
papers  to-read  markov-chains  probability  measure-concentration 
5 weeks ago by mraginsky
[0704.0704] Entropic Measure and Wasserstein Diffusion
We construct a new random probability measure on the sphere and on the unit interval which in both cases has a Gibbs structure with the relative entropy functional as Hamiltonian. It satisfies a quasi-invariance formula with respect to the action of smooth diffeomorphism of the sphere and the interval respectively. The associated integration by parts formula is used to construct two classes of diffusion processes on probability measures (on the sphere or the unit interval) by Dirichlet form methods. The first one is closely related to Malliavin's Brownian motion on the homeomorphism group. The second one is a probability valued stochastic perturbation of the heat flow, whose intrinsic metric is the quadratic Wasserstein distance. It may be regarded as the canonical diffusion process on the Wasserstein space.
papers  to-read  optimal-transportation  information-theory  probability  re:FPF_project 
5 weeks ago by mraginsky
Majorizing codes and measures (Andreas Maurer)
An information theoretical interpretation of majorizing and minorizing
measures is given. The expression logarithmic in the reciprocal of the
measure of a ball is replaced by the number of bits needed to achieve
desired precision in some convergent code. We also give a local version of
the majorizing bound.
papers  to-read  measure-concentration  probability  information-theory 
9 weeks ago by mraginsky
[0907.4491] Bounding relative entropy by the relative entropy of local specifications in product spaces
For a class of density functions $q^n(x^n)$ on $Bbb R^n$ we prove an inequality between relative entropy and the sum of average conditional relative entropies of the following form: For any density function $p^n(x^n)$ on $Bbb R^n$, $D(p^n||q^n)leq Const. sum_{i=1}^n Bbb E D(p_i(cdot|Y_1,..., Y_{i-1},Y_{i+1},..., Y_n) || Q_i(cdot|Y_1,..., Y_{i-1},Y_{i+1},..., Y_n)),$ where $p_i(cdot|y_1,..., y_{i-1},y_{i+1},..., y_n)$ and $Q_i(cdot|x_1,..., x_{i-1},x_{i+1},..., x_n)$ denote the local specifications for $p^n$ resp. $q^n$, i.e., the conditional density functions of the $i$'th coordinate, given the other coordinates. The constant depends on the properties of the local specifications of $q^n$. The above inequality implies a logarithmic Sobolev inequality for $q^n$. We get an explicit lower bound for the logarithmic Sobolev constant of $q^n$ under the assumptions that: (i) the local specifications of $q^n$ satisfy logarithmic Sobolev inequalities with constants $rho_i$, and (ii) they also satisfy some condition expressing that the mixed partial derivatives of the Hamiltonian of $q^n$ are not too large relative to the logarithmic Sobolev constants $rho_i$. Condition (ii) may be weaker than that used in Otto and Reznikoff's recent paper on the estimation of logarithmic Sobolev constants of spin systems.
papers  to-read  measure-concentration  information-theory  probability  re:erasures_and_concentration_project 
9 weeks ago by mraginsky
[1104.1303] A new characterization of Talagrand's transport-entropy inequalities and applications
Wow! "We show that Talagrand's transport inequality is equivalent to a restricted logarithmic Sobolev inequality. This result clarifies the links between these two important functional inequalities. As an application, we give the first proof of the fact that Talagrand's inequality is stable under bounded perturbations."
papers  to-read  measure-concentration  optimal-transportation  probability  re:erasures_and_concentration_project 
12 weeks ago by mraginsky
Concentration inequalities for functions of independent variables - Maurer - 2005 - Random Structures & Algorithms - Wiley Online Library
Following the entropy method this paper presents general concentration inequalities, which can be applied to combinatorial optimization and empirical processes. The inequalities give improved concentration results for optimal traveling salesmen tours, Steiner trees, and the eigenvalues of random symmetric matrices.
papers  to-read  measure-concentration  probability 
january 2012 by mraginsky
Dominated concentration : Statistics & Probability Letters | ScienceDirect.com
The concentration properties of one random variable may be governed by the values of another random variable which is concentrated and more easily analyzed. We present a general concentration inequality to handle such cases and apply it to the eigenvalues of the Gram matrix for a sample of independent vectors distributed in the unit ball of a Hilbert space. For large samples the deviation of the eigenvalues from their mean is shown to scale with the largest eigenvalue.
papers  to-read  measure-concentration  probability 
january 2012 by mraginsky
[1111.1977] On Refined Versions of the Azuma-Hoeffding Inequality with Applications in Information Theory
This paper derives some refined versions of the Azuma-Hoeffding inequality for discrete-parameter martingales with uniformly bounded jumps, and it considers some of their potential applications in information theory and related topics. The first part of this paper derives these refined inequalities, followed by a discussion on their relations to some classical results in probability theory. It also considers a geometric interpretation of some of these inequalities, providing an insight on the inter-connections between them. The second part exemplifies the use of these refined inequalities in the context of hypothesis testing, information theory, and communication. The paper is concluded with a discussion on some directions for further research. This work is meant to stimulate the use of some refined versions of the Azuma-Hoeffding inequality in information-theoretic aspects.
papers  to-read  information-theory  measure-concentration  martingales  markov-chains  probability 
january 2012 by mraginsky
[1201.0559] Chernoff-Hoeffding Bounds for Markov Chains: Generalized and Simplified
We prove the first Chernoff-Hoeffding bounds for general nonreversible finite-state Markov chains based on the standard L_1 (variation distance) mixing-time of the chain. Specifically, consider an ergodic Markov chain M and a weight function f: [n] -> [0,1] on the state space [n] of M with mean mu = E_{v <- pi}[f(v)], where pi is the stationary distribution of M. A t-step random walk (v_1,...,v_t) on M starting from the stationary distribution pi has expected total weight E[X] = mu t, where X = sum_{i=1}^t f(v_i). Let T be the L_1 mixing-time of M. We show that the probability of X deviating from its mean by a multiplicative factor of delta, i.e., Pr [ |X - mu t| >= delta mu t ], is at most exp(-Omega(delta^2 mu t / T)) for 0 <= delta <= 1, and exp(-Omega(delta mu t / T)) for delta > 1. In fact, the bounds hold even if the weight functions f_i's for i in [t] are distinct, provided that all of them have the same mean mu.
We also obtain a simplified proof for the Chernoff-Hoeffding bounds based on the spectral expansion lambda of M, which is the square root of the second largest eigenvalue (in absolute value) of M tilde{M}, where tilde{M} is the time-reversal Markov chain of M. We show that the probability Pr [ |X - mu t| >= delta mu t ] is at most exp(-Omega(delta^2 (1-lambda) mu t)) for 0 <= delta <= 1, and exp(-Omega(delta (1-lambda) mu t)) for delta > 1.
Both of our results extend to continuous-time Markov chains, and to the case where the walk starts from an arbitrary distribution x, at a price of a multiplicative factor depending on the distribution x in the concentration bounds
to-read  papers  markov-chains  measure-concentration  probability 
january 2012 by mraginsky
Concentration Inequalities for the Missing Mass and for Histogram Rule Error (McAllester and Ortiz)
This paper gives distribution-free concentration inequalities for the missing mass and the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although the sums are independent, they are highly heterogeneous. Such highly heterogeneous independent sums cannot be analyzed using standard concentration inequalities such as Hoeffding's inequality, the Angluin-Valiant bound, Bernstein's inequality, Bennett's inequality, or McDiarmid's theorem. The concentration inequality for histogram rule error is motivated by the desire to construct a new class of bounds on the generalization error of decision trees.
papers  to-read  learning-theory  measure-concentration  information-theory  probability  re:erasures_and_concentration_project 
december 2011 by mraginsky
[1111.2622] Optimal re-centering bounds, with applications to Rosenthal-type concentration of measure inequalities
For any nonnegative Borel-measurable function f such that f(x)=0 if and only if x=0, the best constant c_f in the inequality E f(X-E X) leq c_f E f(X) for all random variables X with a finite mean is obtained. Properties of the constant c_f in the case when f=|.|^p are studied. Applications to concentration of measure in the form of Rosenthal-type bounds on the moments of separately Lipschitz functions on product spaces are given.
papers  to-read  probability  measure-concentration 
november 2011 by mraginsky
[1111.3486] New Concentration Inequalities for Suprema of Empirical Processes
While effective concentration inequalities for suprema of empirical processes exist under boundedness or strict tail assumptions, no comparable results have been available under considerably weaker assumptions. In this paper, we derive concentration inequalities assuming only low moments for an envelope of the empirical process. These concentration inequalities are beneficial even when the envelope is much larger than the single functions under consideration.
papers  to-read  probability  empirical-processes  measure-concentration 
november 2011 by mraginsky
Thermodynamics and Concentration
"We show that the thermal subadditivity of entropy provides a common basis to derive a strong form of the bounded dierence inequality and related results as well as more recent inequalities applicable to convex Lipschitz functions, random symmetric matrices, shortest travelling sales-men paths and weakly self-bounding functions. We also give two new concentration inequalities."
papers  to-read  measure-concentration  information-theory  probability  statistical-physics  statistical-learning  via:shivak  filetype:pdf  media:document 
march 2011 by mraginsky
[1103.3188] "Exact" deviations in Wasserstein distance for empirical and occupation measures
"We study the problem of so-called "exact" or non-asymptotic deviations between a reference measure $\mu$ and its empirical version $L_n$, in the $p$-Wasserstein metric, $1 \leq p \leq 2$, under the standing assumption that $\mu$ satisfies a transport-entropy inequality. This work is a generalization of an article by F.Bolley, A.Guillin and C.Villani, where the case of measures with support in $\R^d$ was studied. Our methods are based on concentration inequalities and extend to the general setting of measures on a Polish space. Deviation bounds for the occupation measure of a contracting Markov chain in $W_1$ distance are also given. Throughout the text, several examples are worked out, including the cases of Gaussian measures on separable Banach spaces, and laws of diffusion processes."
papers  to-read  probability  measure-concentration  empirical-processes 
march 2011 by mraginsky
[1103.1861] Distinguishing and integrating aleatoric and epistemic variation in uncertainty quantification
"Much of uncertainty quantification to date has focused on determining the effect of variables modeled probabilistically, and with a known distribution, on some physical or engineering system. We develop methods to obtain information on the system when the distributions of some variables are known exactly, others are known only approximately, and perhaps others are not modeled as random variables at all. The main tool used is the duality between risk-sensitive integrals and relative entropy, and we obtain explicit bounds on standard performance measures (variances, exceedance probabilities) over families of distributions whose distance from a nominal distribution is measured by relative entropy. The evaluation of the risk-sensitive expectations is based on polynomial chaos expansions, which help keep the computational aspects tractable."
papers  probability  information-theory  re:knightian-uncertainty  have-read  has:re 
march 2011 by mraginsky
[1012.2643] Geometry of maximum likelihood estimation in Gaussian graphical models
"We study maximum likelihood estimation in Gaussian graphical models from a geometric point of view. An algebraic elimination criterion allows us to find exact lower bounds on the number of observations needed to ensure that the maximum likelihood estimator exists with probability one. This is applied to bipartite graphs, grids and colored graphs. We also study the ML degree, and we present the first instance of a graph for which the MLE exists with probability one even when the number of observations equals the treewidth."
papers  to-read  statistics  graphical-models  estimation  probability  signal-processing 
january 2011 by mraginsky
[1012.5687] Coupling and Applications
"This paper presents a self-contained account for coupling arguments and applications in the context of Markov processes. We first use coupling to describe the transport problem, which leads to the concepts of optimal coupling and probability distance (or transportation-cost), then introduce applications of coupling to the study of ergodicity, Liouville theorem, convergence rate, gradient estimate, and Harnack inequality for Markov processes."
papers  to-read  probability  ergodic-theory 
january 2011 by mraginsky
[1012.5457] Concentration of the information in data with log-concave distributions
A concentration property of the functional $-\log f(X)$ is demonstrated, when a random vector $X$ has a log-concave density $f$ on $\R^n$. This concentration property implies in particular an extension of the Shannon-McMillan-Breiman strong ergodic theorem to the class of discrete-time stochastic processes with log-concave marginals.
papers  to-read  information-theory  probability  measure-concentration 
january 2011 by mraginsky
Observer Mechanics: A Formal Theory of Perception (Bennett, Hoffman, Prakash)
"Observer Mechanics is an inquiry into the subject of perception. It suggests an approach to the study of perception that attempts to be both rigorous and general. A central thesis of Observer Mechanics is that every perceptual capacity (e.g., stereovision, auditory localization, sentence parsing, haptic recognition, and so on) can be described as an instance of a single formal structure: viz., an "observer.""
books  to-read  complexity  computation  perception  dynamical-systems  probability  multiagent-systems  cognitive-science  cybernetics 
january 2011 by mraginsky
[1011.6451] Informational derivation of Quantum Theory
"Quantum theory can be derived from purely informational principles. Five elementary axioms-causality, perfect distinguishability, ideal compression, local distinguishability, and pure conditioning-define a broad class of theories of information-processing that can be regarded as a standard. One postulate-purification-singles out quantum theory within this class. The main structures of quantum theory, such as the representation of mixed states as convex combinations of perfectly distinguishable pure states, are derived directly from the principles without using the Hilbert space framework."
papers  to-read  quantum-mechanics  information-theory  probability  philosophy-of-science 
december 2010 by mraginsky
Pollard@Paris2001
David Pollard's lecture notes on asymptotic methods in statistical decision theory
statistics  probability  lecture-notes 
august 2010 by mraginsky
[1008.2697] A CLT for Empirical Processes Involving Time Dependent Data
"For stochastic processes $\{X_t: t \in E\}$, we establish sufficient conditions for the empirical process based on $\{ I_{X_t \le y} - P(X_t \le y): t \in E, y \in \mathbb{R}\}$ to satisfy the CLT uniformly in $ t \in E, y \in \mathbb{R}$. Corollaries of our main result include examples of classical processes where the CLT holds, and we also show that it fails for Brownian motion tied down at zero and $E= [0,1]$."
papers  to-read  probability  empirical-processes  dependent-data 
august 2010 by mraginsky
A note on exponential families of distributions
"We show that an arbitrary probability distribution can be represented in an exponential form. In physical contexts, this implies that the equilibrium distribution of any classical or quantum dynamical system is expressible in a grand canonical form." Is there anything really new here, though? ETA: No, cf. Barron and Sheu.
papers  have-read  meh  probability  exponential-families  statistical-physics 
august 2010 by mraginsky
[1007.4037] Uniform Approximation and Bracketing Properties of VC classes
"We show that the sets in a family with finite VC dimension can be uniformly approximated within a given error by a finite partition. Immediate corollaries include the fact that VC classes have finite bracketing numbers, satisfy uniform laws of averages under strong dependence, and exhibit uniform mixing. Our results are based on recent work concerning uniform laws of averages for VC classes under ergodic sampling."
papers  to-read  probability  ergodic-theory 
july 2010 by mraginsky
David Blackwell - Statistical Modeling, Causal Inference, and Social Science
A quote from Blackwell himself: "Basically, I'm not interested in doing research and I never have been, I'm interested in understanding, which is quite a different thing. And often to understand something you have to work it out yourself because no one else has done it."
people  statistics  research  probability  decision-making 
july 2010 by mraginsky
[1003.1377] Entropy: The Markov Ordering Approach
"The focus of this article is on entropy and Markov processes. We study the properties of functionals which are invariant with respect to monotonic transformations and analyze two invariant "additivity" properties: (i) existence of a monotonic transformation which makes the functional additive with respect to the joining of independent systems and (ii) existence of a monotonic transformation which makes the functional additive with respect to the partitioning of the space of states. All Lyapunov functionals for Markov chains which have properties (i) and (ii) are derived. We describe the most general ordering of the distribution space, with respect to which all continuous-time Markov processes are monotonic (the {\em Markov order}). The solution differs significantly from the ordering given by the inequality of entropy growth. For inference, this approach results in a convex compact set of conditionally "most random" distributions."
papers  have-read  statistics  probability  majorization  information-theory 
june 2010 by mraginsky
[1004.3484] How close is the sample covariance matrix to the actual covariance matrix?
Abstract: "Given a distribution in R^n, a classical estimator of its covariance matrix is the sample covariance matrix obtained from a sample of N independent points. What is the optimal sample size N = N(n) that guarantees estimation with a fixed accuracy in the operator norm? Suppose the distribution is supported in a centered Euclidean ball of radius \sqrt{n}. We conjecture that the optimal sample size is N = O(n) for all distributions with finite fourth moment, and we prove this up to an iterated logarithmic factor. This problem is motivated by the optimal theorem of Rudelson which states that N = O(n \log n) for distributions with finite second moment, and a recent result of Adamczak, Litvak, Pajor and Tomczak-Jaegermann which guarantees that N = O(n) for sub-exponential distributions." -- Implications for kernel PCA and similar methods?
papers  to-read  statistics  random-matrices  geometric-functional-analysis  probability 
april 2010 by mraginsky
[1004.0557] Applications of Lindeberg Principle in Communications and Statistical Learning
"We use a generalization of the Lindeberg principle developed by Sourav Chatterjee to prove universality properties for various problems in communications, statistical learning and random matrix theory. We also show that these systems can be viewed as the limiting case of a properly defined sparse system. The latter result is useful when the sparse systems are easier to analyze than their dense counterparts. The list of problems we consider is by no means exhaustive. We believe that the ideas can be used in many other problems relevant for information theory."
paper  to-read  random-matrices  probability  machine-learning  signal-processing  information-theory  geometric-functional-analysis 
april 2010 by mraginsky
[0809.3066] Some Notes on Standard Borel and Related Spaces (Chris Preston)
"These notes give an elementary approach to parts of the theory of standard Borel and analytic spaces." Useful stuff, seeing as how the machinery of analytic sets and standard Borel spaces is used to justify the manipulations with suprema over uncountable sets in the theory of empirical processes, as well as the reduction of partially observed MDPs in general spaces to fully observed MDPs on the space of hyperstates.
papers  to-read  mathematics  analysis  probability 
january 2010 by mraginsky
[0910.3603] A complete solution to Blackwell's unique ergodicity problem for hidden Markov chains
Sounds exciting: "We develop necessary and sufficient conditions for uniqueness of the invariant measure of the filtering process associated to an ergodic hidden Markov model in a finite or countable state space. These results provide a complete solution to a problem posed by Blackwell (1957), and subsume earlier partial results due to Kaijser, Kochman and Reeds. The proofs of our main results are based on the stability theory of nonlinear filters."
papers  to-read  filtering  probability  estimation  ergodic-theory  dynamical-systems 
october 2009 by mraginsky
« earlier      

related tags

AI  algorithms  analysis  bayesian  blogs  book-reviews  books  causality  cognitive-science  complex-systems  complexity  compressed-sensing  computation  computer-science  control-theory  cybernetics  decision-making  dependent-data  dynamical-systems  economics  empirical-processes  ergodic-theory  estimation  evolution  exponential-families  filetype:pdf  filtering  finance  game-theory  geometric-functional-analysis  graph-theory  graphical-models  has:re  has:via  have-read  heard-the-talk  homepages  information-theory  large-deviations  learning-theory  lecture-notes  machine-learning  majorization  markov-chains  martingales  mathematics  measure-concentration  media:document  meh  multiagent-systems  network-data-analysis  networks  neuroscience  optimal-transportation  optimization  paper  papers  PDEs  people  perception  philosophy  philosophy-of-science  polemics  probability  quantum-mechanics  random-graphs  random-matrices  re:erasures_and_concentration_project  re:FPF_project  re:knightian-uncertainty  reference  research  reviews  RIP  russian  science  signal-processing  statistical-learning  statistical-physics  statistics  teaching  thermodynamics  to-read  via:cshalizi  via:shivak  want-this 

Copy this bookmark:



description:


tags: