Le Cam Made Simple: No-N Asymptotics
november 2011 by mraginsky
"If the log likelihood is approximately quadratic with constant Hessian, then the maximum likelihood estimator (MLE) is approximately normally distributed. No other assumptions are required. We do not need independent and identically distributed data. We do not need the law of large numbers (LLN) or the central limit theorem (CLT). We do not need sample size going to infinity or anything going to infinity.
The theory presented here is a combination of Le Cam style involving local asymptotic normality (LAN) and local asymptotic mixed normality (LAMN) and Cramér style involving derivatives and Fisher information. The main tool is convergence in law of the log likelihood function and its derivatives considered as random elements of a Polish space of continuous functions with the metric of uniform convergence on compact sets. We obtain results for both one-step-Newton estimators and Newton-iterated-to-convergence estimators."
have-read
statistics
estimation
Le_Cam-theory
asymptotic-normality
information-geometry
via:cshalizi
The theory presented here is a combination of Le Cam style involving local asymptotic normality (LAN) and local asymptotic mixed normality (LAMN) and Cramér style involving derivatives and Fisher information. The main tool is convergence in law of the log likelihood function and its derivatives considered as random elements of a Polish space of continuous functions with the metric of uniform convergence on compact sets. We obtain results for both one-step-Newton estimators and Newton-iterated-to-convergence estimators."
november 2011 by mraginsky
[1104.5246] How well can we estimate a sparse vector?
april 2011 by mraginsky
"The estimation of a sparse vector in the linear model is a fundamental problem in signal processing, statistics, and compressive sensing. This paper establishes a lower bound on the mean-squared error, which holds regardless of the sensing/design matrix being used and regardless of the estimation procedure. This lower bound very nearly matches the known upper bound one gets by taking a random projection of the sparse vector followed by an $\ell_1$ estimation procedure such as the Dantzig selector. In this sense, compressive sensing techniques cannot essentially be improved."
papers
compressed-sensing
estimation
information-theory
statistics
have-read
april 2011 by mraginsky
[1103.1861] Distinguishing and integrating aleatoric and epistemic variation in uncertainty quantification
march 2011 by mraginsky
"Much of uncertainty quantification to date has focused on determining the effect of variables modeled probabilistically, and with a known distribution, on some physical or engineering system. We develop methods to obtain information on the system when the distributions of some variables are known exactly, others are known only approximately, and perhaps others are not modeled as random variables at all. The main tool used is the duality between risk-sensitive integrals and relative entropy, and we obtain explicit bounds on standard performance measures (variances, exceedance probabilities) over families of distributions whose distance from a nominal distribution is measured by relative entropy. The evaluation of the risk-sensitive expectations is based on polynomial chaos expansions, which help keep the computational aspects tractable."
papers
probability
information-theory
re:knightian-uncertainty
have-read
has:re
march 2011 by mraginsky
[1102.3944] Fixed-length lossy compression in the finite blocklength regime
february 2011 by mraginsky
"This paper studies the minimum achievable source coding rate as a function of blocklength $n$ and tolerable distortion level $d$. Tight general achievability and converse bounds are derived that hold at arbitrary fixed blocklength. For stationary memoryless sources with separable distortion, the minimum rate achievable is shown to be closely approximated by $R(d) + \sqrt{\frac{V(d)}{n}} \Qinv{\epsilon}$, where $R(d)$ is the rate-distortion function, $V(d)$ is the {\it rate dispersion}, a characteristic of the source which measures its stochastic variability, $\Qinv{\cdot}$ is the inverse of the standard Gaussian complementary cdf, and $\epsilon$ is the probability that the distortion exceeds $d$."
papers
information-theory
rate-distortion
source-coding
have-read
february 2011 by mraginsky
[1011.3854] A probabilistic and RIPless theory of compressed sensing
november 2010 by mraginsky
"This paper introduces a simple and very general theory of compressive sensing. In this theory, the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all models - e.g. Gaussian, frequency measurements - discussed in the literature, but also provides a framework for new measurement strategies as well. We prove that if the probability distribution F obeys a simple incoherence property and an isotropy property, one can faithfully recover approximately sparse signals from a minimal number of noisy measurements. The novelty is that our recovery results do not require the restricted isometry property (RIP) - they make use of a much weaker notion - or a random model for the signal. As an example, the paper shows that a signal with s nonzero entries can be faithfully recovered from about s log n Fourier coefficients that are contaminated with noise."
papers
have-read
compressed-sensing
sparsity
geometric-functional-analysis
random-matrices
signal-processing
november 2010 by mraginsky
Algorithmic Thermodynamics « Azimuth
october 2010 by mraginsky
John Baez blogs about his paper with Mike Stay on algorithmic thermodynamics
have-read
blogs
information-theory
complexity
computation
thermodynamics
october 2010 by mraginsky
A note on exponential families of distributions
august 2010 by mraginsky
"We show that an arbitrary probability distribution can be represented in an exponential form. In physical contexts, this implies that the equilibrium distribution of any classical or quantum dynamical system is expressible in a grand canonical form." Is there anything really new here, though? ETA: No, cf. Barron and Sheu.
papers
have-read
meh
probability
exponential-families
statistical-physics
august 2010 by mraginsky
[1006.4039] Cooperative Autonomous Online Learning
june 2010 by mraginsky
Abstract: "Online learning is becoming increasingly popular for training on large datasets. However, the sequential nature of online learning requires a centralized learner to store data and update parameters. In this paper, we consider a fully decentralized setting, cooperative autonomous online learning, with a distributed data source. The learners perform learning with local parameters while periodically communicating with a small subset of neighbors to exchange information. We define the regret in terms of an implicit aggregated parameter of the learners for such a setting and prove regret bounds similar to the classical sequential online learning." Damn, looks like I've been scooped!
papers
have-read
decision-making
online-learning
distributed-systems
june 2010 by mraginsky
[1003.1377] Entropy: The Markov Ordering Approach
june 2010 by mraginsky
"The focus of this article is on entropy and Markov processes. We study the properties of functionals which are invariant with respect to monotonic transformations and analyze two invariant "additivity" properties: (i) existence of a monotonic transformation which makes the functional additive with respect to the joining of independent systems and (ii) existence of a monotonic transformation which makes the functional additive with respect to the partitioning of the space of states. All Lyapunov functionals for Markov chains which have properties (i) and (ii) are derived. We describe the most general ordering of the distribution space, with respect to which all continuous-time Markov processes are monotonic (the {\em Markov order}). The solution differs significantly from the ordering given by the inequality of entropy growth. For inference, this approach results in a convex compact set of conditionally "most random" distributions."
papers
have-read
statistics
probability
majorization
information-theory
june 2010 by mraginsky
[1002.0042] Lower bounds for the minimax risk using $f$-divergences and applications
june 2010 by mraginsky
Very nice: "A new lower bound involving $f$-divergences between the underlying probability measures is proved for the minimax risk in estimation problems. The proof just uses the convexity of the function $f$ and is extremely simple. Special cases and straightforward corollaries of our bound include well known inequalities for establishing minimax lower bounds such as Fano's inequality, Pinsker's inequality and inequalities based on global entropy conditions. Two applications are provided: a new minimax lower bound for the reconstruction of convex bodies from noisy support function measurements and a different proof of a recent minimax lower bound for the estimation of a covariance matrix."
papers
have-read
information-theory
statistics
heard-the-talk
june 2010 by mraginsky
Bayesian Learning in Social Networks
april 2010 by mraginsky
Daron Acemoglu, Munther A. Dahleh, Ilan Lobel, Asuman Ozdaglar; NBER Working Paper No. 14040
papers
have-read
social-networks
bayesian-learning
game-theory
decision-making
distributed-systems
april 2010 by mraginsky
[0911.0054] Learning Exponential Families in High-Dimensions: Strong Convexity and Sparsity (Sham M. Kakade, Ohad Shamir, Karthik Sridharan, Ambuj Tewari)
november 2009 by mraginsky
Abstract: "The versatility of exponential families, along with their attendant convexity properties, make them a popular and effective statistical model. A central issue is learning these models in high-dimensions, such as when there is some sparsity pattern of the optimal parameter. This work characterizes a certain strong convexity property of general exponential families, which allow their generalization ability to be quantified. In particular, we show how this property can be used to analyze generic exponential families under L_1 regularization."
papers
have-read
statistics
learning-theory
machine-learning
exponential-families
graphical-models
convex-programming
sparsity
november 2009 by mraginsky
[0910.4627] Self-concordant analysis for logistic regression
october 2009 by mraginsky
Francis Bach (INRIA Rocquencourt)
papers
have-read
statistics
learning-theory
optimization
algorithms
convex-programming
october 2009 by mraginsky
[0909.2408] Coordination Capacity
september 2009 by mraginsky
Authors: Paul Cuff (Stanford University), Haim Permuter (Stanford University), Thomas Cover (Stanford University)
papers
have-read
information-theory
september 2009 by mraginsky
[0901.0633] Optimal control as a graphical model inference problem
september 2009 by mraginsky
B. Kappen, V. Gomez, M. Opper
papers
have-read
graphical-models
control-theory
optimization
september 2009 by mraginsky
[0908.3108] Nonparametric estimation by convex programming
august 2009 by mraginsky
Anatoli B. Juditsky, Arkadi S. Nemirovski
papers
have-read
statistics
optimization
august 2009 by mraginsky
[cs/0610145v3] A Simple Converse of Burnashev's Reliability
may 2009 by mraginsky
Peter Berlin, Baris Nakiboglu, Bixio Rimoldi, Emre Telatar
papers
have-read
information-theory
communications
feedback-information-theory
martingales
may 2009 by mraginsky
[0905.1990] Sparse Linear Representation
may 2009 by mraginsky
Halyun Jeong, Young-Han Kim
papers
have-read
information-theory
sparsity
compressed-sensing
signal-processing
may 2009 by mraginsky
Machine Learning (Theory) » Computability in Artificial Intelligence
may 2009 by mraginsky
"Let me show by analogy why limiting research to computational questions is bad for any field. Except in computer science, computational aspects play little role in the development of fundamental theories: Consider e.g. set theory with axiom of choice, foundations of logic, exact/full minimax for zero-sum games, quantum (field) theory, string theory, … Indeed, at least in physics, every new fundamental theory seems to be less computable than previous ones."
blogs
have-read
science
complexity
computer-science
computation
philosophy
epistemology
AI
learning-theory
may 2009 by mraginsky
Estimation of Sparse Binary Pairwise Markov Networks using Pseudo-likelihoods
may 2009 by mraginsky
Holger Höfling, Robert Tibshirani; JMLR 10(Apr):883--906, 2009 (via cshalizi)
papers
have-read
statistics
machine-learning
graphical-models
optimization
algorithms
may 2009 by mraginsky
[0904.2311] Source Coding with a Side Information "Vending Machine"
april 2009 by mraginsky
Tsachy Weissman and Haim H. Permuter
papers
have-read
information-theory
control-theory
april 2009 by mraginsky
[0711.1460] On the Thermodynamic Temperature of a General Distribution
march 2009 by mraginsky
(by Krishna R. Narayanan, Arun R. Srinivasa) Even though one of the references is B. Roy Frieden's execrable attempt to "derive" physics from Fisher's information, the information theory in this paper is definitely interesting.
papers
have-read
statistics
statistical-physics
information-theory
march 2009 by mraginsky
[0903.0649] The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
march 2009 by mraginsky
Han Liu, John Lafferty, Larry Wasserman
have-read
statistics
network-data-analysis
machine-learning
graph-theory
march 2009 by mraginsky
[0809.2085] Clustered Multi-Task Learning: A Convex Formulation
march 2009 by mraginsky
Laurent Jacob, Francis Bach (INRIA Rocquencourt), Jean-Philippe Vert
papers
have-read
machine-learning
algorithms
optimization
march 2009 by mraginsky
[0902.1284] Multi-Label Prediction via Compressed Sensing
february 2009 by mraginsky
by Daniel Hsu, Sham Kakade, John Langford and Tong Zhang
papers
have-read
machine-learning
compressed-sensing
february 2009 by mraginsky
[0812.4889] Statistical Physics of Signal Estimation in Gaussian Noise: Theory and Examples of Phase Transitions
january 2009 by mraginsky
by Neri Merhav, Dongning Guo, Shlomo Shamai
papers
have-read
statistical-physics
information-theory
january 2009 by mraginsky
[0712.4273] Online EM Algorithm for Latent Data Models
august 2008 by mraginsky
Olivier Cappe and Eric Moulines
papers
have-read
machine-learning
optimization
statistics
algorithms
august 2008 by mraginsky
[0804.0551] Statistical performance of support vector machines
april 2008 by mraginsky
Gilles Blanchard, Olivier Bousquet, Pascal Massart
papers
have-read
machine-learning
statistics
learning-theory
april 2008 by mraginsky
related tags
active-learning ⊕ adaptive-systems ⊕ AI ⊕ algorithms ⊕ asymptotic-normality ⊕ bayesian ⊕ bayesian-learning ⊕ biology ⊕ blogs ⊕ books ⊕ coding-theory ⊕ communication ⊕ communications ⊕ complexity ⊕ compressed-sensing ⊕ computation ⊕ computer-science ⊕ control-theory ⊕ convex-programming ⊕ criticism ⊕ culture ⊕ decentralized-control ⊕ decision-making ⊕ distributed-computing ⊕ distributed-systems ⊕ dynamic-programming ⊕ dynamical-systems ⊕ epistemology ⊕ estimation ⊕ evolution ⊕ experimental-design ⊕ exponential-families ⊕ fantasy ⊕ feedback ⊕ feedback-information-theory ⊕ filetype:pdf ⊕ film ⊕ filtering ⊕ frequentist ⊕ game-theory ⊕ geometric-functional-analysis ⊕ graph-theory ⊕ graphical-models ⊕ has:re ⊕ has:via ⊕ have-read ⊖ heard-the-talk ⊕ history_of_statistics ⊕ information-geometry ⊕ information-theory ⊕ interesting ⊕ learning-theory ⊕ Le_Cam-theory ⊕ machine-learning ⊕ majorization ⊕ markets ⊕ markov-chains ⊕ martingales ⊕ mathematics ⊕ measure-concentration ⊕ media:document ⊕ meh ⊕ network-data-analysis ⊕ online-learning ⊕ optimization ⊕ papers ⊕ people ⊕ philosophy ⊕ planning ⊕ policy ⊕ politics ⊕ probability ⊕ random-matrices ⊕ rate-distortion ⊕ re:distributed_decisions_project ⊕ re:knightian-uncertainty ⊕ re:typical_sequences_project ⊕ reviews ⊕ robust_statistics ⊕ sci-fi ⊕ science ⊕ sensor-networks ⊕ signal-processing ⊕ social-networks ⊕ socialism ⊕ source-coding ⊕ sparsity ⊕ statistical-learning ⊕ statistical-physics ⊕ statistics ⊕ stochastic-search ⊕ submodular-functions ⊕ thermodynamics ⊕ via:arthegall ⊕ via:cshalizi ⊕Copy this bookmark: