A Geometric Approach to Sample Compression
4 weeks ago by shivak
"Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes."
learning_theory
geometry
papers
to_read
4 weeks ago by shivak
A new look at shifting regret
february 2012 by shivak
"Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex."
online_learning
papers
to_read
february 2012 by shivak
High-Confidence Predictions under Adversarial Uncertainty
january 2012 by shivak
ITCS 2011 best paper
online_learning
papers
to_read
january 2012 by shivak
Sparse Algorithms are not Stable: A No-free-lunch Theorem
january 2012 by shivak
"We show that these two properties are fundamentally at odds with each other: a sparse algorithm cannot be stable and vice versa."
sparsity
sensitivity
papers
to_read
january 2012 by shivak
Computing without memory
november 2011 by shivak
Cute! "In this paper, we generalise the famous algorithm for swapping the contents of two variables without using a buffer. We introduce a novel combinatorial framework for procedural programming languages, where programs are only allowed to update one variable at a time. We first consider programs which do not have any additional memory. We prove that any function of all the variables can be computed this way in a number of updates which grows linearly with the number of variables. Similarly, any linear function can be computed using a linear number of linear instructions. We then derive the exact number of instructions required to compute any manipulation of variables. Second, we show that allowing programs to use additional memory is also incorporated in our framework. We quantify the gains obtained by using additional memory. This leads to shorter programs and allows to use only binary instructions, which is not sufficient in general when no memory is used."
memory
papers
to_read
november 2011 by shivak
Dimension reduction by random hyperplane tesselations
november 2011 by shivak
"Given a subset K of the unit Euclidean sphere, we estimate the minimal number m = m(K) of hyperplanes that generate a uniform tessellation of K, in the sense that the fraction of the hyperplanes separating any pair x,y in K is nearly proportional to the Euclidean distance between x and y. Random hyperplanes prove to be almost ideal for this problem; they achieve the almost optimal bound m = O(w(K)^2) where w(K) is the Gaussian mean width of K."
tesselations
dimension_reduction
embeddings
papers
to_read
november 2011 by shivak
The Bernstein-Orlicz norm and deviation inequalities
november 2011 by shivak
"We introduce two new concepts designed for the study of empirical processes. First, we introduce a new Orlicz norm which we call the Bernstein-Orlicz norm. This new norm interpolates sub-Gaussian and sub-exponential tail behavior. In particular, we show how this norm can be used to simplify the derivation of deviation inequalities for suprema of collections of random variables. Secondly, we introduce chaining and generic chaining along a tree. These simplify the well-known concepts of chaining and generic chaining. The supremum of the empirical process is then studied as a special case. We show that chaining along a tree can be done using entropy with bracketing. Finally, we establish a deviation inequality for the empirical process for the unbounded case."
concentration_of_measure
deviation_inequalities
empirical_processes
generic_chaining
functional_analysis
papers
to_read
november 2011 by shivak
Improved Smoothed Analysis of Multiobjective Optimization
november 2011 by shivak
Slightly tighter analysis of Moitra & O'Donnell. More importantly: bounds higher moments.
combinatorial_optimization
smoothed_analysis
papers
to_read
november 2011 by shivak
A Variant of Azuma's Inequality for Martingales with Subgaussian Tail
october 2011 by shivak
"We present a variant of Azuma's concentration inequality for martingales, in which the standard boundedness requirement is replaced by the milder requirement of a subgaussian tail."
concentration_of_measure
martingales
papers
to_read
october 2011 by shivak
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
october 2011 by shivak
To attain O(1/T) convergence for non-smooth problems, SGD just needs a little tweak: an averaging window.
optimization
convexity
papers
to_read
october 2011 by shivak
Inner Regularization of Log-Concave Measures and Small-Ball Estimates
august 2011 by shivak
To deduce anticoncentration properties about X, use their "inner thickening" algorithm to produce Y, and deduce anticoncentration properties about that instead.
concentration_of_measure
anticoncentration
convexity
papers
to_read
august 2011 by shivak
Sparsity regret bounds for individual sequences in online linear regression
july 2011 by shivak
Online equivalent to sparsity oracle inequalities.
sparsity
online_learning
papers
to_read
july 2011 by shivak
On the Universality of Online Mirror Descent
july 2011 by shivak
Lots of typos right now.
optimization
online_learning
learning_theory
to_read
papers
july 2011 by shivak
Efficient Online Learning via Randomized Rounding
june 2011 by shivak
"Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, which combines "random playout" and randomized rounding of loss subgradients. As an application of our approach, we provide the first computationally efficient online algorithm for collaborative filtering with norm-constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learning."
online_learning
rounding
papers
to_read
june 2011 by shivak
Dimension-free tail inequalities for sums of random matrices
april 2011 by shivak
"We derive exponential tail inequalities for sums of random matrices with no dependence on the explicit matrix dimensions. These are similar to the matrix versions of the Chernoff bound and Bernstein inequality except with the explicit matrix dimensions replaced by a trace quantity that can be small even when the dimension is large or infinite. Some applications to principal component analysis and approximate matrix multiplication are given to illustrate the utility of the new bounds. "
concentration_of_measure
random_structures
papers
to_read
april 2011 by shivak
The Communication Complexity of Gap Hamming Distance
april 2011 by shivak
Much shorter proof based on interplay between concentration and anticoncentration.
communication_complexity
lower_bounds
to_read
april 2011 by shivak
Less Regret via Online Conditioning
january 2011 by shivak
"We analyze and evaluate an online gradient descent algorithm with adaptive per-coordinate adjustment of learning rates. Our algorithm can be thought of as an online version of batch gradient descent with a diagonal preconditioner. This approach leads to regret bounds that are stronger than those of standard online gradient descent for general online convex optimization problems."
online_learning
preconditioners
papers
to_read
january 2011 by shivak
The Sparsity Gap: Uncertainty Principles Proportional to Dimension
january 2011 by shivak
"In an incoherent dictionary, most signals that admit a sparse representation admit a unique sparse representation. In other words, there is no way to express the signal without using strictly more atoms. This work demonstrates that sparse signals typically enjoy a higher privilege: each nonoptimal representation of the signal requires far more atoms than the sparsest representation—unless it contains many of the same atoms as the sparsest representation."
sparsity
papers
to_read
filetype:pdf
media:document
january 2011 by shivak
A local maximal inequality under uniform entropy
december 2010 by shivak
"We derive an upper bound for the mean of the supremum of the empirical process indexed by a class of functions that are known to have variance bounded by a small constant $\delta$. The bound is expressed in the uniform entropy integral of the class at $\delta$."
stochastic_process_suprema
papers
to_read
december 2010 by shivak
Bounds on the suprema of Gaussian processes, and omega results for the sum of a random multiplicative function
december 2010 by shivak
"In this paper we develop an alternative approach to lower bounding the upper tail probability [P(supt∈T Z(t) > u)]. The ingredients are an initial conditioning step, followed by a comparison (in the sense of the method of comparison) with a “model” Gaussian process that can be explicitly analysed. The resulting bounds are clean and can be non-trivial for moderately sized u, which is very important in our two applications."
stochastic_process_suprema
lower_bounds
papers
to_read
december 2010 by shivak
An elementary proof of anti-concentration of polynomials in Gaussian variables
november 2010 by shivak
"The important work in this area is by Carbery and Wright, who gave a tight bound for anti-concentration of polynomials in normal variables. However, the proof of their result is quite complex. We give a weaker anti-concentration result which has an elementary proof, based on some convexity arguments, simple analysis and induction on the degree. Moreover, our proof technique is robust and extends to other distributions."
anticoncentration
papers
to_read
november 2010 by shivak
Concentration of empirical distribution functions with applications to non-i.i.d. models
november 2010 by shivak
"The concentration of empirical measures is studied for dependent data, whose joint distribution satisfies Poincaré-type or logarithmic Sobolev inequalities. The general concentration results are then applied to spectral empirical distribution functions associated with high-dimensional random matrices."
concentration_of_measure
dependent_data
logarithmic_sobolev_inequalities
poincare_inequalities
random_structures
papers
to_read
november 2010 by shivak
Online Learning: Beyond Regret
november 2010 by shivak
Followup to the "online learnability" paper.
online_learning
learning_theory
papers
to_read
filetype:pdf
media:document
november 2010 by shivak
Permutation Complexity Bound on Out-Sample Error
november 2010 by shivak
"We define a data dependent permutation complexity for a hypothesis set H, which is similar to a Rademacher complexity or maximum discrepancy. The crucial difference is that it is based on permutation (dependent) sampling. We prove a uniform bound on the generalization error, as well as a concentration result which means that the permutation estimate can be efficiently estimated."
capacity_control
learning_theory
concentration_of_measure
papers
to_read
filetype:pdf
media:document
november 2010 by shivak
Tight Sample Complexity of Large-Margin Learning
november 2010 by shivak
"We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the $\gamma$-adapted-dimension, which is a simple function of the spectrum of a distribution’s covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the $\gamma$-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions."
learning_theory
noise_conditions
margins
papers
to_read
filetype:pdf
media:document
november 2010 by shivak
Bounds for Rademacher Processes via Chaining
october 2010 by shivak
How are these better/different from the "Bernoulli conjecture" chapter in _The Generic Chaining_?
stochastic_process_suprema
generic_chaining
papers
to_read
october 2010 by shivak
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
october 2010 by shivak
"...we isolate and highlight two key properties of a regularized M-estimator—namely, a decomposability property for the regularizer, and a notion of restricted strong convexity that depends on the interaction between the regularizer and the loss function."
regularization
loss_functions
convexity
learning_theory
papers
to_read
october 2010 by shivak
Optimal learning rates for Kernel Conjugate Gradient regression
september 2010 by shivak
"We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is directly related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. The rates depend on two key quantities: first, on the regularity of the target regression function and second, on the intrinsic dimensionality of the data mapped into the kernel space."
kernel_methods
regression
to_read
papers
september 2010 by shivak
Smoothness, Low-Noise and Fast Rates
september 2010 by shivak
"We establish am excess risk bound of order $H \Rad_n^2 + \sqrt{H L^*}\Rad_n$ for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity $\Rad_n$, where $L^*$ is the best risk achievable by the hypothesis class. For typical hypothesis classes where $\Rad_n = \sqrt{R/n}$, this translates to a learning rate of order $RH/n$ in the separable ($L^*=0$) case and $RH/n + \sqrt{L^* RH/n}$ more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective."
learning_theory
to_read
papers
september 2010 by shivak
The Equivalence of Sampling and Searching
august 2010 by shivak
"... for any sampling problem S, there exists a search problem R such that, if C is any "reasonable" complexity class, then R is in the search version of C if and only if S is in the sampling version."
sampling
theoretical_computer_science
algorithmic_information_theory
papers
to_read
august 2010 by shivak
User-friendly tail bounds for matrix martingales
july 2010 by shivak
"This paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Freedman, Hoeffding, and McDiarmid. The matrix inequalities promise the same diversity of application, ease of use, and strength of conclusion that have made the scalar inequalities so valuable."
deviation_inequalities
random_structures
to_read
july 2010 by shivak
A note on concentration of submodular functions
may 2010 by shivak
"We survey a few concentration inequalities for submodular and fractionally subadditive functions of independent random variables, implied by the entropy method for self-bounding functions. The power of these concentration bounds is that they are dimension-free, in particular implying standard deviation O(\sqrt{\E[f]}) rather than O(\sqrt{n}) which can be obtained for any 1-Lipschitz function of n variables."
concentration_of_measure
submodularity
papers
to_read
may 2010 by shivak
Constructive Proofs of Concentration Bounds
april 2010 by shivak
Of the Hoeffding/Chernoff variety. Constructive in the sense that "if the sum of the given random variables is not concentrated around the expectation, then we can efficiently find (with high probability) a subset of the random variables that are statistically dependent."
concentration_of_measure
direct_products
constructive_proofs
papers
to_read
april 2010 by shivak
Chebyshev’s inequality for Hilbert-space valued random elements
march 2010 by shivak
Check if this is actually new.
concentration_of_measure
to_read
papers
march 2010 by shivak
Information and Learning in Markets: The Impact of Market Microstructure
february 2010 by shivak
"Provides the most complete analysis of the ways markets aggregate information [and] bridges the gap between the rational expectations and herding literatures."
markets
game_theory
to_read
papers
february 2010 by shivak
Interactive Submodular Set Cover
february 2010 by shivak
A decent looking greedy criterion.
active_learning
sequential_decisions
submodularity
to_read
papers
february 2010 by shivak
The Complexity of Forecast Testing
may 2009 by shivak
The adversarial strategies used by forecasters to manipulate tests are computationally intractable.
to_read
prediction
prequentialism
bounded_rationality
fortnow
lance
vohra
rakesh
may 2009 by shivak
On Uniform Deviations of General Empirical Risks with Unboundedness, Dependence, and High Dimensionality
may 2009 by shivak
Bound the risk of your hypothesis without strong mixing, β-mixing, or iid requirements.
learning_theory
concentration_of_measure
dependent_data
regression
to_read
jiang
wenxin
filetype:pdf
media:document
may 2009 by shivak
Data-driven Calibration of Penalties for Least-Squares Regression
february 2009 by shivak
"Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from data. We propose a completely data-driven calibration algorithm for these parameters in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birge and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice;
on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework."
data_dependent_methods
regularization
model_selection
learning_theory
statistics
arlot
sylvain
massart
pascal
to_read
filetype:pdf
media:document
on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework."
february 2009 by shivak
Measure Concentration (Math 710 at UMich)
february 2009 by shivak
Topics include: flattening lemma, isoperimetric inequalities, the martingale method, large deviations, Lipschitz functions...
concentration_of_measure
geometry
isoperimetry
stochastic_processes
large_deviations
to_read
barvinok
alexander
filetype:pdf
media:document
february 2009 by shivak
Teaching dimension and the complexity of active learning
february 2009 by shivak
"We study the label complexity of pool-based active learning in the PAC model with noise. Taking inspiration from extant literature on Exact learning with membership queries, we derive upper and lower bounds on the label complexity in terms of generalizations of extended teaching dimension. Among the contributions of this work is the first nontrivial general upper bound on label complexity in the presence of persistent classification noise."
active_learning
learning_theory
hanneke
steve
to_read
filetype:pdf
media:document
february 2009 by shivak
Rates of convergence in active learning
february 2009 by shivak
"We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the more general problem of active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning."
active_learning
learning_theory
hanneke
steve
to_read
updated_content
filetype:pdf
media:document
february 2009 by shivak
Concentration inequalities for dependent random variables via the martingale method
february 2009 by shivak
Like McDiarmid's inequality? Then you'll love Kontorovich & Ramanan.
concentration_of_measure
learning_theory
martingales
sensitivity
stochastic_processes
kontorovich
leonid
ramanan
kavita
to_read
february 2009 by shivak
Reliable reasoning: induction and statistical learning theory
november 2008 by shivak
Philosophical portions apparently disliked by CRS and Kevin Kelly.
harman
gilbert
kulkarni
sanjeev
induction
complexity
transduction
philosophy_of_science
machine_learning
statistical_learning_theory
books
to_read
filetype:pdf
media:document
november 2008 by shivak
related tags
active_learning ⊕ alexander ⊕ algorithmic_information_theory ⊕ anticoncentration ⊕ approximation_algorithms ⊕ arlot ⊕ barvinok ⊕ benford's_law ⊕ books ⊕ bounded_rationality ⊕ capacity_control ⊕ central_limit_theorems ⊕ classification ⊕ coding_theory ⊕ combinatorial_optimization ⊕ communication_complexity ⊕ complexity ⊕ concentration_of_measure ⊕ constructive_proofs ⊕ convexity ⊕ credit_collapse ⊕ data_dependent_methods ⊕ debunking ⊕ dependent_data ⊕ deviation_inequalities ⊕ dimension_reduction ⊕ direct_products ⊕ economics ⊕ embeddings ⊕ empirical_processes ⊕ filetype:pdf ⊕ forgotten ⊕ fortnow ⊕ functional_analysis ⊕ game_theory ⊕ generic_chaining ⊕ geometry ⊕ gilbert ⊕ graph_theory ⊕ hanneke ⊕ harman ⊕ induction ⊕ information_theory ⊕ isoperimetry ⊕ jiang ⊕ kavita ⊕ kernel_methods ⊕ kontorovich ⊕ kulkarni ⊕ lance ⊕ large_deviations ⊕ learning_theory ⊕ leonid ⊕ logarithmic_sobolev_inequalities ⊕ loss_functions ⊕ lower_bounds ⊕ machine_learning ⊕ margins ⊕ markets ⊕ martingales ⊕ massart ⊕ media:document ⊕ memory ⊕ model_selection ⊕ noise_conditions ⊕ online_learning ⊕ optimization ⊕ papers ⊕ pascal ⊕ philosophy_of_science ⊕ poincare_inequalities ⊕ preconditioners ⊕ prediction ⊕ prequentialism ⊕ rakesh ⊕ ramanan ⊕ random_structures ⊕ regression ⊕ regularization ⊕ rounding ⊕ sampling ⊕ sanjeev ⊕ sensitivity ⊕ sequential_decisions ⊕ smoothed_analysis ⊕ sparse_recovery ⊕ sparsity ⊕ spectral_methods ⊕ statistical_learning_theory ⊕ statistics ⊕ steve ⊕ stochastic_processes ⊕ stochastic_process_suprema ⊕ submodularity ⊕ surveys ⊕ sylvain ⊕ tesselations ⊕ theoretical_computer_science ⊕ to_read ⊖ transduction ⊕ updated_content ⊕ via:ded_maxim ⊕ vohra ⊕ wenxin ⊕Copy this bookmark: