shivak + to_read   59

A Geometric Approach to Sample Compression
"Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes."
learning_theory  geometry  papers  to_read 
4 weeks ago by shivak
A new look at shifting regret
"Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex."
online_learning  papers  to_read 
february 2012 by shivak
Sparse Algorithms are not Stable: A No-free-lunch Theorem
"We show that these two properties are fundamentally at odds with each other: a sparse algorithm cannot be stable and vice versa."
sparsity  sensitivity  papers  to_read 
january 2012 by shivak
Computing without memory
Cute! "In this paper, we generalise the famous algorithm for swapping the contents of two variables without using a buffer. We introduce a novel combinatorial framework for procedural programming languages, where programs are only allowed to update one variable at a time. We first consider programs which do not have any additional memory. We prove that any function of all the variables can be computed this way in a number of updates which grows linearly with the number of variables. Similarly, any linear function can be computed using a linear number of linear instructions. We then derive the exact number of instructions required to compute any manipulation of variables. Second, we show that allowing programs to use additional memory is also incorporated in our framework. We quantify the gains obtained by using additional memory. This leads to shorter programs and allows to use only binary instructions, which is not sufficient in general when no memory is used."
memory  papers  to_read 
november 2011 by shivak
Dimension reduction by random hyperplane tesselations
"Given a subset K of the unit Euclidean sphere, we estimate the minimal number m = m(K) of hyperplanes that generate a uniform tessellation of K, in the sense that the fraction of the hyperplanes separating any pair x,y in K is nearly proportional to the Euclidean distance between x and y. Random hyperplanes prove to be almost ideal for this problem; they achieve the almost optimal bound m = O(w(K)^2) where w(K) is the Gaussian mean width of K."
tesselations  dimension_reduction  embeddings  papers  to_read 
november 2011 by shivak
The Bernstein-Orlicz norm and deviation inequalities
"We introduce two new concepts designed for the study of empirical processes. First, we introduce a new Orlicz norm which we call the Bernstein-Orlicz norm. This new norm interpolates sub-Gaussian and sub-exponential tail behavior. In particular, we show how this norm can be used to simplify the derivation of deviation inequalities for suprema of collections of random variables. Secondly, we introduce chaining and generic chaining along a tree. These simplify the well-known concepts of chaining and generic chaining. The supremum of the empirical process is then studied as a special case. We show that chaining along a tree can be done using entropy with bracketing. Finally, we establish a deviation inequality for the empirical process for the unbounded case."
concentration_of_measure  deviation_inequalities  empirical_processes  generic_chaining  functional_analysis  papers  to_read 
november 2011 by shivak
Improved Smoothed Analysis of Multiobjective Optimization
Slightly tighter analysis of Moitra & O'Donnell. More importantly: bounds higher moments.
combinatorial_optimization  smoothed_analysis  papers  to_read 
november 2011 by shivak
A Variant of Azuma's Inequality for Martingales with Subgaussian Tail
"We present a variant of Azuma's concentration inequality for martingales, in which the standard boundedness requirement is replaced by the milder requirement of a subgaussian tail."
concentration_of_measure  martingales  papers  to_read 
october 2011 by shivak
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
To attain O(1/T) convergence for non-smooth problems, SGD just needs a little tweak: an averaging window.
optimization  convexity  papers  to_read 
october 2011 by shivak
Inner Regularization of Log-Concave Measures and Small-Ball Estimates
To deduce anticoncentration properties about X, use their "inner thickening" algorithm to produce Y, and deduce anticoncentration properties about that instead.
concentration_of_measure  anticoncentration  convexity  papers  to_read 
august 2011 by shivak
Efficient Online Learning via Randomized Rounding
"Most online algorithms used in machine learning today are based on variants of mirror descent or follow-the-leader. In this paper, we present an online algorithm based on a completely different approach, which combines "random playout" and randomized rounding of loss subgradients. As an application of our approach, we provide the first computationally efficient online algorithm for collaborative filtering with norm-constrained matrices. As a second application, we solve an open question linking batch learning and transductive online learning."
online_learning  rounding  papers  to_read 
june 2011 by shivak
Dimension-free tail inequalities for sums of random matrices
"We derive exponential tail inequalities for sums of random matrices with no dependence on the explicit matrix dimensions. These are similar to the matrix versions of the Chernoff bound and Bernstein inequality except with the explicit matrix dimensions replaced by a trace quantity that can be small even when the dimension is large or infinite. Some applications to principal component analysis and approximate matrix multiplication are given to illustrate the utility of the new bounds. "
concentration_of_measure  random_structures  papers  to_read 
april 2011 by shivak
The Communication Complexity of Gap Hamming Distance
Much shorter proof based on interplay between concentration and anticoncentration.
communication_complexity  lower_bounds  to_read 
april 2011 by shivak
Less Regret via Online Conditioning
"We analyze and evaluate an online gradient descent algorithm with adaptive per-coordinate adjustment of learning rates. Our algorithm can be thought of as an online version of batch gradient descent with a diagonal preconditioner. This approach leads to regret bounds that are stronger than those of standard online gradient descent for general online convex optimization problems."
online_learning  preconditioners  papers  to_read 
january 2011 by shivak
The Sparsity Gap: Uncertainty Principles Proportional to Dimension
"In an incoherent dictionary, most signals that admit a sparse representation admit a unique sparse representation. In other words, there is no way to express the signal without using strictly more atoms. This work demonstrates that sparse signals typically enjoy a higher privilege: each nonoptimal representation of the signal requires far more atoms than the sparsest representation—unless it contains many of the same atoms as the sparsest representation."
sparsity  papers  to_read  filetype:pdf  media:document 
january 2011 by shivak
A local maximal inequality under uniform entropy
"We derive an upper bound for the mean of the supremum of the empirical process indexed by a class of functions that are known to have variance bounded by a small constant $\delta$. The bound is expressed in the uniform entropy integral of the class at $\delta$."
stochastic_process_suprema  papers  to_read 
december 2010 by shivak
Bounds on the suprema of Gaussian processes, and omega results for the sum of a random multiplicative function
"In this paper we develop an alternative approach to lower bounding the upper tail probability [P(supt∈T Z(t) > u)]. The ingredients are an initial conditioning step, followed by a comparison (in the sense of the method of comparison) with a “model” Gaussian process that can be explicitly analysed. The resulting bounds are clean and can be non-trivial for moderately sized u, which is very important in our two applications."
stochastic_process_suprema  lower_bounds  papers  to_read 
december 2010 by shivak
An elementary proof of anti-concentration of polynomials in Gaussian variables
"The important work in this area is by Carbery and Wright, who gave a tight bound for anti-concentration of polynomials in normal variables. However, the proof of their result is quite complex. We give a weaker anti-concentration result which has an elementary proof, based on some convexity arguments, simple analysis and induction on the degree. Moreover, our proof technique is robust and extends to other distributions."
anticoncentration  papers  to_read 
november 2010 by shivak
Concentration of empirical distribution functions with applications to non-i.i.d. models
"The concentration of empirical measures is studied for dependent data, whose joint distribution satisfies Poincaré-type or logarithmic Sobolev inequalities. The general concentration results are then applied to spectral empirical distribution functions associated with high-dimensional random matrices."
concentration_of_measure  dependent_data  logarithmic_sobolev_inequalities  poincare_inequalities  random_structures  papers  to_read 
november 2010 by shivak
Permutation Complexity Bound on Out-Sample Error
"We define a data dependent permutation complexity for a hypothesis set H, which is similar to a Rademacher complexity or maximum discrepancy. The crucial difference is that it is based on permutation (dependent) sampling. We prove a uniform bound on the generalization error, as well as a concentration result which means that the permutation estimate can be efficiently estimated."
capacity_control  learning_theory  concentration_of_measure  papers  to_read  filetype:pdf  media:document 
november 2010 by shivak
Tight Sample Complexity of Large-Margin Learning
"We obtain a tight distribution-specific characterization of the sample complexity of large-margin classification with L2 regularization: We introduce the $\gamma$-adapted-dimension, which is a simple function of the spectrum of a distribution’s covariance matrix, and show distribution-specific upper and lower bounds on the sample complexity, both governed by the $\gamma$-adapted-dimension of the source distribution. We conclude that this new quantity tightly characterizes the true sample complexity of large-margin classification. The bounds hold for a rich family of sub-Gaussian distributions."
learning_theory  noise_conditions  margins  papers  to_read  filetype:pdf  media:document 
november 2010 by shivak
Bounds for Rademacher Processes via Chaining
How are these better/different from the "Bernoulli conjecture" chapter in _The Generic Chaining_?
stochastic_process_suprema  generic_chaining  papers  to_read 
october 2010 by shivak
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
"...we isolate and highlight two key properties of a regularized M-estimator—namely, a decomposability property for the regularizer, and a notion of restricted strong convexity that depends on the interaction between the regularizer and the loss function."
regularization  loss_functions  convexity  learning_theory  papers  to_read 
october 2010 by shivak
Optimal learning rates for Kernel Conjugate Gradient regression
"We prove rates of convergence in the statistical sense for kernel-based least squares regression using a conjugate gradient algorithm, where regularization against overfitting is obtained by early stopping. This method is directly related to Kernel Partial Least Squares, a regression method that combines supervised dimensionality reduction with least squares projection. The rates depend on two key quantities: first, on the regularity of the target regression function and second, on the intrinsic dimensionality of the data mapped into the kernel space."
kernel_methods  regression  to_read  papers 
september 2010 by shivak
Smoothness, Low-Noise and Fast Rates
"We establish am excess risk bound of order $H \Rad_n^2 + \sqrt{H L^*}\Rad_n$ for ERM with an H-smooth loss function and a hypothesis class with Rademacher complexity $\Rad_n$, where $L^*$ is the best risk achievable by the hypothesis class. For typical hypothesis classes where $\Rad_n = \sqrt{R/n}$, this translates to a learning rate of order $RH/n$ in the separable ($L^*=0$) case and $RH/n + \sqrt{L^* RH/n}$ more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective."
learning_theory  to_read  papers 
september 2010 by shivak
The Equivalence of Sampling and Searching
"... for any sampling problem S, there exists a search problem R such that, if C is any "reasonable" complexity class, then R is in the search version of C if and only if S is in the sampling version."
sampling  theoretical_computer_science  algorithmic_information_theory  papers  to_read 
august 2010 by shivak
User-friendly tail bounds for matrix martingales
"This paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Freedman, Hoeffding, and McDiarmid. The matrix inequalities promise the same diversity of application, ease of use, and strength of conclusion that have made the scalar inequalities so valuable."
deviation_inequalities  random_structures  to_read 
july 2010 by shivak
A note on concentration of submodular functions
"We survey a few concentration inequalities for submodular and fractionally subadditive functions of independent random variables, implied by the entropy method for self-bounding functions. The power of these concentration bounds is that they are dimension-free, in particular implying standard deviation O(\sqrt{\E[f]}) rather than O(\sqrt{n}) which can be obtained for any 1-Lipschitz function of n variables."
concentration_of_measure  submodularity  papers  to_read 
may 2010 by shivak
Constructive Proofs of Concentration Bounds
Of the Hoeffding/Chernoff variety. Constructive in the sense that "if the sum of the given random variables is not concentrated around the expectation, then we can efficiently find (with high probability) a subset of the random variables that are statistically dependent."
concentration_of_measure  direct_products  constructive_proofs  papers  to_read 
april 2010 by shivak
Information and Learning in Markets: The Impact of Market Microstructure
"Provides the most complete analysis of the ways markets aggregate information [and] bridges the gap between the rational expectations and herding literatures."
markets  game_theory  to_read  papers 
february 2010 by shivak
The Complexity of Forecast Testing
The adversarial strategies used by forecasters to manipulate tests are computationally intractable.
to_read  prediction  prequentialism  bounded_rationality  fortnow  lance  vohra  rakesh 
may 2009 by shivak
Data-driven Calibration of Penalties for Least-Squares Regression
"Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from data. We propose a completely data-driven calibration algorithm for these parameters in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birge and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice;
on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework."
data_dependent_methods  regularization  model_selection  learning_theory  statistics  arlot  sylvain  massart  pascal  to_read  filetype:pdf  media:document 
february 2009 by shivak
Measure Concentration (Math 710 at UMich)
Topics include: flattening lemma, isoperimetric inequalities, the martingale method, large deviations, Lipschitz functions...
concentration_of_measure  geometry  isoperimetry  stochastic_processes  large_deviations  to_read  barvinok  alexander  filetype:pdf  media:document 
february 2009 by shivak
Teaching dimension and the complexity of active learning
"We study the label complexity of pool-based active learning in the PAC model with noise. Taking inspiration from extant literature on Exact learning with membership queries, we derive upper and lower bounds on the label complexity in terms of generalizations of extended teaching dimension. Among the contributions of this work is the first nontrivial general upper bound on label complexity in the presence of persistent classification noise."
active_learning  learning_theory  hanneke  steve  to_read  filetype:pdf  media:document 
february 2009 by shivak
Rates of convergence in active learning
"We study the rates of convergence in generalization error achievable by active learning under various types of label noise. Additionally, we study the more general problem of active learning with a nested hierarchy of hypothesis classes, and propose an algorithm whose error rate provably converges to the best achievable error among classifiers in the hierarchy at a rate adaptive to both the complexity of the optimal classifier and the noise conditions. In particular, we state sufficient conditions for these rates to be dramatically faster than those achievable by passive learning."
active_learning  learning_theory  hanneke  steve  to_read  updated_content  filetype:pdf  media:document 
february 2009 by shivak

related tags

active_learning  alexander  algorithmic_information_theory  anticoncentration  approximation_algorithms  arlot  barvinok  benford's_law  books  bounded_rationality  capacity_control  central_limit_theorems  classification  coding_theory  combinatorial_optimization  communication_complexity  complexity  concentration_of_measure  constructive_proofs  convexity  credit_collapse  data_dependent_methods  debunking  dependent_data  deviation_inequalities  dimension_reduction  direct_products  economics  embeddings  empirical_processes  filetype:pdf  forgotten  fortnow  functional_analysis  game_theory  generic_chaining  geometry  gilbert  graph_theory  hanneke  harman  induction  information_theory  isoperimetry  jiang  kavita  kernel_methods  kontorovich  kulkarni  lance  large_deviations  learning_theory  leonid  logarithmic_sobolev_inequalities  loss_functions  lower_bounds  machine_learning  margins  markets  martingales  massart  media:document  memory  model_selection  noise_conditions  online_learning  optimization  papers  pascal  philosophy_of_science  poincare_inequalities  preconditioners  prediction  prequentialism  rakesh  ramanan  random_structures  regression  regularization  rounding  sampling  sanjeev  sensitivity  sequential_decisions  smoothed_analysis  sparse_recovery  sparsity  spectral_methods  statistical_learning_theory  statistics  steve  stochastic_processes  stochastic_process_suprema  submodularity  surveys  sylvain  tesselations  theoretical_computer_science  to_read  transduction  updated_content  via:ded_maxim  vohra  wenxin 

Copy this bookmark:



description:


tags: