shivak + papers   329

An Entropic Proof of Chang's Inequality
Chang's lemma is a useful tool in additive combinatorics and the analysis of Boolean functions. Here we give an elementary proof using entropy. The constant we obtain is tight, and we give a slight improvement in the case where the variables are highly biased.
fourier_analysis  papers 
4 weeks ago by shivak
A Geometric Approach to Sample Compression
"Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes."
learning_theory  geometry  papers  to_read 
4 weeks ago by shivak
Contextual Bandit Learning with Predictable Rewards
"Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a realizability assumption: there exists a function in a (known) function class, always capable of predicting the expected reward, given the action and context. Under this assumption, we show three things. We present a new algorithm---Regressor Elimination--- with a regret similar to the agnostic setting (i.e. in the absence of realizability assumption). We prove a new lower bound showing no algorithm can achieve superior performance in the worst case even with the realizability assumption. However, we do show that for any set of policies (mapping contexts to actions), there is a distribution over rewards (given context) such that our new algorithm has constant regret unlike the previous approaches."
bandit_problems  papers 
5 weeks ago by shivak
On the Distribution of the Fourier Spectrum of Halfspaces
"Bourgain showed that any noise stable Boolean function $f$ can be well-approximated by a junta. In this note we give an exponential sharpening of the parameters of Bourgain's result under the additional assumption that $f$ is a halfspace."
sensitivity  boolean_functions  fourier_analysis  papers 
5 weeks ago by shivak
Graphical Models for Bandit Problems
"We introduce a rich class of graphical models for multi-armed bandit problems that permit both the state or context space and the action space to be very large, yet succinctly specify the payoffs for any context-action pair. Our main result is an algorithm for such models whose regret is bounded by the number of parameters and whose running time depends only on the treewidth of the graph substructure induced by the action space."
graphical_models  bandit_problems  papers 
5 weeks ago by shivak
Learning DNF Expressions from Fourier Spectrum
"We give a new, simple algorithm for approximating any polynomial-size DNF expression from its "heavy" low-degree Fourier coefficients alone. Our algorithm greatly simplifies the proof of learnability of DNF expressions over smoothed product distributions."
fourier_analysis  learning_theory  smoothed_analysis  papers 
5 weeks ago by shivak
On the Equivalence between Herding and Conditional Gradient Algorithms
"We show that the herding procedure of Welling (2009) takes exactly the form of a standard convex optimization algorithm--namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations. The experiments indicate that while we can improve over herding on the task of approximating integrals, the original herding algorithm tends to approach more often the maximum entropy distribution, shedding more light on the learning bias behind herding."
optimization  metaheuristics  papers 
5 weeks ago by shivak
A Structure Theorem for Poorly Anticoncentrated Gaussian Chaoses and Applications to the Study of Polynomial Threshold Functions
"We prove a structural result for degree-$d$ polynomials. In particular, we show that any degree-$d$ polynomial, $p$ can be approximated by another polynomial, $p_0$, which can be decomposed as some function of polynomials $q_1,...,q_m$ with $q_i$ normalized and $m=O_d(1)$, so that if $X$ is a Gaussian random variable, the probability distribution on $(q_1(X),...,q_m(X))$ does not have too much mass in any small box.
Using this result, we prove improved versions of a number of results about polynomial threshold functions, including producing better pseudorandom generators, obtaining a better invariance principle, and proving improved bounds on noise sensitivity."
anticoncentration  polynomials  papers 
5 weeks ago by shivak
The Kernelized Stochastic Batch Perceptron
"We present a novel approach for training kernel Support Vector Machines, establish learning runtime guarantees for our method that are better then those of any other known kernelized SVM optimization approach, and show that our method works well in practice compared to existing alternatives."
kernel_methods  machine_learning  papers 
5 weeks ago by shivak
Estimating Unknown Sparsity in Compressed Sensing
"Although we show that estimation of ||x||_0 is generally intractable in this framework, we consider an alternative measure of sparsity s(x):=frac{|x|_1^2}{|x|_2^2}, which is a sharp lower bound on ||x||_0, and is more amenable to estimation. When $x$ is a non-negative vector, we propose a computationally efficient estimator hat{s}(x), and use non-asymptotic methods to bound the relative error of hat{s}(x) in terms of a finite number of measurements. Remarkably, the quality of estimation is emph{dimension-free}, which ensures that hat{s}(x) is well-suited to the high-dimensional regime where n<<p."
sparsity  papers 
5 weeks ago by shivak
Algebraic Geometric Comparison of Probability Distributions
"... treating the cumulants as elements of the polynomial ring."
algebraic_geometry  probability  papers 
8 weeks ago by shivak
A new look at shifting regret
"Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex."
online_learning  papers  to_read 
february 2012 by shivak
Finding the most biased coin with fewest flips
"The problem is equivalent to nding the best arm in the multi-armed bandit problem using adaptive strategies."
bandit_problems  sequential_decisions  papers 
february 2012 by shivak
On measure concentration of vector valued maps
Includes a 'domination' lemma which reduces (sharp) concentration of vector-valued functions to that of the one-dimensional projections.
concentration_of_measure  papers 
february 2012 by shivak
Beyond Logarithmic Bounds in Online Learning
Smoothness + exp-concave loss allows online Newton to achieve regret bounded in terms of the "cumulative loss of the best point over the first T steps", which leads to regret no worse than O(log T), and much better if, for example, there's a perfect function.
online_learning  convexity  smoothness  papers 
february 2012 by shivak
Matrix Concentration Inequalities via the Method of Exchangeable Pairs
"This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein’s method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine, and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables."
deviation_inequalities  exchangeability  papers 
january 2012 by shivak
Variations on the Berry-Esseen theorem
"We analyze the quality of the gaussian approximation to linear combinations of n independent, identically-distributed random variables with finite fourth moments. It turns out that there exist universal, simple linear combinations that perform better than the sum of the variables."
invariance_principles  central_limit_theorems  papers 
january 2012 by shivak
Fast Distributed Gradient Methods
Suppose I have a (strongly) convex function F(x) which is a sum of N (strongly) convex functions f_i(x). To optimize F using N networked computers, optimize f_i separately on each machine while averaging in gradients from the neighboring machines. To prove this works, show you are actually solving an optimization problem over N variables x_1,...,x_N in which there is a regularization term enforcing closeness of the x_i.
optimization  distributed_systems  papers  have_read 
january 2012 by shivak
On a conjecture concerning the sum of independent Rademacher random variables
"It is shown that at least 50% of the probability mass of a sum of independent Rademacher random variables is within one standard deviation from its mean. This lower bound is sharp, it is much better than for instance the bound that can be obtained from application of the Chebishev inequality..."
probability  combinatorics  papers 
january 2012 by shivak
Central Binomial Tail Bounds
"An alternate form for the binomial tail is presented, which leads to a variety of bounds for the central tail. A few can be weakened into the corresponding Chernoff and Slud bounds, which not only demonstrates the quality of the presented bounds, but also provides alternate proofs for the classical bounds."
probability  combinatorics  papers 
january 2012 by shivak
A Scalable Bootstrap for Massive Data
"...we introduce the Bag of Little Bootstraps (BLB), a new procedure which incorporates features of both the bootstrap and subsampling to obtain a robust, computationally efficient means of assessing the quality of estimators. BLB is well suited to modern parallel and distributed computing architectures and furthermore retains the generic applicability and statistical efficiency of the bootstrap."
resampling  large-scale_learning  papers 
january 2012 by shivak
A Simpler Approach to Matrix Completion
"This paper provides the best bounds to date on the number of randomly sampled entries required to reconstruct an unknown low-rank matrix...The proof of this assertion is short, self contained, and uses very elementary analysis. The novel techniques herein are based on recent work in quantum information theory."
matrix_completion  convex_relaxations  papers 
january 2012 by shivak
The Complexity of Statistical Algorithms
I didn't especially care for the 1st draft of this paper, but now it's fortified with Vitaly Feldman.
stochastic_optimization  lower_bounds  papers 
january 2012 by shivak
Sparse Algorithms are not Stable: A No-free-lunch Theorem
"We show that these two properties are fundamentally at odds with each other: a sparse algorithm cannot be stable and vice versa."
sparsity  sensitivity  papers  to_read 
january 2012 by shivak
Computing without memory
Cute! "In this paper, we generalise the famous algorithm for swapping the contents of two variables without using a buffer. We introduce a novel combinatorial framework for procedural programming languages, where programs are only allowed to update one variable at a time. We first consider programs which do not have any additional memory. We prove that any function of all the variables can be computed this way in a number of updates which grows linearly with the number of variables. Similarly, any linear function can be computed using a linear number of linear instructions. We then derive the exact number of instructions required to compute any manipulation of variables. Second, we show that allowing programs to use additional memory is also incorporated in our framework. We quantify the gains obtained by using additional memory. This leads to shorter programs and allows to use only binary instructions, which is not sufficient in general when no memory is used."
memory  papers  to_read 
november 2011 by shivak
Falsification and future performance
"We information-theoretically reformulate two measures of capacity from statistical learning theory: empirical VC-entropy and empirical Rademacher complexity. We show these capacity measures count the number of hypotheses about a dataset that a learning algorithm falsifies when it finds the classifier in its repertoire minimizing empirical risk. It then follows from that the future performance of predictors on unseen data is controlled in part by how many hypotheses the learner falsifies. As a corollary we show that empirical VC-entropy quantifies the message length of the true hypothesis in the optimal code of a particular probability distribution, the so-called actual repertoire."
capacity_control  information_theory  papers 
november 2011 by shivak
On the Fundamental Limits of Adaptive Sensing
"We prove that the advantages offered by clever adaptive strategies and sophisticated estimation procedures---no matter how intractable---over classical compressed acquisition/recovery schemes are, in general, minimal."
sequential_decisions  compressed_sensing  papers 
november 2011 by shivak
Dimension reduction by random hyperplane tesselations
"Given a subset K of the unit Euclidean sphere, we estimate the minimal number m = m(K) of hyperplanes that generate a uniform tessellation of K, in the sense that the fraction of the hyperplanes separating any pair x,y in K is nearly proportional to the Euclidean distance between x and y. Random hyperplanes prove to be almost ideal for this problem; they achieve the almost optimal bound m = O(w(K)^2) where w(K) is the Gaussian mean width of K."
tesselations  dimension_reduction  embeddings  papers  to_read 
november 2011 by shivak
Lifts of convex sets and cone factorizations
"In this paper we address the basic geometric question of when a given convex set is the image under a linear map of an affine slice of a given closed convex cone. Such a representation or 'lift' of the convex set is especially useful if the cone admits an efficient algorithm for linear optimization over its affine slices."
convexity  geometry  functional_analysis  papers 
november 2011 by shivak
Estimated VC dimension for risk bounds
It's been a while since I visited VC-land, but: VC dimension is expressly a distribution-independent concept which is known to be computationally hard to estimate. Those interested in distribution-specific bounds are probably more interested in VC entropy. IIRC VC entropy concentrates around its expectation. (Edit: they want to use unbounded losses.)
capacity_control  papers 
november 2011 by shivak
The Bernstein-Orlicz norm and deviation inequalities
"We introduce two new concepts designed for the study of empirical processes. First, we introduce a new Orlicz norm which we call the Bernstein-Orlicz norm. This new norm interpolates sub-Gaussian and sub-exponential tail behavior. In particular, we show how this norm can be used to simplify the derivation of deviation inequalities for suprema of collections of random variables. Secondly, we introduce chaining and generic chaining along a tree. These simplify the well-known concepts of chaining and generic chaining. The supremum of the empirical process is then studied as a special case. We show that chaining along a tree can be done using entropy with bracketing. Finally, we establish a deviation inequality for the empirical process for the unbounded case."
concentration_of_measure  deviation_inequalities  empirical_processes  generic_chaining  functional_analysis  papers  to_read 
november 2011 by shivak
Robust Interactive Learning
"In each query, the algorithm proposes a label and a subset of the unlabeled examples, and asks the oracle to point to one of these examples whose true label agrees with the specified label, if any exist. This is a strict generalization of the traditional model of active learning by label requests." This is the model that Avrim Blum casually proposed to me in his class two years ago.
sequential_decisions  active_learning  papers 
november 2011 by shivak
Improved Smoothed Analysis of Multiobjective Optimization
Slightly tighter analysis of Moitra & O'Donnell. More importantly: bounds higher moments.
combinatorial_optimization  smoothed_analysis  papers  to_read 
november 2011 by shivak
Approximate Stochastic Subgradient Estimation Training for Support Vector Machines
"...we use Vapnik’s original SVM formulation without modifying the objective to be strongly convex." It's strongly convex in the hyperplane but only convex in the offset. But what's wrong with adding a dummy feature?
optimization  machine_learning  kernel_methods  papers 
november 2011 by shivak
Active Testing
Uses the active learning model to achieve property testing goals.
active_learning  property_testing  papers 
november 2011 by shivak
Computing a Nonnegative Matrix Factorization -- Provably
"Vavasis proved that this problem is NP-complete...We give a polynomial-time algorithm for exact and approximate NMF for every constant $r$...We give an algorithm that runs in time polynomial in $n$, $m$ and $r$ under the separablity condition identified by Donoho and Stodden in 2003."
matrix_factorizations  papers 
november 2011 by shivak
Adaptive Hedge
"Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods."
online_learning  data_dependent_methods  papers 
october 2011 by shivak
Streaming Algorithms from Precision Sampling
"Precision Sampling itself addresses the problem of estimating a sum S = sum_i a_i from weak estimates of each real a_i in [0, 1]. More precisely, the estimator first chooses a desired precision u_i in (0, 1] for each i in [n], and then it receives an estimate of every a_i within additive u_i. Its goal is to provide a good approximation to S ai while keeping a tab on the “approximation cost” sum_i (1/u_i). Here we refine previous work [Andoni, Krauthgamer, and Onak, FOCS 2010]which shows that as long as S=Omega(1), a good multiplicative approximation can be achieved using total precision of only O(n log n)."
streaming_algorithms  approximation_algorithms  papers 
october 2011 by shivak
The Power of Linear Estimators
There is a near-optimal linear estimator (which can be computed in polynomial time) for distribution properties such as entropy and L1 distance to uniformity.
learning_theory  linearity  papers 
october 2011 by shivak
An Optimal Algorithm for Linear Bandits
"We provide the first algorithm for online bandit linear optimization whose regret after T rounds is of order sqrt{Td ln N} on any finite class X of N actions in d dimensions, and of order d*sqrt{T} (up to log factors) when X is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We also present an application to a model of linear bandits with expert advice. Interestingly, these results show that bandit linear optimization with expert advice in d dimensions is no more difficult (in terms of the achievable regret) than the online d-armed bandit problem with expert advice (where EXP4 is optimal)."
bandit_problems  linearity  papers 
october 2011 by shivak
Bourgain's discretization theorem
Up to a constant, in terms of distortion, embedding an entire Banach space into another is no harder than embedding an epsilon net.
embeddings  functional_analysis  metric_spaces  papers 
october 2011 by shivak
A tail inequality for quadratic forms of subgaussian random vectors
"We prove an exponential probability tail inequality for positive semidefinite quadratic forms in a subgaussian random vector. The bound is analogous to one that holds when the vector has independent Gaussian entries."
concentration_of_measure  quadratic_forms  random_structures  papers 
october 2011 by shivak
A Variant of Azuma's Inequality for Martingales with Subgaussian Tail
"We present a variant of Azuma's concentration inequality for martingales, in which the standard boundedness requirement is replaced by the milder requirement of a subgaussian tail."
concentration_of_measure  martingales  papers  to_read 
october 2011 by shivak
Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
To attain O(1/T) convergence for non-smooth problems, SGD just needs a little tweak: an averaging window.
optimization  convexity  papers  to_read 
october 2011 by shivak
« earlier      

related tags

active_learning  adaptivity_gap  adversaries  agnostic_setting  algebraic_geometry  algorithmic_information_theory  algorithms  anticoncentration  approximation  approximation_algorithms  approximation_theory  bandit_problems  bayesian_inference  books  boolean_functions  boosting  brownian_motion  capacity_control  category_theory  catoni  causal_inference  central_limit_theorems  circuits  classification  cloud_computing  colleagues  combinatorial_optimization  combinatorics  communication_complexity  compressed_sensing  compression  computability  computational-statistical_tradeoffs  computational_efficiency  computational_finance  computational_learning_theory  concentration_of_measure  conditional_probability  constraint_satisfaction  constructive_proofs  convexity  convex_relaxations  counterexamples  coupling  cover_times  cross_validation  cryptography  curse_of_dimensionality  databases  data_dependent_methods  data_structures  decision_trees  densities  density_estimation  dependent_data  deviation_inequalities  differential_equations  differential_privacy  dimension_reduction  direct_products  discrepancy  distributed_learning  distributed_systems  dynamical_systems  dynamic_programming  economics  embeddings  empirical_processes  ensemble_methods  equilibria  ergodicity  evasiveness  exchangeability  expanders  experimental_mathematics  experiment_design  eye_on_the_prize  feature_selection  filetype:pdf  fourier_analysis  functional_analysis  functional_inequalities  game_theory  gaussians  generative_models  generic_chaining  geometry  good_timing  graphical_models  graph_theory  hardness_of_approximation  have_read  heard_the_talk  heavy_tails  hitting_times  homomorphic_encryption  huh?  hypercontractivity  hypothesis_testing  incentives  incoherence  information_design  information_theory  interactive_proofs  interpretability  invariance_principles  isoperimetry  jackson  john  kernel_methods  large-scale_learning  large_deviations  lasso  laws_of_large_numbers  learning_theory  likelihood  linearity  linear_algebra  logarithmic_sobolev_inequalities  loss_functions  lower_bounds  low_rank_matrices  machine_learning  margins  markets  market_making  markov_chains  martingales  mathematics  matrix_completion  matrix_factorization  matrix_factorizations  matrix_multiplication  matthew  measure_theory  mechanism_design  media:document  memory  metaheuristics  method_of_moments  metric_spaces  mine_the_lemmas  minimaxity  model_selection  multiple_testing  networks  noise_conditions  noncommutative_probability  nuclear_norm_minimization  olivier  online_algorithms  online_learning  open_problems  optimization  pac-bayes  papers  persistent_data_structures  phase_transitions  poincare_inequalities  polyhedra  polynomials  practice_without_theory  preconditioners  prediction_markets  principal_components  probability  probability_and_programs  projection  proof_techniques  property_testing  quadratic_forms  quantitative_finance  quantum_computing  quantum_mechanics  randomness  random_projections  random_structures  regression  regularization  regularization_paths  reinforcement_learning  resampling  ricci_curvature  rounding  sampling  semidefinite_programming  sensitivity  sequential_decisions  sheaves  similarity  simplifications  smoothed_analysis  smoothness  sparse_recovery  sparsification  sparsity  spectral_methods  stability  stable_distributions  statistical_mechanics  statistical_query_learning  statistics  stein's_method  stochastic_calculus  stochastic_optimization  stochastic_processes  stochastic_process_suprema  streaming_algorithms  structured_prediction  submodularity  support_vectors  surveys  symmetry  tedious  tesselations  theoretical_computer_science  theory_to_practice  time_series  topic_models  to_read  trading  transaction_costs  trees  turing_machines  turned_tables  type_systems  type_theory  uniqueness  unique_games  versioning  via:cshalizi  via:ded_maxim  via:dr._dobbs  via:langford  via:ltu 

Copy this bookmark:



description:


tags: