1525
The Swiss Army Knife of Cryptography
Barak and Brakerski demonstrate the versatility of fully homorphic encryption.
cryptography  popular_science 
4 weeks ago
An Entropic Proof of Chang's Inequality
Chang's lemma is a useful tool in additive combinatorics and the analysis of Boolean functions. Here we give an elementary proof using entropy. The constant we obtain is tight, and we give a slight improvement in the case where the variables are highly biased.
fourier_analysis  papers 
4 weeks ago
A Geometric Approach to Sample Compression
"Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes."
learning_theory  geometry  papers  to_read 
4 weeks ago
Generalization Bounds and Consistency for Latent-Structural Probit and Ramp Loss
"Linear predictors are scale-insensitive — the prediction does not change when the weight vector defining the predictor is scaled up or down. This implies that direct regularization of the performance of a linear predictor with a scale sensitive regularizer (such as a norm of the weight vector) is meaningless. Linear predictors are typically learned by introducing a scale-sensitive surrogate loss function such as the hinge loss of an SVM. However, no convex surrogate loss function can be consistent in general — in finite dimension SVMs are not consistent. Here we generalize probit loss and ramp loss to the latent-structural setting and show that both of these loss functions are consistent in arbitrary dimension for an arbitrary bounded task loss. Empirical experience with probit loss and ramp loss will be briefly discussed."
convex_relaxations  machine_learning  videos  to_view 
5 weeks ago
How Accurate is inv(A)*b?
"Several widely-used textbooks lead the reader to believe that solving a linear system of equations Ax = b by multiplying the vector b by a computed inverse inv(A) is inaccurate. Virtually all other textbooks on numerical analysis and numerical linear algebra advise against using computed inverses without stating whether this is accurate or not. In fact, under reasonable assumptions on how the inverse is computed, x = inv(A)*b is as accurate as the solution computed by the best backward-stable solvers. This fact is not new, but obviously obscure. We review the literature on the accuracy of this computation and present a self-contained numerical analysis of it."
numerical_methods  linear_algebra 
5 weeks ago
Contextual Bandit Learning with Predictable Rewards
"Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a realizability assumption: there exists a function in a (known) function class, always capable of predicting the expected reward, given the action and context. Under this assumption, we show three things. We present a new algorithm---Regressor Elimination--- with a regret similar to the agnostic setting (i.e. in the absence of realizability assumption). We prove a new lower bound showing no algorithm can achieve superior performance in the worst case even with the realizability assumption. However, we do show that for any set of policies (mapping contexts to actions), there is a distribution over rewards (given context) such that our new algorithm has constant regret unlike the previous approaches."
bandit_problems  papers 
5 weeks ago
On the Distribution of the Fourier Spectrum of Halfspaces
"Bourgain showed that any noise stable Boolean function $f$ can be well-approximated by a junta. In this note we give an exponential sharpening of the parameters of Bourgain's result under the additional assumption that $f$ is a halfspace."
sensitivity  boolean_functions  fourier_analysis  papers 
5 weeks ago
Graphical Models for Bandit Problems
"We introduce a rich class of graphical models for multi-armed bandit problems that permit both the state or context space and the action space to be very large, yet succinctly specify the payoffs for any context-action pair. Our main result is an algorithm for such models whose regret is bounded by the number of parameters and whose running time depends only on the treewidth of the graph substructure induced by the action space."
graphical_models  bandit_problems  papers 
5 weeks ago
Learning DNF Expressions from Fourier Spectrum
"We give a new, simple algorithm for approximating any polynomial-size DNF expression from its "heavy" low-degree Fourier coefficients alone. Our algorithm greatly simplifies the proof of learnability of DNF expressions over smoothed product distributions."
fourier_analysis  learning_theory  smoothed_analysis  papers 
5 weeks ago
On the Equivalence between Herding and Conditional Gradient Algorithms
"We show that the herding procedure of Welling (2009) takes exactly the form of a standard convex optimization algorithm--namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations. The experiments indicate that while we can improve over herding on the task of approximating integrals, the original herding algorithm tends to approach more often the maximum entropy distribution, shedding more light on the learning bias behind herding."
optimization  metaheuristics  papers 
5 weeks ago
A Structure Theorem for Poorly Anticoncentrated Gaussian Chaoses and Applications to the Study of Polynomial Threshold Functions
"We prove a structural result for degree-$d$ polynomials. In particular, we show that any degree-$d$ polynomial, $p$ can be approximated by another polynomial, $p_0$, which can be decomposed as some function of polynomials $q_1,...,q_m$ with $q_i$ normalized and $m=O_d(1)$, so that if $X$ is a Gaussian random variable, the probability distribution on $(q_1(X),...,q_m(X))$ does not have too much mass in any small box.
Using this result, we prove improved versions of a number of results about polynomial threshold functions, including producing better pseudorandom generators, obtaining a better invariance principle, and proving improved bounds on noise sensitivity."
anticoncentration  polynomials  papers 
5 weeks ago
The Kernelized Stochastic Batch Perceptron
"We present a novel approach for training kernel Support Vector Machines, establish learning runtime guarantees for our method that are better then those of any other known kernelized SVM optimization approach, and show that our method works well in practice compared to existing alternatives."
kernel_methods  machine_learning  papers 
5 weeks ago
Estimating Unknown Sparsity in Compressed Sensing
"Although we show that estimation of ||x||_0 is generally intractable in this framework, we consider an alternative measure of sparsity s(x):=frac{|x|_1^2}{|x|_2^2}, which is a sharp lower bound on ||x||_0, and is more amenable to estimation. When $x$ is a non-negative vector, we propose a computationally efficient estimator hat{s}(x), and use non-asymptotic methods to bound the relative error of hat{s}(x) in terms of a finite number of measurements. Remarkably, the quality of estimation is emph{dimension-free}, which ensures that hat{s}(x) is well-suited to the high-dimensional regime where n<<p."
sparsity  papers 
5 weeks ago
Algebraic Geometric Comparison of Probability Distributions
"... treating the cumulants as elements of the polynomial ring."
algebraic_geometry  probability  papers 
8 weeks ago
A new look at shifting regret
"Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex."
online_learning  papers  to_read 
february 2012
Finding the most biased coin with fewest flips
"The problem is equivalent to nding the best arm in the multi-armed bandit problem using adaptive strategies."
bandit_problems  sequential_decisions  papers 
february 2012
On measure concentration of vector valued maps
Includes a 'domination' lemma which reduces (sharp) concentration of vector-valued functions to that of the one-dimensional projections.
concentration_of_measure  papers 
february 2012
Beyond Logarithmic Bounds in Online Learning
Smoothness + exp-concave loss allows online Newton to achieve regret bounded in terms of the "cumulative loss of the best point over the first T steps", which leads to regret no worse than O(log T), and much better if, for example, there's a perfect function.
online_learning  convexity  smoothness  papers 
february 2012
Matrix Concentration Inequalities via the Method of Exchangeable Pairs
"This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein’s method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine, and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables."
deviation_inequalities  exchangeability  papers 
january 2012
Variations on the Berry-Esseen theorem
"We analyze the quality of the gaussian approximation to linear combinations of n independent, identically-distributed random variables with finite fourth moments. It turns out that there exist universal, simple linear combinations that perform better than the sum of the variables."
invariance_principles  central_limit_theorems  papers 
january 2012
The empirical method of Maurey
Approximate version of the Caratheodory theorem. A constructive version would be nice.
geometry  probabilistic_method 
january 2012
Quantitative geometry and efficient classification procedures
"This talk will illustrate to non-experts the use of geometric methods to design efficient partitioning algorithms for discrete objects or, in some cases, to prove that such algorithms must fail to perform well. These connections show that combinatorial optimization problems are intimately related to classical questions in continuous geometry, and we will describe some recent progress on old questions that translates to the best known results on algorithmic problems of central interest."
to_view 
january 2012
Fast Distributed Gradient Methods
Suppose I have a (strongly) convex function F(x) which is a sum of N (strongly) convex functions f_i(x). To optimize F using N networked computers, optimize f_i separately on each machine while averaging in gradients from the neighboring machines. To prove this works, show you are actually solving an optimization problem over N variables x_1,...,x_N in which there is a regularization term enforcing closeness of the x_i.
optimization  distributed_systems  papers  have_read 
january 2012
Hadoop AllReduce and Terascale Learning « Machine Learning (Theory)
Cool, but John's claim that it's "provably faster than any future single machine learning algorithm" is obviously wrong; just consider e.g. sublinear time algorithms.
large-scale_learning  folly 
january 2012
On a conjecture concerning the sum of independent Rademacher random variables
"It is shown that at least 50% of the probability mass of a sum of independent Rademacher random variables is within one standard deviation from its mean. This lower bound is sharp, it is much better than for instance the bound that can be obtained from application of the Chebishev inequality..."
probability  combinatorics  papers 
january 2012
« earlier      
academia accounting active_learning address_spaces algorithmic_game_theory algorithms amusing animals anticoncentration approximation_algorithms asset_pricing automotive_death_spiral bailout_nation bandit_problems barack books boolean_functions brecher bugs business caching capacity_control cars central_limit_theorems classification clustering cmu coding_theory combinatorial_optimization combinatorics comics compilers complexity compressed_sensing computational_efficiency computational_learning_theory computer_vision concentration_of_measure concurrency conferences convexity courses credit_collapse crime cryptography daniel dasgupta data_structures databases david debt debunking denninger dependent_data design deviation_inequalities dimension_reduction distributed_systems dynamical_systems economic_policy economics embeddings empirical_processes engineering ensemble_methods entertainment entitlements equality exploits fat_clients filesystems filetype:gif filetype:jpg filetype:pdf finance financial_speculation folly food forgotten fourier_analysis functional_analysis functional_programming funny:geeky funny:interweb funny:malicious funny:painful funny:silly funny:whimsical game_theory gary generic_chaining geometry google government government_intervention graph_theory graphical_models hacks hanneke hardware hashing have_read have_viewed heavy_tails history http hypothesis_testing i/o inanity incentives india information_design information_theory instrumentation intellectual_property interfaces internals investment javascript john journalism karl kernel_methods langford language languages large_deviations law learning_theory lectures linear_algebra linearity lipton logic lower_bounds machine_learning markets martingales mathematics mechanism_design media:document media:image memory metric_spaces michael military model_selection modeling music networking networks news obama online_algorithms online_learning optimization pac-bayes papers parallelism paul peter phase_transitions philosophy_of_science pittsburgh politics popular_science prediction presentations probability programming programming_languages proof_techniques proofs property_testing protocols psychology random_projections random_structures randomness reference regression regularization regulation reinforcement_learning reinvention research richard risk_assessment ryan sampling sanjoy science self_promotion sensitivity sequential_decisions software sparsity spectacles spectral_methods stability statistics steve stochastic_process_suprema stochastic_processes submodularity surveys taxes teaching technology terence theoretical_computer_science theory_to_practice to_read to_view trade trading type_systems typography unicode unique_games updated_content via:chl via:cshalizi via:mreid videos war weaponry web whiskey_tango_foxtrot

Copy this bookmark:



description:


tags: