The Swiss Army Knife of Cryptography
4 weeks ago
Barak and Brakerski demonstrate the versatility of fully homorphic encryption.
cryptography
popular_science
4 weeks ago
An Entropic Proof of Chang's Inequality
4 weeks ago
Chang's lemma is a useful tool in additive combinatorics and the analysis of Boolean functions. Here we give an elementary proof using entropy. The constant we obtain is tight, and we give a slight improvement in the case where the variables are highly biased.
fourier_analysis
papers
4 weeks ago
A Geometric Approach to Sample Compression
4 weeks ago
"Two promising ways forward are: embedding maximal classes into maximum classes with at most a polynomial increase to VC dimension, and compression via operating on geometric representations. This paper presents positive results on the latter approach and a first negative result on the former, through a systematic investigation of finite maximum classes."
learning_theory
geometry
papers
to_read
4 weeks ago
Symbolic Variable Elimination for Discrete and Continuous Graphical Models
4 weeks ago
Reminds me of the face lattice.
data_structures
graphical_models
heard_the_talk
papers
4 weeks ago
Generalization Bounds and Consistency for Latent-Structural Probit and Ramp Loss
5 weeks ago
"Linear predictors are scale-insensitive — the prediction does not change when the weight vector defining the predictor is scaled up or down. This implies that direct regularization of the performance of a linear predictor with a scale sensitive regularizer (such as a norm of the weight vector) is meaningless. Linear predictors are typically learned by introducing a scale-sensitive surrogate loss function such as the hinge loss of an SVM. However, no convex surrogate loss function can be consistent in general — in finite dimension SVMs are not consistent. Here we generalize probit loss and ramp loss to the latent-structural setting and show that both of these loss functions are consistent in arbitrary dimension for an arbitrary bounded task loss. Empirical experience with probit loss and ramp loss will be briefly discussed."
convex_relaxations
machine_learning
videos
to_view
5 weeks ago
How Accurate is inv(A)*b?
5 weeks ago
"Several widely-used textbooks lead the reader to believe that solving a linear system of equations Ax = b by multiplying the vector b by a computed inverse inv(A) is inaccurate. Virtually all other textbooks on numerical analysis and numerical linear algebra advise against using computed inverses without stating whether this is accurate or not. In fact, under reasonable assumptions on how the inverse is computed, x = inv(A)*b is as accurate as the solution computed by the best backward-stable solvers. This fact is not new, but obviously obscure. We review the literature on the accuracy of this computation and present a self-contained numerical analysis of it."
numerical_methods
linear_algebra
5 weeks ago
Contextual Bandit Learning with Predictable Rewards
5 weeks ago
"Contextual bandit learning is a reinforcement learning problem where the learner repeatedly receives a set of features (context), takes an action and receives a reward based on the action and context. We consider this problem under a realizability assumption: there exists a function in a (known) function class, always capable of predicting the expected reward, given the action and context. Under this assumption, we show three things. We present a new algorithm---Regressor Elimination--- with a regret similar to the agnostic setting (i.e. in the absence of realizability assumption). We prove a new lower bound showing no algorithm can achieve superior performance in the worst case even with the realizability assumption. However, we do show that for any set of policies (mapping contexts to actions), there is a distribution over rewards (given context) such that our new algorithm has constant regret unlike the previous approaches."
bandit_problems
papers
5 weeks ago
On the Distribution of the Fourier Spectrum of Halfspaces
5 weeks ago
"Bourgain showed that any noise stable Boolean function $f$ can be well-approximated by a junta. In this note we give an exponential sharpening of the parameters of Bourgain's result under the additional assumption that $f$ is a halfspace."
sensitivity
boolean_functions
fourier_analysis
papers
5 weeks ago
Graphical Models for Bandit Problems
5 weeks ago
"We introduce a rich class of graphical models for multi-armed bandit problems that permit both the state or context space and the action space to be very large, yet succinctly specify the payoffs for any context-action pair. Our main result is an algorithm for such models whose regret is bounded by the number of parameters and whose running time depends only on the treewidth of the graph substructure induced by the action space."
graphical_models
bandit_problems
papers
5 weeks ago
Trace Lasso: a trace norm regularization for correlated designs
5 weeks ago
Reinterpreting familiar norms.
sparse_recovery
papers
5 weeks ago
Learning DNF Expressions from Fourier Spectrum
5 weeks ago
"We give a new, simple algorithm for approximating any polynomial-size DNF expression from its "heavy" low-degree Fourier coefficients alone. Our algorithm greatly simplifies the proof of learnability of DNF expressions over smoothed product distributions."
fourier_analysis
learning_theory
smoothed_analysis
papers
5 weeks ago
On the Equivalence between Herding and Conditional Gradient Algorithms
5 weeks ago
"We show that the herding procedure of Welling (2009) takes exactly the form of a standard convex optimization algorithm--namely a conditional gradient algorithm minimizing a quadratic moment discrepancy. This link enables us to invoke convergence results from convex optimization and to consider faster alternatives for the task of approximating integrals in a reproducing kernel Hilbert space. We study the behavior of the different variants through numerical simulations. The experiments indicate that while we can improve over herding on the task of approximating integrals, the original herding algorithm tends to approach more often the maximum entropy distribution, shedding more light on the learning bias behind herding."
optimization
metaheuristics
papers
5 weeks ago
A Structure Theorem for Poorly Anticoncentrated Gaussian Chaoses and Applications to the Study of Polynomial Threshold Functions
5 weeks ago
"We prove a structural result for degree-$d$ polynomials. In particular, we show that any degree-$d$ polynomial, $p$ can be approximated by another polynomial, $p_0$, which can be decomposed as some function of polynomials $q_1,...,q_m$ with $q_i$ normalized and $m=O_d(1)$, so that if $X$ is a Gaussian random variable, the probability distribution on $(q_1(X),...,q_m(X))$ does not have too much mass in any small box.
Using this result, we prove improved versions of a number of results about polynomial threshold functions, including producing better pseudorandom generators, obtaining a better invariance principle, and proving improved bounds on noise sensitivity."
anticoncentration
polynomials
papers
Using this result, we prove improved versions of a number of results about polynomial threshold functions, including producing better pseudorandom generators, obtaining a better invariance principle, and proving improved bounds on noise sensitivity."
5 weeks ago
The Kernelized Stochastic Batch Perceptron
5 weeks ago
"We present a novel approach for training kernel Support Vector Machines, establish learning runtime guarantees for our method that are better then those of any other known kernelized SVM optimization approach, and show that our method works well in practice compared to existing alternatives."
kernel_methods
machine_learning
papers
5 weeks ago
Estimating Unknown Sparsity in Compressed Sensing
5 weeks ago
"Although we show that estimation of ||x||_0 is generally intractable in this framework, we consider an alternative measure of sparsity s(x):=frac{|x|_1^2}{|x|_2^2}, which is a sharp lower bound on ||x||_0, and is more amenable to estimation. When $x$ is a non-negative vector, we propose a computationally efficient estimator hat{s}(x), and use non-asymptotic methods to bound the relative error of hat{s}(x) in terms of a finite number of measurements. Remarkably, the quality of estimation is emph{dimension-free}, which ensures that hat{s}(x) is well-suited to the high-dimensional regime where n<<p."
sparsity
papers
5 weeks ago
Relax and Localize: From Value to Algorithms
8 weeks ago
Local sequential Rademacher complexities.
online_learning
learning_theory
papers
8 weeks ago
Algebraic Geometric Comparison of Probability Distributions
8 weeks ago
"... treating the cumulants as elements of the polynomial ring."
algebraic_geometry
probability
papers
8 weeks ago
A new look at shifting regret
february 2012
"Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex."
online_learning
papers
to_read
february 2012
Finding the most biased coin with fewest flips
february 2012
"The problem is equivalent to nding the best arm in the multi-armed bandit problem using adaptive strategies."
bandit_problems
sequential_decisions
papers
february 2012
On measure concentration of vector valued maps
february 2012
Includes a 'domination' lemma which reduces (sharp) concentration of vector-valued functions to that of the one-dimensional projections.
concentration_of_measure
papers
february 2012
Beyond Logarithmic Bounds in Online Learning
february 2012
Smoothness + exp-concave loss allows online Newton to achieve regret bounded in terms of the "cumulative loss of the best point over the first T steps", which leads to regret no worse than O(log T), and much better if, for example, there's a perfect function.
online_learning
convexity
smoothness
papers
february 2012
Matrix Concentration Inequalities via the Method of Exchangeable Pairs
january 2012
"This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein’s method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine, and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables."
deviation_inequalities
exchangeability
papers
january 2012
Variations on the Berry-Esseen theorem
january 2012
"We analyze the quality of the gaussian approximation to linear combinations of n independent, identically-distributed random variables with finite fourth moments. It turns out that there exist universal, simple linear combinations that perform better than the sum of the variables."
invariance_principles
central_limit_theorems
papers
january 2012
The empirical method of Maurey
january 2012
Approximate version of the Caratheodory theorem. A constructive version would be nice.
geometry
probabilistic_method
january 2012
CMU 15-359: Probability and Computing
january 2012
I guess I'll be TAing it this semester.
to_teach
probability
cmu
courses
january 2012
Quantitative geometry and efficient classification procedures
january 2012
"This talk will illustrate to non-experts the use of geometric methods to design efficient partitioning algorithms for discrete objects or, in some cases, to prove that such algorithms must fail to perform well. These connections show that combinatorial optimization problems are intimately related to classical questions in continuous geometry, and we will describe some recent progress on old questions that translates to the best known results on algorithmic problems of central interest."
to_view
january 2012
Fast Distributed Gradient Methods
january 2012
Suppose I have a (strongly) convex function F(x) which is a sum of N (strongly) convex functions f_i(x). To optimize F using N networked computers, optimize f_i separately on each machine while averaging in gradients from the neighboring machines. To prove this works, show you are actually solving an optimization problem over N variables x_1,...,x_N in which there is a regularization term enforcing closeness of the x_i.
optimization
distributed_systems
papers
have_read
january 2012
Hadoop AllReduce and Terascale Learning « Machine Learning (Theory)
january 2012
Cool, but John's claim that it's "provably faster than any future single machine learning algorithm" is obviously wrong; just consider e.g. sublinear time algorithms.
large-scale_learning
folly
january 2012
On a conjecture concerning the sum of independent Rademacher random variables
january 2012
"It is shown that at least 50% of the probability mass of a sum of independent Rademacher random variables is within one standard deviation from its mean. This lower bound is sharp, it is much better than for instance the bound that can be obtained from application of the Chebishev inequality..."
probability
combinatorics
papers
january 2012
academia
accounting
active_learning
address_spaces
algorithmic_game_theory
algorithms
amusing
animals
anticoncentration
approximation_algorithms
asset_pricing
automotive_death_spiral
bailout_nation
bandit_problems
barack
books
boolean_functions
brecher
bugs
business
caching
capacity_control
cars
central_limit_theorems
classification
clustering
cmu
coding_theory
combinatorial_optimization
combinatorics
comics
compilers
complexity
compressed_sensing
computational_efficiency
computational_learning_theory
computer_vision
concentration_of_measure
concurrency
conferences
convexity
courses
credit_collapse
crime
cryptography
daniel
dasgupta
data_structures
databases
david
debt
debunking
denninger
dependent_data
design
deviation_inequalities
dimension_reduction
distributed_systems
dynamical_systems
economic_policy
economics
embeddings
empirical_processes
engineering
ensemble_methods
entertainment
entitlements
equality
exploits
fat_clients
filesystems
filetype:gif
filetype:jpg
filetype:pdf
finance
financial_speculation
folly
food
forgotten
fourier_analysis
functional_analysis
functional_programming
funny:geeky
funny:interweb
funny:malicious
funny:painful
funny:silly
funny:whimsical
game_theory
gary
generic_chaining
geometry
google
government
government_intervention
graph_theory
graphical_models
hacks
hanneke
hardware
hashing
have_read
have_viewed
heavy_tails
history
http
hypothesis_testing
i/o
inanity
incentives
india
information_design
information_theory
instrumentation
intellectual_property
interfaces
internals
investment
javascript
john
journalism
karl
kernel_methods
langford
language
languages
large_deviations
law
learning_theory
lectures
linear_algebra
linearity
lipton
logic
lower_bounds
machine_learning
markets
martingales
mathematics
mechanism_design
media:document
media:image
memory
metric_spaces
michael
military
model_selection
modeling
music
networking
networks
news
obama
online_algorithms
online_learning
optimization
pac-bayes
papers
parallelism
paul
peter
phase_transitions
philosophy_of_science
pittsburgh
politics
popular_science
prediction
presentations
probability
programming
programming_languages
proof_techniques
proofs
property_testing
protocols
psychology
random_projections
random_structures
randomness
reference
regression
regularization
regulation
reinforcement_learning
reinvention
research
richard
risk_assessment
ryan
sampling
sanjoy
science
self_promotion
sensitivity
sequential_decisions
software
sparsity
spectacles
spectral_methods
stability
statistics
steve
stochastic_process_suprema
stochastic_processes
submodularity
surveys
taxes
teaching
technology
terence
theoretical_computer_science
theory_to_practice
to_read
to_view
trade
trading
type_systems
typography
unicode
unique_games
updated_content
via:chl
via:cshalizi
via:mreid
videos
war
weaponry
web
whiskey_tango_foxtrot