cshalizi + online_learning 21
No-Regret Learning and a Mechanism for Distributed Multiagent Planning
20 days ago by cshalizi
"We develop a novel mechanism for coordinated, distributed multiagent planning. We consider problems stated as a col- lection of single-agent planning problems coupled by com- mon soft constraints on resource consumption. (Resources may be real or fictitious, the latter introduced as a tool for factoring the problem). A key idea is to recast the dis- tributed planning problem as learning in a repeated game between the original agents and a newly introduced group of adversarial agents who influence prices for the resources. The adversarial agents benefit from arbitrage: that is, their incentive is to uncover violations of the resource usage con- straints and, by selfishly adjusting prices, encourage the original agents to avoid plans that cause such violations. If all agents employ no-regret learning algorithms in the course of this repeated interaction, we are able to show that our mechanism can achieve design goals such as social op- timality (efficiency), budget balance, and Nash-equilibrium convergence to within an error which approaches zero as the agents gain experience. In particular, the agents’ average plans converge to a socially optimal solution for the original planning task. We present experiments in a simulated net- work routing domain demonstrating our method’s ability to reliably generate sound plans."
online_learning
economics
markets_as_collective_calculating_devices
re:knightian_uncertainty
gordon.geoff
to:NB
low-regret_learning
20 days ago by cshalizi
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
4 weeks ago by cshalizi
"Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model."
to:NB
individual_sequence_prediction
online_learning
bandit_problems
re:knightian_uncertainty
low-regret_learning
4 weeks ago by cshalizi
Online Learning and Online Convex Optimization
8 weeks ago by cshalizi
"Online learning is a well established learning paradigm which has both theoretical and practical appeals. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. Online learning has been studied in several research fields including game theory, information theory, and machine learning. It also became of great interest to practitioners due the recent emergence of large scale applications such as online advertisement placement and online web ranking. In this survey we provide a modern overview of online learning. Our goal is to give the reader a sense of some of the interesting ideas and in particular to underscore the centrality of convexity in deriving efficient online learning algorithms. We do not mean to be comprehensive but rather to give a high-level, rigorous yet easy to follow, survey."
Ungated version (via shivak): http://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf
to:NB
online_learning
individual_sequence_prediction
optimization
learning_theory
machine_learning
learning_in_games
low-regret_learning
Ungated version (via shivak): http://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf
8 weeks ago by cshalizi
[1202.3323] A new look at shifting regret
12 weeks ago by cshalizi
We investigate extensions of well-known online learning algorithms such as fixed-share of Herbster and Warmuth (1998) or the methods proposed by Bousquet and Warmuth (2002). These algorithms use weight sharing schemes to perform as well as the best sequence of experts with a limited number of changes. Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex.
online_learning
to_read
individual_sequence_prediction
non-stationarity
re:growing_ensemble_project
in_NB
low-regret_learning
have_read
12 weeks ago by cshalizi
[0810.3023] Iterated Regret Minimization: A More Realistic Solution Concept
february 2012 by cshalizi
"For some well-known games, such as the Traveler's Dilemma or the Centipede Game, traditional game-theoretic solution concepts--and most notably Nash equilibrium--predict outcomes that are not consistent with empirical observations. In this paper, we introduce a new solution concept, iterated regret minimization, which exhibits the same qualitative behavior as that observed in experiments in many games of interest, including Traveler's Dilemma, the Centipede Game, Nash bargaining, and Bertrand competition. As the name suggests, iterated regret minimization involves the iterated deletion of strategies that do not minimize regret."
--- Quite astonishingly, no mention at all of low-regret learning!
game_theory
online_learning
have_read
in_NB
halpern.joseph_y.
re:knightian_uncertainty
low-regret_learning
--- Quite astonishingly, no mention at all of low-regret learning!
february 2012 by cshalizi
[1202.3079] Towards minimax policies for online linear optimization with bandit feedback
february 2012 by cshalizi
"We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $sqrt{d n log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $sqrt{d}$ factor compared to previous works, and gives a regret bound of order $d sqrt{n log n}$ for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in $d$ dimension is the same as for the basic $d$-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a $d sqrt{n}$ regret, thus improving by a factor $sqrt{d log n}$ over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a $sqrt{d n log n}$ regret, again shaving off an extraneous $sqrt{d}$ compared to previous works."
in_NB
online_learning
decision_theory
optimization
re:growing_ensemble_project
cesa-bianchi.nicolo
kakade.sham
bubeck.sebastien
february 2012 by cshalizi
Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
february 2012 by cshalizi
"We design an online algorithm for Principal Component Analysis. In each trial the current instance is centered and projected into a probabilistically chosen low dimensional subspace. The regret of our online algorithm, that is, the total expected quadratic compression loss of the online algorithm minus the total quadratic compression loss of the batch algorithm, is bounded by a term whose dependence on the dimension of the instances is only logarithmic.
"We first develop our methodology in the expert setting of online learning by giving an algorithm for learning as well as the best subset of experts of a certain size. This algorithm is then lifted to the matrix setting where the subsets of experts correspond to subspaces. The algorithm represents the uncertainty over the best subspace as a density matrix whose eigenvalues are bounded. The running time is O(n2) per trial, where n is the dimension of the instances."
to:NB
online_learning
dimension_reduction
machine_learning
learning_theory
warmuth.manfred
principal_components
low-regret_learning
"We first develop our methodology in the expert setting of online learning by giving an algorithm for learning as well as the best subset of experts of a certain size. This algorithm is then lifted to the matrix setting where the subsets of experts correspond to subspaces. The algorithm represents the uncertainty over the best subspace as a density matrix whose eigenvalues are bounded. The running time is O(n2) per trial, where n is the dimension of the instances."
february 2012 by cshalizi
Introduction to Online Optimization (Bubeck)
december 2011 by cshalizi
"to_teach" tag a sudden brainstorm for how to make next year's statistical computing class either unbeatably awesome or an absolute disaster
to:NB
online_learning
regression
individual_sequence_prediction
optimization
machine_learning
learning_theory
via:mraginsky
to_read
to_teach:statcomp
december 2011 by cshalizi
IEEE Xplore - Online Learning of Noisy Data
december 2011 by cshalizi
"We study online learning of linear and kernel-based predictors, when individual examples are corrupted by random noise, and both examples and noise type can be chosen adversarially and change over time. We begin with the setting where some auxiliary information on the noise distribution is provided, and we wish to learn predictors with respect to the squared loss. Depending on the auxiliary information, we show how one can learn linear and kernel-based predictors, using just 1 or 2 noisy copies of each example. We then turn to discuss a general setting where virtually nothing is known about the noise distribution, and one wishes to learn with respect to general losses and using linear and kernel-based predictors. We show how this can be achieved using a random, essentially constant number of noisy copies of each example. Allowing multiple copies cannot be avoided: Indeed, we show that the setting becomes impossible when only one noisy copy of each instance can be accessed. To obtain our results we introduce several novel techniques, some of which might be of independent interest."
to:NB
online_learning
filtering
kernel_methods
machine_learning
cesa-bianchi.nicolo
low-regret_learning
december 2011 by cshalizi
[1112.0076] Bandit Market Makers
december 2011 by cshalizi
"We propose a flexible framework for profit-seeking market making by combining cost function based automated market makers with bandit learning algorithms. The key idea is to consider each parametrisation of the cost function as a bandit arm, and the minimum expected profits from trades executed during a period as the rewards. This allows for the creation of market makers that can adjust liquidity and bid-asks spreads dynamically to maximise profits."
to:NB
online_learning
financial_markets
della_penna.nicholas
december 2011 by cshalizi
[1110.6416] Adaptive Hedge
october 2011 by cshalizi
"Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods."
to:NB
to_read
re:growing_ensemble_project
online_learning
prediction
grunwald.peter
low-regret_learning
october 2011 by cshalizi
[1107.4080] On the Universality of Online Mirror Descent
july 2011 by cshalizi
"We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee."
optimization
learning_theory
online_learning
low-regret_learning
july 2011 by cshalizi
IEEE Xplore - On the generalization ability of on-line learning algorithms
july 2010 by cshalizi
"how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on [IID data]. ... a simple large deviation argument [proves] tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic Mn associated with the on-line performance of the ensemble. Via sharp pointwise bounds on Mn, we then obtain risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix. ... A distinctive feature of our approach is that the key tools for our analysis come from the model of prediction of individual sequences; i.e., a model making no probabilistic assumptions on the source generating the data. In fact, these tools turn out to be so powerful that we only need very elementary statistical facts to obtain our final risk bounds." Bounced off this 2004; try again.
have_read
learning_theory
large_deviations
online_learning
individual_sequence_prediction
via:djm1107
re:your_favorite_dsge_sucks
re:XV_for_mixing
ensemble_methods
low-regret_learning
july 2010 by cshalizi
[1005.2296] Online Learning of Noisy Data with Kernels
may 2010 by cshalizi
"We study online learning when individual instances are corrupted by random noise. We assume the noise distribution is unknown, and may change over time with no restriction other than having zero mean and bounded variance. Our technique relies on a family of unbiased estimators for non-linear functions, which may be of independent interest. We show that a variant of online gradient descent can learn functions in any dot-product (e.g., polynomial) or Gaussian kernel space with any analytic convex loss function. Our variant uses randomized estimates that need to query a random number of noisy copies of each instance, where with high probability this number is upper bounded by a constant. Allowing such multiple queries cannot be avoided: Indeed, we show that online learning is in general impossible when only one noisy copy of each instance can be accessed."
learning_theory
regression
online_learning
kernel_methods
may 2010 by cshalizi
Sequential Probability Assignment Via Online Convex Programming Using Exponential Families (Raginsky, Marcia, Silva and Willett)
october 2009 by cshalizi
Today's seminar. Very cool.
have_read
statistics
statistical_inference_for_stochastic_processes
exponential_families
information_theory
prediction
minimax
optimization
to:blog
raginsky.maxim
online_learning
willett.rebecca
low-regret-learning
in_NB
october 2009 by cshalizi
[0812.3973] Revisiting R\'ev\'esz's stochastic approximation method for the estimation of a regression function
january 2009 by cshalizi
A recursive/on-line kernel regression estimator, with proofs that it's about as efficient as the off-line/batch-mode Nadaraya-Watson estimator. Sounds cool...
nonparametrics
regression
kernel_estimators
stochastic_approximation
online_learning
to_read
to_teach:data-mining
january 2009 by cshalizi
Online Learning of Complex Prediction Problems Using Simultaneous Projections
july 2008 by cshalizi
"framework for online classification where each online trial consists of multiple prediction tasks that are tied together"
machine_learning
classifiers
prediction
online_learning
to_read
july 2008 by cshalizi
related tags
bandit_problems ⊕ bubeck.sebastien ⊕ cesa-bianchi.nicolo ⊕ classifiers ⊕ decision_theory ⊕ della_penna.nicholas ⊕ dimension_reduction ⊕ economics ⊕ ensemble_methods ⊕ exponential_families ⊕ filtering ⊕ financial_markets ⊕ game_theory ⊕ gordon.geoff ⊕ grunwald.peter ⊕ halpern.joseph_y. ⊕ have_read ⊕ individual_sequence_prediction ⊕ information_theory ⊕ in_NB ⊕ kakade.sham ⊕ kernel_estimators ⊕ kernel_methods ⊕ large_deviations ⊕ learning_in_games ⊕ learning_theory ⊕ low-regret-learning ⊕ low-regret_learning ⊕ machine_learning ⊕ markets_as_collective_calculating_devices ⊕ minimax ⊕ non-stationarity ⊕ nonparametrics ⊕ online_learning ⊖ optimization ⊕ prediction ⊕ principal_components ⊕ raginsky.maxim ⊕ re:growing_ensemble_project ⊕ re:knightian_uncertainty ⊕ re:XV_for_mixing ⊕ re:your_favorite_dsge_sucks ⊕ regression ⊕ stability_of_learning ⊕ statistical_inference_for_stochastic_processes ⊕ statistics ⊕ stochastic_approximation ⊕ to:blog ⊕ to:NB ⊕ to_read ⊕ to_teach:data-mining ⊕ to_teach:statcomp ⊕ via:djm1107 ⊕ via:mraginsky ⊕ warmuth.manfred ⊕ willett.rebecca ⊕Copy this bookmark: