cshalizi + cesa-bianchi.nicolo 3
[1202.3079] Towards minimax policies for online linear optimization with bandit feedback
february 2012 by cshalizi
"We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $sqrt{d n log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $sqrt{d}$ factor compared to previous works, and gives a regret bound of order $d sqrt{n log n}$ for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in $d$ dimension is the same as for the basic $d$-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a $d sqrt{n}$ regret, thus improving by a factor $sqrt{d log n}$ over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a $sqrt{d n log n}$ regret, again shaving off an extraneous $sqrt{d}$ compared to previous works."
in_NB
online_learning
decision_theory
optimization
re:growing_ensemble_project
cesa-bianchi.nicolo
kakade.sham
bubeck.sebastien
february 2012 by cshalizi
IEEE Xplore - Online Learning of Noisy Data
december 2011 by cshalizi
"We study online learning of linear and kernel-based predictors, when individual examples are corrupted by random noise, and both examples and noise type can be chosen adversarially and change over time. We begin with the setting where some auxiliary information on the noise distribution is provided, and we wish to learn predictors with respect to the squared loss. Depending on the auxiliary information, we show how one can learn linear and kernel-based predictors, using just 1 or 2 noisy copies of each example. We then turn to discuss a general setting where virtually nothing is known about the noise distribution, and one wishes to learn with respect to general losses and using linear and kernel-based predictors. We show how this can be achieved using a random, essentially constant number of noisy copies of each example. Allowing multiple copies cannot be avoided: Indeed, we show that the setting becomes impossible when only one noisy copy of each instance can be accessed. To obtain our results we introduce several novel techniques, some of which might be of independent interest."
to:NB
online_learning
filtering
kernel_methods
machine_learning
cesa-bianchi.nicolo
low-regret_learning
december 2011 by cshalizi
Prediction, Learning, and Games - Cesa-Bianch and Lugosi (@Labyrinth)
january 2008 by cshalizi
How to predict an individual sequence nearly as well as the best possible predictor would, without any probabilistic assumptions.
books:recommended
statistics
machine_learning
universal_prediction
information_theory
learning_in_games
cesa-bianchi.nicolo
lugosi.gabor
january 2008 by cshalizi
related tags
books:recommended ⊕ bubeck.sebastien ⊕ cesa-bianchi.nicolo ⊖ decision_theory ⊕ filtering ⊕ information_theory ⊕ in_NB ⊕ kakade.sham ⊕ kernel_methods ⊕ learning_in_games ⊕ low-regret_learning ⊕ lugosi.gabor ⊕ machine_learning ⊕ online_learning ⊕ optimization ⊕ re:growing_ensemble_project ⊕ statistics ⊕ to:NB ⊕ universal_prediction ⊕Copy this bookmark: