cshalizi + individual_sequence_prediction 18
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
4 weeks ago by cshalizi
"Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model."
to:NB
individual_sequence_prediction
online_learning
bandit_problems
re:knightian_uncertainty
low-regret_learning
4 weeks ago by cshalizi
Online Learning and Online Convex Optimization
8 weeks ago by cshalizi
"Online learning is a well established learning paradigm which has both theoretical and practical appeals. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. Online learning has been studied in several research fields including game theory, information theory, and machine learning. It also became of great interest to practitioners due the recent emergence of large scale applications such as online advertisement placement and online web ranking. In this survey we provide a modern overview of online learning. Our goal is to give the reader a sense of some of the interesting ideas and in particular to underscore the centrality of convexity in deriving efficient online learning algorithms. We do not mean to be comprehensive but rather to give a high-level, rigorous yet easy to follow, survey."
Ungated version (via shivak): http://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf
to:NB
online_learning
individual_sequence_prediction
optimization
learning_theory
machine_learning
learning_in_games
low-regret_learning
Ungated version (via shivak): http://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf
8 weeks ago by cshalizi
[1202.3323] A new look at shifting regret
12 weeks ago by cshalizi
We investigate extensions of well-known online learning algorithms such as fixed-share of Herbster and Warmuth (1998) or the methods proposed by Bousquet and Warmuth (2002). These algorithms use weight sharing schemes to perform as well as the best sequence of experts with a limited number of changes. Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex.
online_learning
to_read
individual_sequence_prediction
non-stationarity
re:growing_ensemble_project
in_NB
low-regret_learning
have_read
12 weeks ago by cshalizi
[math/0701419] Strategies for prediction under imperfect monitoring
february 2012 by cshalizi
"We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best possible average reward. It was Rustichini (1999) who first proved the existence of such consistent predictors. The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence. In the case of deterministic feedback, these rates are optimal up to logarithmic terms."
to:NB
prediction
individual_sequence_prediction
learning_in_games
re:growing_ensemble_project
february 2012 by cshalizi
Introduction to Online Optimization (Bubeck)
december 2011 by cshalizi
"to_teach" tag a sudden brainstorm for how to make next year's statistical computing class either unbeatably awesome or an absolute disaster
to:NB
online_learning
regression
individual_sequence_prediction
optimization
machine_learning
learning_theory
via:mraginsky
to_read
to_teach:statcomp
december 2011 by cshalizi
[1111.6337] Regret Bound by Variation for Online Convex Optimization
december 2011 by cshalizi
"In citep{Hazan-2008-extract}, the authors showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of the algorithm in citep{Hazan-2008-extract} when applied it to online convex optimization. We then present two algorithms for online convex optimization whose regrets are bounded by the variation of cost functions. We finally consider the bandit setting, and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that the regret bound for online bandit convex optimization is optimal when the variation of cost functions is independent of the number of trials."
in_NB
to_read
re:growing_ensemble_project
learning_theory
individual_sequence_prediction
december 2011 by cshalizi
Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback
november 2011 by cshalizi
The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). Such a setting models a decision maker that has to make decisions in the face of adversarially chosen convex loss functions. Moreover, the only information the decision maker receives are the losses. The identity of the loss functions themselves is not revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e. convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret, ignoring constant and logarithmic factors, is O(T^{2/3}).
to:NB
decision_theory
learning_theory
machine_learning
bandit_problems
individual_sequence_prediction
november 2011 by cshalizi
Adaptive Bandits: Towards the Best History-Dependent Strategy
november 2011 by cshalizi
"We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Theta of constraints based on equivalence classes on the common history (information shared by the player and the opponent) which define two learning scenarios: (1) The opponent is constrained, i.e.~he provides rewards that are stochastic functions of equivalence classes defined by some model theta* in Theta. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is arbitrary and we measure the regret w.r.t.~the best strategy among all mappings from classes to actions (i.e.~the best history-class-based strategy) for the best model in Theta. This allows to model opponents (case 1) or strategies (case 2) which handles finite memory, periodicity, standard stochastic bandits and other situations. When Theta={theta}, i.e.~only one model is considered, we derive tractable algorithms achieving a tight regret (at time T) bounded by tilde O(sqrt{TAC}), where C is the number of classes of theta. Now, when many models are available, all known algorithms achieving a nice regret O(sqrt{T}) are unfortunately not tractable and scale poorly with the number of models $|Theta|$. Our contribution here is to provide tractable algorithms with regret bounded by T^{2/3}C^{1/3}log(|Theta|)^{1/2}. "
to:NB
decision_theory
learning_theory
machine_learning
bandit_problems
individual_sequence_prediction
november 2011 by cshalizi
[1110.2755] Efficient Tracking of Large Classes of Experts
october 2011 by cshalizi
"In the framework for prediction of individual sequences, sequential prediction methods are to be constructed that perform nearly as well as the best expert from a given class. We consider prediction strategies that compete with the class of switching strategies that can segment a given sequence into several blocks, and follow the advice of a different "base" expert in each block. As usual, the performance of the algorithm is measured by the regret defined as the excess loss relative to the best switching strategy %(with an arbitrary number of switches) selected in hindsight for the particular sequence to be predicted. In this paper we construct %strongly sequential (i.e., horizon-independent) prediction strategies of low computational cost for the case where the set of base experts is large. In particular we derive a family of efficient tracking algorithms that, for any prediction algorithm $A$ designed for the base class, can be implemented with time and space complexity $O(n^{gamma} log n)$ times larger than that of $A$, where $n$ is the time horizon and $gamma ge 0$ is a parameter of the algorithm. With $A$ properly chosen, our algorithm achieves a regret bound of optimal order for $gamma>0$, and only $O(log n)$ times larger than the optimal order for $gamma=0$ for all typical regret bound types we examined. For example, for predicting binary sequences with switching parameters, our method achieves the optimal $O(log n)$ regret rate with time complexity $O(n^{1+gamma}log n)$ for any $gammain (0,1)$."
to:NB
to_read
re:growing_ensemble_project
learning_theory
individual_sequence_prediction
lugosi.gabor
october 2011 by cshalizi
[1110.2529] The Generalization Ability of Online Algorithms for Dependent Data
october 2011 by cshalizi
"We study the generalization performance of arbitrary online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret--an easily computable statistic of the online performance of the algorithm--when the underlying ergodic process is $beta$- or $phi$-mixing. We show high probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory."
in_NB
learning_theory
individual_sequence_prediction
ergodic_theory
mixing
re:growing_ensemble_project
re:XV_for_mixing
stability_of_learning
concentration_of_measure
have_read
re:your_favorite_dsge_sucks
october 2011 by cshalizi
[1102.2836] Finite-Memory Universal Prediction of Individual Continuous Sequences
february 2011 by cshalizi
It's hard for me to tell from the abstract what the contrast class is here.
universal_prediction
individual_sequence_prediction
to:NB
february 2011 by cshalizi
Extracting certainty from uncertainty: regret bounded by variation in costs
july 2010 by cshalizi
Cool, but not sure it works really for our setting.
learning_theory
individual_sequence_prediction
ensemble_methods
re:growing_ensemble_project
have_read
july 2010 by cshalizi
IEEE Xplore - On the generalization ability of on-line learning algorithms
july 2010 by cshalizi
"how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on [IID data]. ... a simple large deviation argument [proves] tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic Mn associated with the on-line performance of the ensemble. Via sharp pointwise bounds on Mn, we then obtain risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix. ... A distinctive feature of our approach is that the key tools for our analysis come from the model of prediction of individual sequences; i.e., a model making no probabilistic assumptions on the source generating the data. In fact, these tools turn out to be so powerful that we only need very elementary statistical facts to obtain our final risk bounds." Bounced off this 2004; try again.
have_read
learning_theory
large_deviations
online_learning
individual_sequence_prediction
via:djm1107
re:your_favorite_dsge_sucks
re:XV_for_mixing
ensemble_methods
low-regret_learning
july 2010 by cshalizi
related tags
bandit_problems ⊕ calibration ⊕ concentration_of_measure ⊕ decision_theory ⊕ ensemble_methods ⊕ ergodic_theory ⊕ have_read ⊕ individual_sequence_prediction ⊖ information_theory ⊕ in_NB ⊕ large_deviations ⊕ learning_in_games ⊕ learning_theory ⊕ low-regret_learning ⊕ lugosi.gabor ⊕ machine_learning ⊕ mixing ⊕ non-stationarity ⊕ online_learning ⊕ optimization ⊕ prediction ⊕ raginsky.maxim ⊕ re:growing_ensemble_project ⊕ re:knightian_uncertainty ⊕ re:XV_for_mixing ⊕ re:your_favorite_dsge_sucks ⊕ regression ⊕ self-promotion ⊕ stability_of_learning ⊕ time_series ⊕ to:NB ⊕ to_read ⊕ to_teach:statcomp ⊕ universal_prediction ⊕ via:djm1107 ⊕ via:mraginsky ⊕Copy this bookmark: