cshalizi + individual_sequence_prediction   18

[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
"Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model."
to:NB  individual_sequence_prediction  online_learning  bandit_problems  re:knightian_uncertainty  low-regret_learning 
4 weeks ago by cshalizi
Online Learning and Online Convex Optimization
"Online learning is a well established learning paradigm which has both theoretical and practical appeals. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. Online learning has been studied in several research fields including game theory, information theory, and machine learning. It also became of great interest to practitioners due the recent emergence of large scale applications such as online advertisement placement and online web ranking. In this survey we provide a modern overview of online learning. Our goal is to give the reader a sense of some of the interesting ideas and in particular to underscore the centrality of convexity in deriving efficient online learning algorithms. We do not mean to be comprehensive but rather to give a high-level, rigorous yet easy to follow, survey."

Ungated version (via shivak): http://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf
to:NB  online_learning  individual_sequence_prediction  optimization  learning_theory  machine_learning  learning_in_games  low-regret_learning 
8 weeks ago by cshalizi
[1202.3323] A new look at shifting regret
We investigate extensions of well-known online learning algorithms such as fixed-share of Herbster and Warmuth (1998) or the methods proposed by Bousquet and Warmuth (2002). These algorithms use weight sharing schemes to perform as well as the best sequence of experts with a limited number of changes. Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex.
online_learning  to_read  individual_sequence_prediction  non-stationarity  re:growing_ensemble_project  in_NB  low-regret_learning  have_read 
12 weeks ago by cshalizi
[math/0701419] Strategies for prediction under imperfect monitoring
"We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal. The proposed strategies are consistent in the sense that they achieve, asymptotically, the best possible average reward. It was Rustichini (1999) who first proved the existence of such consistent predictors. The forecasters presented here offer the first constructive proof of consistency. Moreover, the proposed algorithms are computationally efficient. We also establish upper bounds for the rates of convergence. In the case of deterministic feedback, these rates are optimal up to logarithmic terms."
to:NB  prediction  individual_sequence_prediction  learning_in_games  re:growing_ensemble_project 
february 2012 by cshalizi
Introduction to Online Optimization (Bubeck)
"to_teach" tag a sudden brainstorm for how to make next year's statistical computing class either unbeatably awesome or an absolute disaster
to:NB  online_learning  regression  individual_sequence_prediction  optimization  machine_learning  learning_theory  via:mraginsky  to_read  to_teach:statcomp 
december 2011 by cshalizi
[1111.6337] Regret Bound by Variation for Online Convex Optimization
"In citep{Hazan-2008-extract}, the authors showed that the regret of online linear optimization can be bounded by the total variation of the cost vectors. In this paper, we extend this result to general online convex optimization. We first analyze the limitations of the algorithm in citep{Hazan-2008-extract} when applied it to online convex optimization. We then present two algorithms for online convex optimization whose regrets are bounded by the variation of cost functions. We finally consider the bandit setting, and present a randomized algorithm for online bandit convex optimization with a variation-based regret bound. We show that the regret bound for online bandit convex optimization is optimal when the variation of cost functions is independent of the number of trials."
in_NB  to_read  re:growing_ensemble_project  learning_theory  individual_sequence_prediction 
december 2011 by cshalizi
Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback
The study of online convex optimization in the bandit setting was initiated by Kleinberg (2004) and Flaxman et al. (2005). Such a setting models a decision maker that has to make decisions in the face of adversarially chosen convex loss functions. Moreover, the only information the decision maker receives are the losses. The identity of the loss functions themselves is not revealed. In this setting, we reduce the gap between the best known lower and upper bounds for the class of smooth convex functions, i.e. convex functions with a Lipschitz continuous gradient. Building upon existing work on self-concordant regularizers and one-point gradient estimation, we give the first algorithm whose expected regret, ignoring constant and logarithmic factors, is O(T^{2/3}).
to:NB  decision_theory  learning_theory  machine_learning  bandit_problems  individual_sequence_prediction 
november 2011 by cshalizi
Adaptive Bandits: Towards the Best History-Dependent Strategy
"We consider multi-armed bandit games with possibly adaptive opponents. We introduce models Theta of constraints based on equivalence classes on the common history (information shared by the player and the opponent) which define two learning scenarios: (1) The opponent is constrained, i.e.~he provides rewards that are stochastic functions of equivalence classes defined by some model theta* in Theta. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is arbitrary and we measure the regret w.r.t.~the best strategy among all mappings from classes to actions (i.e.~the best history-class-based strategy) for the best model in Theta. This allows to model opponents (case 1) or strategies (case 2) which handles finite memory, periodicity, standard stochastic bandits and other situations. When Theta={theta}, i.e.~only one model is considered, we derive tractable algorithms achieving a tight regret (at time T) bounded by tilde O(sqrt{TAC}), where C is the number of classes of theta. Now, when many models are available, all known algorithms achieving a nice regret O(sqrt{T}) are unfortunately not tractable and scale poorly with the number of models $|Theta|$. Our contribution here is to provide tractable algorithms with regret bounded by T^{2/3}C^{1/3}log(|Theta|)^{1/2}. "
to:NB  decision_theory  learning_theory  machine_learning  bandit_problems  individual_sequence_prediction 
november 2011 by cshalizi
[1110.2755] Efficient Tracking of Large Classes of Experts
"In the framework for prediction of individual sequences, sequential prediction methods are to be constructed that perform nearly as well as the best expert from a given class. We consider prediction strategies that compete with the class of switching strategies that can segment a given sequence into several blocks, and follow the advice of a different "base" expert in each block. As usual, the performance of the algorithm is measured by the regret defined as the excess loss relative to the best switching strategy %(with an arbitrary number of switches) selected in hindsight for the particular sequence to be predicted. In this paper we construct %strongly sequential (i.e., horizon-independent) prediction strategies of low computational cost for the case where the set of base experts is large. In particular we derive a family of efficient tracking algorithms that, for any prediction algorithm $A$ designed for the base class, can be implemented with time and space complexity $O(n^{gamma} log n)$ times larger than that of $A$, where $n$ is the time horizon and $gamma ge 0$ is a parameter of the algorithm. With $A$ properly chosen, our algorithm achieves a regret bound of optimal order for $gamma>0$, and only $O(log n)$ times larger than the optimal order for $gamma=0$ for all typical regret bound types we examined. For example, for predicting binary sequences with switching parameters, our method achieves the optimal $O(log n)$ regret rate with time complexity $O(n^{1+gamma}log n)$ for any $gammain (0,1)$."
to:NB  to_read  re:growing_ensemble_project  learning_theory  individual_sequence_prediction  lugosi.gabor 
october 2011 by cshalizi
[1110.2529] The Generalization Ability of Online Algorithms for Dependent Data
"We study the generalization performance of arbitrary online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret--an easily computable statistic of the online performance of the algorithm--when the underlying ergodic process is $beta$- or $phi$-mixing. We show high probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory."
in_NB  learning_theory  individual_sequence_prediction  ergodic_theory  mixing  re:growing_ensemble_project  re:XV_for_mixing  stability_of_learning  concentration_of_measure  have_read  re:your_favorite_dsge_sucks 
october 2011 by cshalizi
IEEE Xplore - On the generalization ability of on-line learning algorithms
"how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on [IID data]. ... a simple large deviation argument [proves] tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic Mn associated with the on-line performance of the ensemble. Via sharp pointwise bounds on Mn, we then obtain risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix. ... A distinctive feature of our approach is that the key tools for our analysis come from the model of prediction of individual sequences; i.e., a model making no probabilistic assumptions on the source generating the data. In fact, these tools turn out to be so powerful that we only need very elementary statistical facts to obtain our final risk bounds." Bounced off this 2004; try again.
have_read  learning_theory  large_deviations  online_learning  individual_sequence_prediction  via:djm1107  re:your_favorite_dsge_sucks  re:XV_for_mixing  ensemble_methods  low-regret_learning 
july 2010 by cshalizi

Copy this bookmark:



description:


tags: