mraginsky + online-learning   21

[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
papers  to-read  bandit-problems  online-learning  control-theory  dynamic-programming  decision-making 
4 weeks ago by mraginsky
"Online sequential optimization with biased gradients"
We study a class of stochastic optimization problems where though the objective functions
may not be convex, they satisfy a generalization of convexity, called the sequentially convex
property. We focus on a setting where the distribution of the underlying uncertainty is unknown and the manager must make a decision in real-time based on historical data. Since sequentially convex functions are not necessarily convex, they pose difficulties in applying standard adaptive methods for convex optimization. We propose a nonparametric algorithm based on a gradient descent method, and show that the T-season average expected cost di ffers from the minimum cost by at most O(1/sqrt(T)). Our analysis is based on a careful quanti cation of the bias that is inherent in gradient estimation due to the adaptive nature of the problem. We demonstrate the usefulness of the concept of sequential convexity by applying it to three canonical problems in inventory control, capacity allocation, and the lifetime buy decision, under the assumption that the manager does not know the demand distributions and has access only to historical sales (censored demand) data.
papers  to-read  online-learning  optimization  convex-programming 
february 2012 by mraginsky
[1105.2274] Data-Distributed Weighted Majority and Online Mirror Descent
In this paper, we focus on the question of the extent to which online learning can benefit from distributed computing. We focus on the setting in which $N$ agents online-learn cooperatively, where each agent only has access to its own data. We propose a generic data-distributed online learning meta-algorithm. We then introduce the Distributed Weighted Majority and Distributed Online Mirror Descent algorithms, as special cases. We show, using both theoretical analysis and experiments, that compared to a single agent: given the same computation time, these distributed algorithms achieve smaller generalization errors; and given the same generalization errors, they can be $N$ times faster.
papers  to-read  machine-learning  online-learning  optimization  distributed-systems 
february 2012 by mraginsky
Sebastien Bubeck, "Introduction to Online Optimization"
lecture notes for ORF570: Special Topics in Statistics and Operations Research, Prediction Games (Princeton)
lecture-notes  online-learning  optimization  prediction 
december 2011 by mraginsky
[1102.4442] Internal Regret with Partial Monitoring. Calibration-Based Optimal Algorithms
"We provide consistent random algorithms for sequential decision under partial monitoring, i.e. when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoi diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal -- as well as the usual external -- regret at stage $n$ by $O(n^{-1/3})$, which is known to be optimal."
papers  to-read  learning-theory  online-learning  game-theory  optimization 
february 2011 by mraginsky
[1102.0710] Universal Prior Prediction for Communication
"We consider the problem of communicating over an unknown and arbitrarily varying channel, using feedback. This paper focuses on the problem of determining the input behavior, or more specifically, a prior which is used to randomly generate a codebook. We pose the problem of setting the prior as a universal sequential prediction problem using information theoretic abstractions of the communication channel. For the case where the channel is block-wise constant, we show it is possible to asymptotically approach the best rate that can be attained by any system using a fixed prior. For the case where the channel may change on each symbol, we combine a rateless coding scheme with a prior predictor and asymptotically approach the capacity of the average channel universally for every sequence of channels."
papers  to-read  information-theory  feedback-information-theory  prediction  online-learning 
february 2011 by mraginsky
[1011.1936] Blackwell Approachability and Low-Regret Learning are Equivalent
"We consider the celebrated Blackwell Approachability Theorem for two-player games with vector payoffs. We show that Blackwell's result is equivalent, via efficient reductions, to the existence of "no-regret" algorithms for Online Linear Optimization. Indeed, we show that any algorithm for one such problem can be efficiently converted into an algorithm for the other. We provide a useful application of this reduction: the first efficient algorithm for calibrated forecasting."
to-read  papers  online-learning  decision-making  game-theory  optimization 
november 2010 by mraginsky
[1008.4654] Freezing and Sleeping: Tracking Experts that Learn by Evolving Past Posteriors
Hot damn! "... in this paper we re-examine Freund's problem in case the experts have internal structure that enables them to learn. In this case the problem has two possible interpretations: should the experts learn from all data or only from the subsequence on which they are being tracked? The MPP algorithm solves the first case. Our contribution is to generalise MPP to address the second option. The results we obtain apply to any expert structure that can be formalised using (expert) hidden Markov models. Curiously enough, for our interpretation there are \emph{two} natural reference schemes: freezing and sleeping. For each scheme, we provide an efficient prediction strategy and prove the relevant loss bound."
papers  to-read  online-learning  decision-making  machine-learning  algorithms 
august 2010 by mraginsky
[1006.4039] Cooperative Autonomous Online Learning
Abstract: "Online learning is becoming increasingly popular for training on large datasets. However, the sequential nature of online learning requires a centralized learner to store data and update parameters. In this paper, we consider a fully decentralized setting, cooperative autonomous online learning, with a distributed data source. The learners perform learning with local parameters while periodically communicating with a small subset of neighbors to exchange information. We define the regret in terms of an implicit aggregated parameter of the learners for such a setting and prove regret bounds similar to the classical sequential online learning." Damn, looks like I've been scooped!
papers  have-read  decision-making  online-learning  distributed-systems 
june 2010 by mraginsky
[1006.1138] Online Learning: Random Averages, Combinatorial Parameters, and Learnability
"We study learnability in the online learning model. We define several complexity measures which capture the difficulty of learning in a sequential manner. Among these measures are analogues of Rademacher complexity, covering numbers and fat shattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. In the setting of supervised learning, finiteness of the introduced scale-sensitive parameters is shown to be equivalent to learnability. The complexities we define also ensure uniform convergence non-i.i.d. data, extending the uniform Glivenko-Cantelli type results. We conclude by showing online learnability for an array of examples."
papers  to-read  learning-theory  online-learning  measure-concentration 
june 2010 by mraginsky

Copy this bookmark:



description:


tags: