mraginsky + online-learning 21
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
4 weeks ago by mraginsky
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
papers
to-read
bandit-problems
online-learning
control-theory
dynamic-programming
decision-making
4 weeks ago by mraginsky
"Online sequential optimization with biased gradients"
february 2012 by mraginsky
We study a class of stochastic optimization problems where though the objective functions
may not be convex, they satisfy a generalization of convexity, called the sequentially convex
property. We focus on a setting where the distribution of the underlying uncertainty is unknown and the manager must make a decision in real-time based on historical data. Since sequentially convex functions are not necessarily convex, they pose difficulties in applying standard adaptive methods for convex optimization. We propose a nonparametric algorithm based on a gradient descent method, and show that the T-season average expected cost differs from the minimum cost by at most O(1/sqrt(T)). Our analysis is based on a careful quantication of the bias that is inherent in gradient estimation due to the adaptive nature of the problem. We demonstrate the usefulness of the concept of sequential convexity by applying it to three canonical problems in inventory control, capacity allocation, and the lifetime buy decision, under the assumption that the manager does not know the demand distributions and has access only to historical sales (censored demand) data.
papers
to-read
online-learning
optimization
convex-programming
may not be convex, they satisfy a generalization of convexity, called the sequentially convex
property. We focus on a setting where the distribution of the underlying uncertainty is unknown and the manager must make a decision in real-time based on historical data. Since sequentially convex functions are not necessarily convex, they pose difficulties in applying standard adaptive methods for convex optimization. We propose a nonparametric algorithm based on a gradient descent method, and show that the T-season average expected cost differs from the minimum cost by at most O(1/sqrt(T)). Our analysis is based on a careful quantication of the bias that is inherent in gradient estimation due to the adaptive nature of the problem. We demonstrate the usefulness of the concept of sequential convexity by applying it to three canonical problems in inventory control, capacity allocation, and the lifetime buy decision, under the assumption that the manager does not know the demand distributions and has access only to historical sales (censored demand) data.
february 2012 by mraginsky
[1105.2274] Data-Distributed Weighted Majority and Online Mirror Descent
february 2012 by mraginsky
In this paper, we focus on the question of the extent to which online learning can benefit from distributed computing. We focus on the setting in which $N$ agents online-learn cooperatively, where each agent only has access to its own data. We propose a generic data-distributed online learning meta-algorithm. We then introduce the Distributed Weighted Majority and Distributed Online Mirror Descent algorithms, as special cases. We show, using both theoretical analysis and experiments, that compared to a single agent: given the same computation time, these distributed algorithms achieve smaller generalization errors; and given the same generalization errors, they can be $N$ times faster.
papers
to-read
machine-learning
online-learning
optimization
distributed-systems
february 2012 by mraginsky
Sebastien Bubeck, "Introduction to Online Optimization"
december 2011 by mraginsky
lecture notes for ORF570: Special Topics in Statistics and Operations Research, Prediction Games (Princeton)
lecture-notes
online-learning
optimization
prediction
december 2011 by mraginsky
[1102.4442] Internal Regret with Partial Monitoring. Calibration-Based Optimal Algorithms
february 2011 by mraginsky
"We provide consistent random algorithms for sequential decision under partial monitoring, i.e. when the decision maker does not observe the outcomes but receives instead random feedback signals. Those algorithms have no internal regret in the sense that, on the set of stages where the decision maker chose his action according to a given law, the average payoff could not have been improved in average by using any other fixed law. They are based on a generalization of calibration, no longer defined in terms of a Voronoi diagram but instead of a Laguerre diagram (a more general concept). This allows us to bound, for the first time in this general framework, the expected average internal -- as well as the usual external -- regret at stage $n$ by $O(n^{-1/3})$, which is known to be optimal."
papers
to-read
learning-theory
online-learning
game-theory
optimization
february 2011 by mraginsky
[1102.0710] Universal Prior Prediction for Communication
february 2011 by mraginsky
"We consider the problem of communicating over an unknown and arbitrarily varying channel, using feedback. This paper focuses on the problem of determining the input behavior, or more specifically, a prior which is used to randomly generate a codebook. We pose the problem of setting the prior as a universal sequential prediction problem using information theoretic abstractions of the communication channel. For the case where the channel is block-wise constant, we show it is possible to asymptotically approach the best rate that can be attained by any system using a fixed prior. For the case where the channel may change on each symbol, we combine a rateless coding scheme with a prior predictor and asymptotically approach the capacity of the average channel universally for every sequence of channels."
papers
to-read
information-theory
feedback-information-theory
prediction
online-learning
february 2011 by mraginsky
[1011.1936] Blackwell Approachability and Low-Regret Learning are Equivalent
november 2010 by mraginsky
"We consider the celebrated Blackwell Approachability Theorem for two-player games with vector payoffs. We show that Blackwell's result is equivalent, via efficient reductions, to the existence of "no-regret" algorithms for Online Linear Optimization. Indeed, we show that any algorithm for one such problem can be efficiently converted into an algorithm for the other. We provide a useful application of this reduction: the first efficient algorithm for calibrated forecasting."
to-read
papers
online-learning
decision-making
game-theory
optimization
november 2010 by mraginsky
[1008.4654] Freezing and Sleeping: Tracking Experts that Learn by Evolving Past Posteriors
august 2010 by mraginsky
Hot damn! "... in this paper we re-examine Freund's problem in case the experts have internal structure that enables them to learn. In this case the problem has two possible interpretations: should the experts learn from all data or only from the subsequence on which they are being tracked? The MPP algorithm solves the first case. Our contribution is to generalise MPP to address the second option. The results we obtain apply to any expert structure that can be formalised using (expert) hidden Markov models. Curiously enough, for our interpretation there are \emph{two} natural reference schemes: freezing and sleeping. For each scheme, we provide an efficient prediction strategy and prove the relevant loss bound."
papers
to-read
online-learning
decision-making
machine-learning
algorithms
august 2010 by mraginsky
[1006.4039] Cooperative Autonomous Online Learning
june 2010 by mraginsky
Abstract: "Online learning is becoming increasingly popular for training on large datasets. However, the sequential nature of online learning requires a centralized learner to store data and update parameters. In this paper, we consider a fully decentralized setting, cooperative autonomous online learning, with a distributed data source. The learners perform learning with local parameters while periodically communicating with a small subset of neighbors to exchange information. We define the regret in terms of an implicit aggregated parameter of the learners for such a setting and prove regret bounds similar to the classical sequential online learning." Damn, looks like I've been scooped!
papers
have-read
decision-making
online-learning
distributed-systems
june 2010 by mraginsky
[1006.1138] Online Learning: Random Averages, Combinatorial Parameters, and Learnability
june 2010 by mraginsky
"We study learnability in the online learning model. We define several complexity measures which capture the difficulty of learning in a sequential manner. Among these measures are analogues of Rademacher complexity, covering numbers and fat shattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. In the setting of supervised learning, finiteness of the introduced scale-sensitive parameters is shown to be equivalent to learnability. The complexities we define also ensure uniform convergence non-i.i.d. data, extending the uniform Glivenko-Cantelli type results. We conclude by showing online learnability for an array of examples."
papers
to-read
learning-theory
online-learning
measure-concentration
june 2010 by mraginsky
related tags
adaptive-systems ⊕ algorithms ⊕ bandit-problems ⊕ blogs ⊕ bounded-rationality ⊕ complexity ⊕ computation ⊕ control-theory ⊕ convex-programming ⊕ decision-making ⊕ distributed-systems ⊕ dynamic-programming ⊕ feedback-information-theory ⊕ filetype:pdf ⊕ game-theory ⊕ have-read ⊕ information-theory ⊕ learning-theory ⊕ lecture-notes ⊕ machine-learning ⊕ measure-concentration ⊕ media:document ⊕ online-learning ⊖ optimization ⊕ papers ⊕ prediction ⊕ statistical-learning ⊕ statistics ⊕ to-read ⊕ via:shivak ⊕Copy this bookmark: