cshalizi + online_learning   21

No-Regret Learning and a Mechanism for Distributed Multiagent Planning
"We develop a novel mechanism for coordinated, distributed multiagent planning. We consider problems stated as a col- lection of single-agent planning problems coupled by com- mon soft constraints on resource consumption. (Resources may be real or fictitious, the latter introduced as a tool for factoring the problem). A key idea is to recast the dis- tributed planning problem as learning in a repeated game between the original agents and a newly introduced group of adversarial agents who influence prices for the resources. The adversarial agents benefit from arbitrage: that is, their incentive is to uncover violations of the resource usage con- straints and, by selfishly adjusting prices, encourage the original agents to avoid plans that cause such violations. If all agents employ no-regret learning algorithms in the course of this repeated interaction, we are able to show that our mechanism can achieve design goals such as social op- timality (efficiency), budget balance, and Nash-equilibrium convergence to within an error which approaches zero as the agents gain experience. In particular, the agents’ average plans converge to a socially optimal solution for the original planning task. We present experiments in a simulated net- work routing domain demonstrating our method’s ability to reliably generate sound plans."
online_learning  economics  markets_as_collective_calculating_devices  re:knightian_uncertainty  gordon.geoff  to:NB  low-regret_learning 
20 days ago by cshalizi
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
"Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model."
to:NB  individual_sequence_prediction  online_learning  bandit_problems  re:knightian_uncertainty  low-regret_learning 
4 weeks ago by cshalizi
Online Learning and Online Convex Optimization
"Online learning is a well established learning paradigm which has both theoretical and practical appeals. The goal of online learning is to make a sequence of accurate predictions given knowledge of the correct answer to previous prediction tasks and possibly additional available information. Online learning has been studied in several research fields including game theory, information theory, and machine learning. It also became of great interest to practitioners due the recent emergence of large scale applications such as online advertisement placement and online web ranking. In this survey we provide a modern overview of online learning. Our goal is to give the reader a sense of some of the interesting ideas and in particular to underscore the centrality of convexity in deriving efficient online learning algorithms. We do not mean to be comprehensive but rather to give a high-level, rigorous yet easy to follow, survey."

Ungated version (via shivak): http://www.cs.huji.ac.il/~shais/papers/OLsurvey.pdf
to:NB  online_learning  individual_sequence_prediction  optimization  learning_theory  machine_learning  learning_in_games  low-regret_learning 
8 weeks ago by cshalizi
[1202.3323] A new look at shifting regret
We investigate extensions of well-known online learning algorithms such as fixed-share of Herbster and Warmuth (1998) or the methods proposed by Bousquet and Warmuth (2002). These algorithms use weight sharing schemes to perform as well as the best sequence of experts with a limited number of changes. Here we show, with a common, general, and simpler analysis, that weight sharing in fact achieves much more than what it was designed for. We use it to simultaneously prove new shifting regret bounds for online convex optimization on the simplex in terms of the total variation distance as well as new bounds for the related setting of adaptive regret. Finally, we exhibit the first logarithmic shifting bounds for exp-concave loss functions on the simplex.
online_learning  to_read  individual_sequence_prediction  non-stationarity  re:growing_ensemble_project  in_NB  low-regret_learning  have_read 
12 weeks ago by cshalizi
[0810.3023] Iterated Regret Minimization: A More Realistic Solution Concept
"For some well-known games, such as the Traveler's Dilemma or the Centipede Game, traditional game-theoretic solution concepts--and most notably Nash equilibrium--predict outcomes that are not consistent with empirical observations. In this paper, we introduce a new solution concept, iterated regret minimization, which exhibits the same qualitative behavior as that observed in experiments in many games of interest, including Traveler's Dilemma, the Centipede Game, Nash bargaining, and Bertrand competition. As the name suggests, iterated regret minimization involves the iterated deletion of strategies that do not minimize regret."

--- Quite astonishingly, no mention at all of low-regret learning!
game_theory  online_learning  have_read  in_NB  halpern.joseph_y.  re:knightian_uncertainty  low-regret_learning 
february 2012 by cshalizi
[1202.3079] Towards minimax policies for online linear optimization with bandit feedback
"We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order $sqrt{d n log N}$ for any finite action set with $N$ actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous $sqrt{d}$ factor compared to previous works, and gives a regret bound of order $d sqrt{n log n}$ for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in $d$ dimension is the same as for the basic $d$-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a $d sqrt{n}$ regret, thus improving by a factor $sqrt{d log n}$ over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a $sqrt{d n log n}$ regret, again shaving off an extraneous $sqrt{d}$ compared to previous works."
in_NB  online_learning  decision_theory  optimization  re:growing_ensemble_project  cesa-bianchi.nicolo  kakade.sham  bubeck.sebastien 
february 2012 by cshalizi
Randomized Online PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
"We design an online algorithm for Principal Component Analysis. In each trial the current instance is centered and projected into a probabilistically chosen low dimensional subspace. The regret of our online algorithm, that is, the total expected quadratic compression loss of the online algorithm minus the total quadratic compression loss of the batch algorithm, is bounded by a term whose dependence on the dimension of the instances is only logarithmic.
"We first develop our methodology in the expert setting of online learning by giving an algorithm for learning as well as the best subset of experts of a certain size. This algorithm is then lifted to the matrix setting where the subsets of experts correspond to subspaces. The algorithm represents the uncertainty over the best subspace as a density matrix whose eigenvalues are bounded. The running time is O(n2) per trial, where n is the dimension of the instances."
to:NB  online_learning  dimension_reduction  machine_learning  learning_theory  warmuth.manfred  principal_components  low-regret_learning 
february 2012 by cshalizi
Introduction to Online Optimization (Bubeck)
"to_teach" tag a sudden brainstorm for how to make next year's statistical computing class either unbeatably awesome or an absolute disaster
to:NB  online_learning  regression  individual_sequence_prediction  optimization  machine_learning  learning_theory  via:mraginsky  to_read  to_teach:statcomp 
december 2011 by cshalizi
IEEE Xplore - Online Learning of Noisy Data
"We study online learning of linear and kernel-based predictors, when individual examples are corrupted by random noise, and both examples and noise type can be chosen adversarially and change over time. We begin with the setting where some auxiliary information on the noise distribution is provided, and we wish to learn predictors with respect to the squared loss. Depending on the auxiliary information, we show how one can learn linear and kernel-based predictors, using just 1 or 2 noisy copies of each example. We then turn to discuss a general setting where virtually nothing is known about the noise distribution, and one wishes to learn with respect to general losses and using linear and kernel-based predictors. We show how this can be achieved using a random, essentially constant number of noisy copies of each example. Allowing multiple copies cannot be avoided: Indeed, we show that the setting becomes impossible when only one noisy copy of each instance can be accessed. To obtain our results we introduce several novel techniques, some of which might be of independent interest."
to:NB  online_learning  filtering  kernel_methods  machine_learning  cesa-bianchi.nicolo  low-regret_learning 
december 2011 by cshalizi
[1112.0076] Bandit Market Makers
"We propose a flexible framework for profit-seeking market making by combining cost function based automated market makers with bandit learning algorithms. The key idea is to consider each parametrisation of the cost function as a bandit arm, and the minimum expected profits from trades executed during a period as the rewards. This allows for the creation of market makers that can adjust liquidity and bid-asks spreads dynamically to maximise profits."
to:NB  online_learning  financial_markets  della_penna.nicholas 
december 2011 by cshalizi
[1110.6416] Adaptive Hedge
"Most methods for decision-theoretic online learning are based on the Hedge algorithm, which takes a parameter called the learning rate. In most previous analyses the learning rate was carefully tuned to obtain optimal worst-case performance, leading to suboptimal performance on easy instances, for example when there exists an action that is significantly better than all others. We propose a new way of setting the learning rate, which adapts to the difficulty of the learning problem: in the worst case our procedure still guarantees optimal performance, but on easy instances it achieves much smaller regret. In particular, our adaptive method achieves constant regret in a probabilistic setting, when there exists an action that on average obtains strictly smaller loss than all other actions. We also provide a simulation study comparing our approach to existing methods."
to:NB  to_read  re:growing_ensemble_project  online_learning  prediction  grunwald.peter  low-regret_learning 
october 2011 by cshalizi
[1107.4080] On the Universality of Online Mirror Descent
"We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee."
optimization  learning_theory  online_learning  low-regret_learning 
july 2011 by cshalizi
IEEE Xplore - On the generalization ability of on-line learning algorithms
"how to extract a hypothesis with small risk from the ensemble of hypotheses generated by an arbitrary on-line learning algorithm run on [IID data]. ... a simple large deviation argument [proves] tight data-dependent bounds for the risk of this hypothesis in terms of an easily computable statistic Mn associated with the on-line performance of the ensemble. Via sharp pointwise bounds on Mn, we then obtain risk tail bounds for kernel perceptron algorithms in terms of the spectrum of the empirical kernel matrix. ... A distinctive feature of our approach is that the key tools for our analysis come from the model of prediction of individual sequences; i.e., a model making no probabilistic assumptions on the source generating the data. In fact, these tools turn out to be so powerful that we only need very elementary statistical facts to obtain our final risk bounds." Bounced off this 2004; try again.
have_read  learning_theory  large_deviations  online_learning  individual_sequence_prediction  via:djm1107  re:your_favorite_dsge_sucks  re:XV_for_mixing  ensemble_methods  low-regret_learning 
july 2010 by cshalizi
[1005.2296] Online Learning of Noisy Data with Kernels
"We study online learning when individual instances are corrupted by random noise. We assume the noise distribution is unknown, and may change over time with no restriction other than having zero mean and bounded variance. Our technique relies on a family of unbiased estimators for non-linear functions, which may be of independent interest. We show that a variant of online gradient descent can learn functions in any dot-product (e.g., polynomial) or Gaussian kernel space with any analytic convex loss function. Our variant uses randomized estimates that need to query a random number of noisy copies of each instance, where with high probability this number is upper bounded by a constant. Allowing such multiple queries cannot be avoided: Indeed, we show that online learning is in general impossible when only one noisy copy of each instance can be accessed."
learning_theory  regression  online_learning  kernel_methods 
may 2010 by cshalizi
[0812.3973] Revisiting R\'ev\'esz's stochastic approximation method for the estimation of a regression function
A recursive/on-line kernel regression estimator, with proofs that it's about as efficient as the off-line/batch-mode Nadaraya-Watson estimator. Sounds cool...
nonparametrics  regression  kernel_estimators  stochastic_approximation  online_learning  to_read  to_teach:data-mining 
january 2009 by cshalizi
Online Learning of Complex Prediction Problems Using Simultaneous Projections
"framework for online classification where each online trial consists of multiple prediction tasks that are tied together"
machine_learning  classifiers  prediction  online_learning  to_read 
july 2008 by cshalizi

Copy this bookmark:



description:


tags: