cshalizi + re:knightian_uncertainty 8
No-Regret Learning and a Mechanism for Distributed Multiagent Planning
20 days ago by cshalizi
"We develop a novel mechanism for coordinated, distributed multiagent planning. We consider problems stated as a col- lection of single-agent planning problems coupled by com- mon soft constraints on resource consumption. (Resources may be real or fictitious, the latter introduced as a tool for factoring the problem). A key idea is to recast the dis- tributed planning problem as learning in a repeated game between the original agents and a newly introduced group of adversarial agents who influence prices for the resources. The adversarial agents benefit from arbitrage: that is, their incentive is to uncover violations of the resource usage con- straints and, by selfishly adjusting prices, encourage the original agents to avoid plans that cause such violations. If all agents employ no-regret learning algorithms in the course of this repeated interaction, we are able to show that our mechanism can achieve design goals such as social op- timality (efficiency), budget balance, and Nash-equilibrium convergence to within an error which approaches zero as the agents gain experience. In particular, the agents’ average plans converge to a socially optimal solution for the original planning task. We present experiments in a simulated net- work routing domain demonstrating our method’s ability to reliably generate sound plans."
online_learning
economics
markets_as_collective_calculating_devices
re:knightian_uncertainty
gordon.geoff
to:NB
low-regret_learning
20 days ago by cshalizi
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
4 weeks ago by cshalizi
"Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model."
to:NB
individual_sequence_prediction
online_learning
bandit_problems
re:knightian_uncertainty
low-regret_learning
4 weeks ago by cshalizi
[0810.3023] Iterated Regret Minimization: A More Realistic Solution Concept
february 2012 by cshalizi
"For some well-known games, such as the Traveler's Dilemma or the Centipede Game, traditional game-theoretic solution concepts--and most notably Nash equilibrium--predict outcomes that are not consistent with empirical observations. In this paper, we introduce a new solution concept, iterated regret minimization, which exhibits the same qualitative behavior as that observed in experiments in many games of interest, including Traveler's Dilemma, the Centipede Game, Nash bargaining, and Bertrand competition. As the name suggests, iterated regret minimization involves the iterated deletion of strategies that do not minimize regret."
--- Quite astonishingly, no mention at all of low-regret learning!
game_theory
online_learning
have_read
in_NB
halpern.joseph_y.
re:knightian_uncertainty
low-regret_learning
--- Quite astonishingly, no mention at all of low-regret learning!
february 2012 by cshalizi
Learning to Compete, Coordinate and Cooperate in Repeated Games Using Reinforcement Learning
march 2011 by cshalizi
"problem of learning in repeated general-sum matrix games when a learning algorithm can observe the actions but not the payoffs of its associates. ... non-stationarity of the environment caused by learning associates in these games, most state-of-the-art algorithms perform poorly ... due to an inability to make profitable compromises.=,,, agent must effectively balance competing objectives, including bounding losses, playing optimally with respect to current beliefs, and taking calculated, but profitable, risks. ... we present ... M-Qubed, a reinforcement learning algorithm ... balancing best-response, cautious, and optimistic learning biases... learns to make profitable compromises across a wide-range of repeated matrix games played with many kinds of learners... average payoffs meet or exceed its maximin value in the limit.., in two-player games... average payoffs approach the value of the Nash bargaining solution... robust behavior in round-robin and evolutionary tournaments..."
machine_learning
learning_in_games
reinforcement_learning
re:do-institutions-evolve
re:knightian_uncertainty
game_theory
march 2011 by cshalizi
Comparison of Information Structures
february 2011 by cshalizi
Preprint, subsequently published Games and economic Behavior 30 (2000): 44--63
game_theory
re:knightian_uncertainty
value_of_information
decision_theory
statistics
via:ded-maxim
to:NB
february 2011 by cshalizi
related tags
bandit_problems ⊕ books:noted ⊕ decision_theory ⊕ econometrics ⊕ economics ⊕ exploitation-exploration_tradeoff ⊕ game_theory ⊕ gordon.geoff ⊕ halpern.joseph_y. ⊕ have_read ⊕ individual_sequence_prediction ⊕ in_NB ⊕ learning_in_games ⊕ low-regret_learning ⊕ machine_learning ⊕ markets_as_collective_calculating_devices ⊕ online_learning ⊕ re:do-institutions-evolve ⊕ re:knightian_uncertainty ⊖ reinforcement_learning ⊕ risk_vs_uncertainty ⊕ social_science_methodology ⊕ statistics ⊕ to:NB ⊕ to_read ⊕ value_of_information ⊕ via:ded-maxim ⊕Copy this bookmark: