mraginsky + control-theory 73
[1204.5721] Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
4 weeks ago by mraginsky
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the Thirties, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model.
papers
to-read
bandit-problems
online-learning
control-theory
dynamic-programming
decision-making
4 weeks ago by mraginsky
"Approximate dynamic programming via Bellman inequalities" (Wang and Boyd)
6 weeks ago by mraginsky
"In this paper we introduce new methods for finding functions that lower bound
the value function of a stochastic control problem, using an iterated form of the Bellman inequality. Our method is based on solving linear or semidefinite programs, and
produces both a bound on the optimal objective, as well as a suboptimal policy that
appears to works very well. These results extend and improve bounds obtained by
authors in a previous paper using a single Bellman inequality condition. We describe
the methods in a general setting, and show how they can be applied in specific cases
including the finite state case, constrained linear quadratic control, switched affine
control, and multi-period portfolio investment."
papers
to-read
control-theory
dynamic-programming
optimization
the value function of a stochastic control problem, using an iterated form of the Bellman inequality. Our method is based on solving linear or semidefinite programs, and
produces both a bound on the optimal objective, as well as a suboptimal policy that
appears to works very well. These results extend and improve bounds obtained by
authors in a previous paper using a single Bellman inequality condition. We describe
the methods in a general setting, and show how they can be applied in specific cases
including the finite state case, constrained linear quadratic control, switched affine
control, and multi-period portfolio investment."
6 weeks ago by mraginsky
[1203.4626] Active Sequential Hypothesis Testing
10 weeks ago by mraginsky
Consider a decision maker who is responsible to dynamically collect observations so as to enhance his information in a speedy manner about an underlying phenomena of interest while accounting for the penalty of wrong declaration. The special cases of the problem are shown to be that of variable-length coding with feedback and noisy dynamic search. Due to the sequential nature of the problem, the decision maker relies on his current information state to adaptively select the most "informative" sensing action among the available ones.
In this paper, using results in dynamic programming, a lower bound for the optimal total cost is established. Moreover, upper bounds are obtained via an analysis of heuristic policies for dynamic selection of actions. It is shown that the proposed heuristics achieve asymptotic optimality in many practically relevant problems including the problems of variable-length coding with feedback and noisy dynamic search; where asymptotic optimality implies that the relative difference between the total cost achieved by the proposed policies and the optimal total cost approaches zero as the penalty of wrong declaration or the number of hypotheses (hence the number of collected samples) increases. Furthermore, using the obtained bounds, the gain of adaptive selection of sensing actions is shown to be at least logarithmic in the penalty associated with wrong declarations.
papers
to-read
control-theory
information-theory
adaptive-systems
heard-the-talk
In this paper, using results in dynamic programming, a lower bound for the optimal total cost is established. Moreover, upper bounds are obtained via an analysis of heuristic policies for dynamic selection of actions. It is shown that the proposed heuristics achieve asymptotic optimality in many practically relevant problems including the problems of variable-length coding with feedback and noisy dynamic search; where asymptotic optimality implies that the relative difference between the total cost achieved by the proposed policies and the optimal total cost approaches zero as the penalty of wrong declaration or the number of hypotheses (hence the number of collected samples) increases. Furthermore, using the obtained bounds, the gain of adaptive selection of sensing actions is shown to be at least logarithmic in the penalty associated with wrong declarations.
10 weeks ago by mraginsky
Certainty equivalence and model uncertainty
february 2012 by mraginsky
Simon’s and Theil’s certainty equivalence property justifies a convenient algorithm for solving dynamic programming problems with quadratic objectives and linear transition laws: first, optimize under perfect foresight, then substitute optimal forecasts for unknown future values. A similar decomposition into separate optimization and forecasting steps prevails when a decision maker wants a decision rule that is robust to model misspecification. Concerns about model misspecification leave the first step of the algorithm intact and affect only the second step of forecasting the future. The decision maker attains robustness by making forecasts with a distorted model that twists probabilities relative to his approximating model. The appropriate twisting emerges from a two-player zero-sum dynamic game.
papers
to-read
adaptive-systems
control-theory
dynamic-programming
estimation
game-theory
uncertainty
february 2012 by mraginsky
Acemoglu, Ozdaglar, Tahbaz-Salehi: "Cascades in networks and aggregate volatility"
june 2011 by mraginsky
"We provide a general framework for the study of cascade effects created by interconnections between sectors, firms or financial institutions. Focusing on a multi-sector economy linked through a supply network, we show how structural properties of the supply network determine both whether aggregate volatility disappears as the number of sectors increases (i.e., whether the law of large numbers holds) and when it does, the rate at which this happens. Our main results characterize the relationship between first-order interconnections (captured by the weighted degree sequence in the graph induced by the input-output relations) and aggregate volatility, and more importantly, the relationship between higher-order interconnections and aggregate volatility ..."
papers
to-read
economics
complex-systems
control-theory
filetype:pdf
media:document
june 2011 by mraginsky
[1103.3005] The Separation Principle in Stochastic Control, Redux
march 2011 by mraginsky
"Over the last 50 years a steady stream of accounts have been written on the separation principle of stochastic control. Even in the context of the linear-quadratic regulator in continuous time with Gaussian white noise, subtle difficulties arise, unexpected by many, that are often overlooked. In this paper we provide a conceptual framework that clarifies pitfalls and possibilities. We also provide a generalizations of the separation theorem to a wide class of feedback laws, models and stochastic noise, including semimartingales with possible jumps."
papers
to-read
control-theory
feedback
dynamical-systems
march 2011 by mraginsky
[0909.0996] The Kalman Like Particle Filter : Optimal Estimation With Quantized Innovations/Measurements
january 2011 by mraginsky
"We study the problem of optimal estimation and control of linear systems using quantized measurements, with a focus on applications over sensor networks. We show that the state conditioned on a causal quantization of the measurements can be expressed as the sum of a Gaussian random vector and a certain truncated Gaussian vector. This structure bears close resemblance to the full information Kalman filter and so allows us to effectively combine the Kalman structure with a particle filter to recursively compute the state estimate. ..."
papers
to-read
estimation
quantization
control-theory
january 2011 by mraginsky
[1012.4863] Dynamical quorum-sensing and synchronization of nonlinear oscillators coupled through an external medium
january 2011 by mraginsky
"Many biological and physical systems exhibit population-density dependent transitions to synchronized oscillations in a process often termed "dynamical quorum sensing". Synchronization frequently arises through chemical communication via signaling molecules distributed through an external media. We study a simple theoretical model for dynamical quorum sensing: a heterogenous population of limit-cycle oscillators diffusively coupled through a common media. We show that this model exhibits a rich phase diagram with four qualitatively distinct mechanisms fueling population-dependent transitions to global oscillations, including a new type of transition we term "dynamic death". We derive a single pair of analytic equations that allows us to calculate all phase boundaries as a function of population density and show that the model reproduces many of the qualitative features of recent experiments of BZ catalytic particles as well as synthetically engineered bacteria."
papers
to-read
biology
dynamical-systems
cells
control-theory
feedback
january 2011 by mraginsky
Decision Making in Large Scale Systems (Spring 2004)
january 2011 by mraginsky
This course is an introduction to the theory and application of large-scale dynamic programming. Topics include Markov decision processes, dynamic programming algorithms, simulation-based algorithms, theory and algorithms for value function approximation, and policy search methods. The course examines games and applications in areas such as dynamic resource allocation, finance and queueing networks.
lecture-notes
control-theory
dynamic-programming
optimization
via:stochastix
january 2011 by mraginsky
[1011.2952] Balanced Reduction of Nonlinear Control Systems in Reproducing Kernel Hilbert Space
november 2010 by mraginsky
"We introduce a novel data-driven order reduction method for nonlinear control systems, drawing on recent progress in machine learning and statistical dimensionality reduction. The method rests on the assumption that the nonlinear system behaves linearly when lifted into a high (or infinite) dimensional feature space where balanced truncation may be carried out implicitly. This leads to a nonlinear reduction map which can be combined with a representation of the system belonging to a reproducing kernel Hilbert space to give a closed, reduced order dynamical system which captures the essential input-output characteristics of the original model. Empirical simulations illustrating the approach are also provided."
papers
to-read
heard-the-talk
control-theory
machine-learning
dynamical-systems
november 2010 by mraginsky
[1009.3958] Approximate Inference and Stochastic Optimal Control
september 2010 by mraginsky
"We propose a novel reformulation of the stochastic optimal control problem as an approximate inference problem, demonstrating, that such a interpretation leads to new practical methods for the original problem. In particular we characterise a novel class of iterative solutions to the stochastic optimal control problem based on a natural relaxation of the exact dual formulation. These theoretical insights are applied to the Reinforcement Learning problem where they lead to new model free, off policy methods for discrete and continuous problems."
papers
to-read
control-theory
reinforcement-learning
optimization
information-theory
graphical-models
september 2010 by mraginsky
[1007.5354] Synchronization and Control in Intrinsic and Designed Computation: An Information-Theoretic Analysis of Competing Models of Stochastic Computation
august 2010 by mraginsky
"We adapt tools from information theory to analyze how an observer comes to synchronize with the hidden states of a finitary, stationary stochastic process. We show that synchronization is determined by both the process's internal organization and by an observer's model of it. We analyze these components using the convergence of state-block and block-state entropies, comparing them to the previously known convergence properties of the Shannon block entropy. Along the way, we introduce a hierarchy of information quantifiers as derivatives and integrals of these entropies, which parallels a similar hierarchy introduced for block entropy. We also draw out the duality between synchronization properties and a process's controllability. The tools lead to a new classification of a process's alternative representations in terms of minimality, synchronizability, and unifilarity."
papers
to-read
information-theory
control-theory
complexity
computation
august 2010 by mraginsky
Probability Collectives (David H. Wolpert)
july 2010 by mraginsky
"We recently proved that game theory and statistical physics are identical when cast in terms of information theory.
We call the associated formalism Probability Collectives (PC). PC opens many new lines of research, and provides new approaches to problems in distributed control and distributed optimization."
research
papers
reference
control-theory
distributed-systems
decision-making
game-theory
statistical-physics
optimization
We call the associated formalism Probability Collectives (PC). PC opens many new lines of research, and provides new approaches to problems in distributed control and distributed optimization."
july 2010 by mraginsky
Generalizing Apprenticeship Learning across Hypothesis Classes .oO(ML Discuss)
july 2010 by mraginsky
"This paper develops a generalized apprenticeship learning protocol for reinforcement-learning agents with access to a teacher who provides policy traces (transition and reward observations). We characterize sufficient conditions of the underlying models for efficient apprenticeship learning and link this criteria to two established learnability classes (KWIK and Mistake Bound). We then construct efficient apprenticeship-learning algorithms in a number of domains, including two types of relational MDPs. We instantiate our approach in a software agent and a robot agent that learn effectively from a human teacher."
papers
to-read
machine-learning
reinforcement-learning
control-theory
decision-making
july 2010 by mraginsky
CWI Seminar Control and System Theory - 2008 Fall
july 2010 by mraginsky
Nice list of references on decentralized control and team decision problems.
reference
control-theory
decentralized-control
decision-making
game-theory
distributed-systems
july 2010 by mraginsky
[0912.2385] Closing the Learning-Planning Loop with Predictive State Representations
december 2009 by mraginsky
Byron Boots, Sajid M. Siddiqi, Geoffrey J. Gordon
papers
to-read
control-theory
learning-theory
AI
december 2009 by mraginsky
[0910.2065v1] Decentralized Multi-Armed Bandit with Multiple Distributed Players (Keqin Liu, Qing Zhao)
november 2009 by mraginsky
From the abstract: "We consider multi-armed bandit with distributed players, where each player independently samples one of N stochastic processes with unknown parameters and accrues reward in each slot without information exchange. Users choosing the same arm collide, and none or only one receives reward depending on the collision model. This problem can be formulated as a decentralized multi-armed bandit problem."
papers
to-read
distributed-systems
algorithms
optimization
decision-making
decentralized-control
control-theory
november 2009 by mraginsky
[0910.4955] On the Structure of Real-Time Encoders and Decoders in a Multi-Terminal Communication System
october 2009 by mraginsky
Ashutosh Nayyar, Demosthenis Teneketzis
papers
to-read
communications
control-theory
information-theory
october 2009 by mraginsky
[0901.0633] Optimal control as a graphical model inference problem
september 2009 by mraginsky
B. Kappen, V. Gomez, M. Opper
papers
have-read
graphical-models
control-theory
optimization
september 2009 by mraginsky
Hilbert Space Embeddings of Conditional Distributions with Applications to Dynamical Systems (2009) [ICML Discussion]
june 2009 by mraginsky
Song, Le and Huang, Jonathan and Smola, Alex and Fukumizu, Kenji
papers
to-read
dynamical-systems
learning-theory
control-theory
machine-learning
june 2009 by mraginsky
[0905.3369] Learning Nonlinear Dynamic Models
june 2009 by mraginsky
John Langford, Ruslan Salakhutdinov, Tong Zhang; to appear in ICML 2009
papers
to-read
dynamical-systems
learning-theory
control-theory
machine-learning
june 2009 by mraginsky
[0904.2311] Source Coding with a Side Information "Vending Machine"
april 2009 by mraginsky
Tsachy Weissman and Haim H. Permuter
papers
have-read
information-theory
control-theory
april 2009 by mraginsky
Control Techniques for Complex Networks
november 2007 by mraginsky
Forthcoming book by Sean Meyn
books
statistics
network-data-analysis
control-theory
signal-processing
probability
november 2007 by mraginsky
related tags
adaptive-systems ⊕ AI ⊕ algorithms ⊕ bandit-problems ⊕ behavioral-control ⊕ biology ⊕ books ⊕ cells ⊕ communication ⊕ communications ⊕ complex-systems ⊕ complexity ⊕ computation ⊕ computer-science ⊕ computer-vision ⊕ conferences ⊕ consensus-algorithms ⊕ control-theory ⊖ convex-programming ⊕ cybernetics ⊕ decentralized-control ⊕ decision-making ⊕ distributed-computing ⊕ distributed-systems ⊕ dynamic-programming ⊕ dynamical-systems ⊕ economics ⊕ energy ⊕ environment ⊕ estimation ⊕ evolution ⊕ feedback ⊕ feedback-information-theory ⊕ filetype:pdf ⊕ game-theory ⊕ graphical-models ⊕ have-read ⊕ heard-the-talk ⊕ homepages ⊕ information-theory ⊕ learning-theory ⊕ lecture-notes ⊕ linear-systems ⊕ machine-learning ⊕ mathematics ⊕ media:document ⊕ multiagent-systems ⊕ network-data-analysis ⊕ neuroscience ⊕ online-learning ⊕ optimization ⊕ papers ⊕ people ⊕ probability ⊕ quantization ⊕ re:distributed_decisions_project ⊕ reference ⊕ reinforcement-learning ⊕ research ⊕ robotics ⊕ sensor-networks ⊕ signal-processing ⊕ statistical-physics ⊕ statistics ⊕ sustainability ⊕ system-identification ⊕ to-read ⊕ uncertainty ⊕ via:cshalizi ⊕ via:stochastix ⊕Copy this bookmark: