Bayesian Dynamic Programming
Recently Published Documents


TOTAL DOCUMENTS: 6 (five years: 2)
H-INDEX: 3 (five years: 1)

2020 · Vol. 45 (3) · pp. 966-992
Author(s): Michael Jong Kim

Sequential Bayesian optimization constitutes an important and broad class of problems where model parameters are not known a priori but need to be learned over time using Bayesian updating. It is known that the solution to these problems can in principle be obtained by solving the Bayesian dynamic programming (BDP) equation. Although the BDP equation can be solved in certain special cases (for example, when posteriors have low-dimensional representations), solving this equation in general is computationally intractable and remains an open problem. A second unresolved issue with the BDP equation lies in its (rather generic) interpretation. Beyond the standard narrative of balancing immediate versus future costs, an interpretation common to all dynamic programs with or without learning, the BDP equation does not provide much insight into the underlying mechanism by which sequential Bayesian optimization trades off between learning (exploration) and optimization (exploitation), the distinguishing feature of this problem class. The goal of this paper is to develop good approximations (with error bounds) to the BDP equation that help address the issues of computation and interpretation. To this end, we show how the BDP equation can be represented as a tractable single-stage optimization problem that trades off between a myopic term and a "variance regularization" term that measures the total solution variability over the remaining planning horizon. Intuitively, the myopic term can be regarded as a pure exploitation objective that ignores the impact of future learning, whereas the variance regularization term captures a pure exploration objective that only puts value on solutions that resolve statistical uncertainty. We develop quantitative error bounds for this representation and prove that the error tends to zero like o(n⁻¹) almost surely in the number of stages n, which, as a corollary, establishes strong consistency of the approximate solution.
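
The single-stage representation above invites a small numerical illustration. What follows is a minimal Monte Carlo sketch in Python, not the paper's construction: the Gaussian stand-in posterior, the two hypothetical actions, and the sign and weight of the variance term are all assumptions made only to show how a myopic (exploitation) term and a variance (exploration) term combine into one objective.

    import numpy as np

    rng = np.random.default_rng(0)
    # Stand-in posterior over the unknown cost rate of the "risky" action.
    theta_samples = rng.normal(0.7, 0.5, size=4000)

    def stage_costs(action):
        # Hypothetical costs: "safe" has a known cost, "risky" depends on theta.
        return np.full_like(theta_samples, 0.6) if action == "safe" else theta_samples

    def objective(action, weight):
        draws = stage_costs(action)
        myopic = draws.mean()      # pure exploitation: expected immediate cost
        exploration = draws.var()  # crude stand-in for total solution variability
        # The sign and the weight of the variance term are assumptions.
        return myopic - weight * exploration

    for weight in (0.0, 0.5):
        best = min(("safe", "risky"), key=lambda a: objective(a, weight))
        print(weight, best)

With no regularization the myopic rule picks the known safe action; with the variance bonus, the risky action wins precisely because it is the only one that resolves statistical uncertainty.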


2019 · Vol. 70 (8) · pp. 1332-1348
Author(s): Mee Chi So, Christophe Mues, Adiel T. de Almeida Filho, Lyn C. Thomas

1995 · Vol. 9 (2) · pp. 269-284
Author(s): Ulrich Rieder, Jürgen Weishaupt

A stochastic scheduling model with linear waiting costs and unknown routing probabilities is considered. Using a Bayesian approach and methods of Bayesian dynamic programming, we investigate the finite-horizon stochastic scheduling problem with incomplete information. In particular, we study an equivalent nonstationary bandit model and show the monotonicity of the total expected reward and of the Gittins index. We derive the monotonicity and well-known structural properties of the (greatest) maximizers, the so-called stay-on-a-winner property and the stopping property. The monotonicity results are based on a special partial ordering.
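
The stay-on-a-winner property has a simple mechanical illustration in the conjugate Beta-Bernoulli setting. The sketch below is not the paper's model: the one-step posterior-mean index is a deliberately crude stand-in for the Gittins index, used only to show that a success raises an arm's index, so an index policy keeps playing a winner.

    # Conjugate Beta-Bernoulli updating of unknown success (routing) probabilities.
    def posterior_mean(alpha, beta):
        # Mean of a Beta(alpha, beta) posterior; a myopic stand-in for the index.
        return alpha / (alpha + beta)

    def update(alpha, beta, success):
        # Bayes update after observing one outcome on the arm.
        return (alpha + 1, beta) if success else (alpha, beta + 1)

    arms = {"a": (1, 1), "b": (1, 1)}  # uniform Beta(1, 1) priors
    for arm, success in [("a", True), ("a", True), ("b", False)]:
        arms[arm] = update(*arms[arm], success)

    for arm, (al, be) in arms.items():
        print(arm, round(posterior_mean(al, be), 3))  # a: 0.75, b: 0.333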


1975 · Vol. 7 (2) · pp. 330-348
Author(s): Ulrich Rieder

We consider a non-stationary Bayesian dynamic decision model with general state, action and parameter spaces. It is shown that this model can be reduced to a non-Markovian (resp. Markovian) decision model with completely known transition probabilities. Under rather weak convergence assumptions on the expected total rewards, some general results are presented concerning the restriction to deterministic generalized Markov policies, the criteria of optimality and the existence of Bayes policies. These facts are based on the above transformations and on results of Hinderer and Schäl.
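
The core of the reduction is mechanical: augment the state with the posterior over the unknown parameter, and the transition law of the augmented process becomes a completely known posterior predictive. The sketch below uses a two-point parameter space as an illustrative assumption; Rieder's construction covers general state, action and parameter spaces.

    from fractions import Fraction

    # Unknown parameter theta in {0, 1}; an action succeeds with probability p[theta].
    p = {0: Fraction(1, 4), 1: Fraction(3, 4)}

    def predictive_success(belief):
        # belief = P(theta = 1). The augmented chain's transition probability
        # is the posterior predictive, fully known given the belief.
        return belief * p[1] + (1 - belief) * p[0]

    def bayes_update(belief, success):
        # Posterior over theta after observing one outcome (Bayes' rule).
        num = belief * (p[1] if success else 1 - p[1])
        den = num + (1 - belief) * (p[0] if success else 1 - p[0])
        return num / den

    b = Fraction(1, 2)            # prior P(theta = 1)
    print(predictive_success(b))  # known transition probability: 1/2
    print(bayes_update(b, True))  # belief after one success: 3/4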


