Learning Sequential Decision Tasks for Robot Manipulation with Abstract Markov Decision Processes and Demonstration-Guided Exploration

Abstract: Partially-Observable Markov Decision Processes (POMDPs) are a well-known stochastic model for sequential decision making under limited information. We consider the EXPTIME-hard problem of synthesising policies that almost-surely reach some goal state without ever visiting a bad state. In particular, we are interested in computing the winning region, that is, the set of system configurations from which a policy exists that satisfies the reachability specification. A direct application of such a winning region is the safe exploration of POMDPs by, for instance, restricting the behavior of a reinforcement learning agent to the region. We present two algorithms: a novel SAT-based iterative approach and a decision-diagram-based alternative. The empirical evaluation demonstrates the feasibility and efficacy of the approaches.
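To make the notion of a winning region concrete, here is a minimal Python sketch for the much simpler fully observable case (an MDP rather than a POMDP). It is not the SAT-based or decision-diagram-based algorithm from the abstract; it is the standard nested fixed-point computation for almost-sure reach-avoid objectives. The function name `winning_region`, the dictionary encoding of transitions, and the toy model are all assumptions made for illustration.

```python
def winning_region(transitions, goal, bad):
    """States from which some policy reaches `goal` almost surely, never visiting `bad`.

    transitions: dict state -> dict action -> set of possible successor states
    (only the support of each distribution matters for almost-sure properties).
    """
    win = {s for s in transitions if s not in bad}
    while True:
        # Keep only actions whose successors all stay inside the candidate region.
        safe = {s: [succ for succ in transitions[s].values() if succ <= win]
                for s in win}
        # Backward reachability: states that can reach the goal with positive
        # probability using safe actions only.
        reach = {s for s in win if s in goal}
        changed = True
        while changed:
            changed = False
            for s in win - reach:
                if any(succ & reach for succ in safe[s]):
                    reach.add(s)
                    changed = True
        if reach == win:
            return win
        win = reach  # shrink the candidate region and repeat


# Hypothetical toy model: "trap" is safe but can never reach the goal.
toy = {
    "s0": {"a": {"s1"}, "b": {"goal", "bad"}},
    "s1": {"a": {"goal", "s0"}, "b": {"trap"}},
    "goal": {"stay": {"goal"}},
    "bad": {"stay": {"bad"}},
    "trap": {"stay": {"trap"}},
}
region = winning_region(toy, goal={"goal"}, bad={"bad"})
print(sorted(region))
```

In this toy model an agent restricted to the computed region would never be offered action `b` in either `s0` or `s1`, which is the kind of restriction the abstract proposes for safe exploration.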


Quantile Markov Decision Processes

Operations Research ◽

10.1287/opre.2021.2123 ◽

2021 ◽

Author(s):

Xiaocheng Li ◽

Huaiyang Zhong ◽

Margaret L. Brandeau

Keyword(s):

Markov Decision Process ◽

Markov Decision Processes ◽

Decision Process ◽

Value At Risk ◽

Infinite Horizon ◽

Decision Processes ◽

Conditional Value At Risk ◽

Sequential Decision ◽

Optimal Drug ◽

Markov Decision

Title: Sequential Decision Making Using Quantiles

The goal of a traditional Markov decision process (MDP) is to maximize the expectation of cumulative reward over a finite or infinite horizon. In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward. For example, a physician may want to determine the optimal drug regime for a risk-averse patient with the objective of maximizing the 0.10 quantile of the cumulative reward; this is the cumulative improvement in health that is expected to occur with at least 90% probability for the patient. In “Quantile Markov Decision Processes,” X. Li, H. Zhong, and M. Brandeau provide analytic results to solve the quantile Markov decision process (QMDP) problem. They develop an efficient dynamic programming procedure that finds the optimal QMDP value function for all states and quantiles in one pass. The algorithm also extends to the MDP problem with a conditional value-at-risk objective.
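The difference between the expectation objective and the quantile objective can be seen on two hypothetical cumulative-reward distributions. The sketch below only illustrates the objective; it is not the dynamic programming procedure described in the abstract. The distributions, the names `aggressive` and `conservative`, and the choice of the lower quantile inf{x : P(X <= x) >= tau} are assumptions.

```python
def mean(dist):
    """Expected value of a discrete distribution given as {outcome: probability}."""
    return sum(x * p for x, p in dist.items())


def quantile(dist, tau):
    """Lower tau-quantile: the smallest outcome x with P(X <= x) >= tau."""
    cumulative = 0.0
    for outcome in sorted(dist):
        cumulative += dist[outcome]
        if cumulative >= tau - 1e-12:  # small tolerance for float rounding
            return outcome
    raise ValueError("probabilities must sum to at least tau")


# Hypothetical cumulative rewards under two fixed policies.
aggressive = {0.0: 0.15, 10.0: 0.85}   # higher mean, but a 15% chance of no benefit
conservative = {6.0: 1.0}              # lower mean, guaranteed benefit

for name, dist in (("aggressive", aggressive), ("conservative", conservative)):
    print(name, "mean:", mean(dist), "0.10-quantile:", quantile(dist, 0.10))
```

An expectation-maximizing decision maker would prefer the aggressive policy, while one maximizing the 0.10 quantile would prefer the conservative one, since only the latter delivers a reward of 6 with at least 90% probability.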


Sensitivity Analysis in Markov Decision Processes with Uncertain Reward Parameters

Journal of Applied Probability ◽

10.1017/s002190020000855x ◽

2011 ◽

Vol 48 (04) ◽

pp. 954-967 ◽

Cited By ~ 1

Author(s):

Chin Hon Tan ◽

Joseph C. Hartman

Keyword(s):

Sensitivity Analysis ◽

Markov Decision Processes ◽

Lot Sizing ◽

Optimal Solution ◽

Decision Processes ◽

Model Parameters ◽

Sequential Decision ◽

Estimation Errors ◽

Bellman Equations ◽

Markov Decision

Sequential decision problems can often be modeled as Markov decision processes. Classical solution approaches assume that the parameters of the model are known. However, model parameters are usually estimated and uncertain in practice. As a result, managers are often interested in how estimation errors affect the optimal solution. In this paper we illustrate how sensitivity analysis can be performed directly for a Markov decision process with uncertain reward parameters using the Bellman equations. In particular, we consider problems involving (i) a single stationary parameter, (ii) multiple stationary parameters, and (iii) multiple nonstationary parameters. We illustrate the applicability of this work through a capacitated stochastic lot-sizing problem.
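As a rough illustration of the question being asked, the following Python sketch re-solves a tiny MDP for several values of one uncertain reward parameter and reports the resulting optimal action. This is a brute-force grid over the parameter, not the analytic treatment via the Bellman equations that the abstract describes, and it is unrelated to the lot-sizing application. The two-state model, its rewards, the discount factor, and the name `theta` are all hypothetical.

```python
GAMMA = 0.9  # assumed discount factor


def solve(theta, sweeps=2000):
    """Value iteration on a two-state toy MDP; returns the optimal action in state 0.

    State 0: "stay" earns 1.0 and remains in state 0;
             "invest" earns theta (the uncertain reward) and moves to state 1.
    State 1: absorbing, earns 2.0 per step.
    """
    v0 = v1 = 0.0
    for _ in range(sweeps):
        stay = 1.0 + GAMMA * v0
        invest = theta + GAMMA * v1
        v0, v1 = max(stay, invest), 2.0 + GAMMA * v1
    return "invest" if theta + GAMMA * v1 > 1.0 + GAMMA * v0 else "stay"


# Sweep the uncertain reward and record where the optimal action changes.
policies = {theta: solve(theta) for theta in (-11.0, -9.0, -7.0, -5.0)}
for theta, action in policies.items():
    print(f"theta = {theta:6.1f} -> {action}")
```

In this toy model the optimal action switches between theta = -9 and theta = -7, so an estimation error in theta matters only if it can cross that range.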


Solving Transition Independent Decentralized Markov Decision Processes

Journal of Artificial Intelligence Research ◽

10.1613/jair.1497 ◽

2004 ◽

Vol 22 ◽

pp. 423-455 ◽

Cited By ~ 51

Author(s):

R. Becker ◽

S. Zilberstein ◽

V. Lesser ◽

C. V. Goldman

Keyword(s):

Markov Decision Processes ◽

Optimal Algorithm ◽

Decision Processes ◽

Specific Class ◽

Multi Agent Systems ◽

Sequential Decision ◽

Anytime Algorithm ◽

Reward Function ◽

Markov Decision ◽

Multi Agent

Formal treatment of collaborative multi-agent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To the best of our knowledge, this is the first algorithm to optimally solve a non-trivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.


Sensitivity Analysis in Markov Decision Processes with Uncertain Reward Parameters

Journal of Applied Probability ◽

10.1239/jap/1324046012 ◽

2011 ◽

Vol 48 (4) ◽

pp. 954-967 ◽

Cited By ~ 7

Author(s):

Chin Hon Tan ◽

Joseph C. Hartman

Keyword(s):

Sensitivity Analysis ◽

Markov Decision Processes ◽

Lot Sizing ◽

Optimal Solution ◽

Decision Processes ◽

Model Parameters ◽

Sequential Decision ◽

Estimation Errors ◽

Bellman Equations ◽

Markov Decision

Sequential decision problems can often be modeled as Markov decision processes. Classical solution approaches assume that the parameters of the model are known. However, model parameters are usually estimated and uncertain in practice. As a result, managers are often interested in how estimation errors affect the optimal solution. In this paper we illustrate how sensitivity analysis can be performed directly for a Markov decision process with uncertain reward parameters using the Bellman equations. In particular, we consider problems involving (i) a single stationary parameter, (ii) multiple stationary parameters, and (iii) multiple nonstationary parameters. We illustrate the applicability of this work through a capacitated stochastic lot-sizing problem.


Hidden-Mode Markov Decision Processes for Nonstationary Sequential Decision Making

Sequence Learning - Lecture Notes in Computer Science ◽

10.1007/3-540-44565-x_12 ◽

2000 ◽

pp. 264-287 ◽

Cited By ~ 7

Author(s):

Samuel P. M. Choi ◽

Dit-Yan Yeung ◽

Nevin L. Zhang

Keyword(s):

Decision Making ◽

Markov Decision Processes ◽

Decision Processes ◽

Sequential Decision Making ◽

Sequential Decision ◽

Markov Decision


Markov Decision Processes: A Tool for Sequential Decision Making under Uncertainty

Medical Decision Making ◽

10.1177/0272989x09353194 ◽

2009 ◽

Vol 30 (4) ◽

pp. 474-483 ◽

Cited By ~ 100

Author(s):

Oguzhan Alagoz ◽

Heather Hsu ◽

Andrew J. Schaefer ◽

Mark S. Roberts

Keyword(s):

Decision Making ◽

Markov Decision Processes ◽

Living Donor ◽

Decision Processes ◽

Medical Decision ◽

Sequential Decision Making ◽

Optimal Timing ◽

Decision Making Under Uncertainty ◽

Sequential Decision ◽

Markov Decision

We provide a tutorial on the construction and evaluation of Markov decision processes (MDPs), powerful analytical tools for sequential decision making under uncertainty that have been widely used in many industrial and manufacturing applications but are underutilized in medical decision making (MDM). We demonstrate the use of an MDP to solve a sequential clinical treatment problem under uncertainty. Markov decision processes generalize standard Markov models in that a decision process is embedded in the model and multiple decisions are made over time. Furthermore, they have significant advantages over standard decision analysis. We compare MDPs to standard Markov-based simulation models by solving the problem of the optimal timing of living-donor liver transplantation using both methods. Both models result in the same optimal transplantation policy and the same total life expectancies for the same patient and living donor. The computation time for solving the MDP model is significantly shorter than that for solving the Markov model. We briefly describe the growing literature of MDPs applied to medical decisions.


An Introduction to Fully and Partially Observable Markov Decision Processes

Decision Theory Models for Applications in Artificial Intelligence ◽

10.4018/978-1-60960-165-2.ch003 ◽

2012 ◽

pp. 33-62 ◽

Cited By ~ 2

Author(s):

Pascal Poupart

Keyword(s):

Decision Making ◽

Markov Decision Processes ◽

Decision Processes ◽

Sequential Decision Making ◽

Decision Making Under Uncertainty ◽

Sequential Decision ◽

Markov Decision ◽

The Common ◽

Partially Observable Markov ◽

Partially Observable

The goal of this chapter is to provide an introduction to Markov decision processes as a framework for sequential decision making under uncertainty. The aim of this introduction is to provide practitioners with a basic understanding of the common modeling and solution techniques. Hence, we will not delve into the details of the most recent algorithms, but rather focus on the main concepts and the issues that impact deployment in practice. More precisely, we will review fully and partially observable Markov decision processes, describe basic algorithms to find good policies and discuss modeling/computational issues that arise in practice.
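One basic building block that separates the partially observable case from the fully observable one is the belief update: the agent tracks a probability distribution over states and revises it after each action and observation. The Python sketch below shows the standard Bayes-filter form of that update; it is not taken from the chapter. The function name `belief_update`, the list-based encoding, and the two-state example with an 85% accurate sensor are assumptions.

```python
def belief_update(belief, transition, observation_prob):
    """Bayes update of a belief after one action and one observation.

    belief[s]            = P(state = s) before acting
    transition[s][s2]    = P(next state = s2 | state = s, chosen action)
    observation_prob[s2] = P(received observation | next state = s2, chosen action)
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n))
                 for s2 in range(n)]
    # Correction step: weight by the observation likelihood, then normalize.
    unnormalized = [observation_prob[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Hypothetical example: the action does not change the state, and the sensor
# reports "state 0" with probability 0.85 when the state really is 0.
stay_put = [[1.0, 0.0], [0.0, 1.0]]
saw_state_0 = [0.85, 0.15]

b1 = belief_update([0.5, 0.5], stay_put, saw_state_0)
b2 = belief_update(b1, stay_put, saw_state_0)
print(b1, b2)
```

Starting from a uniform belief, one observation moves the belief in state 0 to 0.85, and a second consistent observation moves it further toward certainty.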


Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i06.6531 ◽

2020 ◽

Vol 34 (06) ◽

pp. 9794-9801

Author(s):

Tomáš Brázdil ◽

Krishnendu Chatterjee ◽

Petr Novotný ◽

Jiří Vahala

Keyword(s):

Markov Decision Processes ◽

Negative Impact ◽

Optimization Criterion ◽

Decision Processes ◽

Risk Averse ◽

Sequential Decision ◽

Failure State ◽

Markov Decision ◽

Planning Algorithm ◽

Low Probability

Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low-probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff with failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability to encounter a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10^6 states.


Learning Control of Dynamical Systems Based on Markov Decision Processes: Research Frontiers and Outlooks

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2012.00673 ◽

2012 ◽

Vol 38 (5) ◽

pp. 673-687 ◽

Cited By ~ 1

Author(s):

Xin XU ◽

Dong SHEN ◽

Yan-Qing GAO ◽

Kai WANG

Keyword(s):

Dynamical Systems ◽

Markov Decision Processes ◽

Learning Control ◽

Decision Processes ◽

Markov Decision ◽

Research Frontiers
