optimal policies
Recently Published Documents


Total documents: 720 (five years: 105)
H-index: 43 (five years: 5)

2022 ◽  
Vol 73 ◽  
pp. 173-208
Author(s):  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample-efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of the resulting policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of a regular language and as such support loops, sequences, and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
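To make the idea of a reward machine concrete, the following is a minimal sketch of a finite state machine whose transitions fire on high-level events and emit rewards. It is an illustration only, not the authors' implementation: the class name `RewardMachine`, the state labels (`u0`, `u1`, `u_done`), the event names (`coffee`, `office`), and the rule that unlisted events self-loop with zero reward are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Hypothetical reward machine: a finite state machine over events."""
    initial_state: str
    # transitions[(state, event)] = (next_state, reward)
    transitions: dict
    terminal_states: frozenset = frozenset()

    def step(self, state, event):
        # Assumption: events with no listed transition leave the machine
        # in place and yield zero reward.
        return self.transitions.get((state, event), (state, 0.0))

    def run(self, events):
        """Feed a sequence of events; return (final_state, total_reward)."""
        state, total = self.initial_state, 0.0
        for event in events:
            if state in self.terminal_states:
                break
            state, reward = self.step(state, event)
            total += reward
        return state, total


# Hypothetical non-Markovian task: "get coffee, THEN reach the office".
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u_done", 1.0),
    },
    terminal_states=frozenset({"u_done"}),
)

final_state, total_reward = rm.run(["office", "coffee", "office"])
```

Because the machine state records whether coffee has already been collected, the reward for reaching the office depends on history, which a reward function of the current environment state alone could not express. An agent that can see the transition table could also reason about what reward a given experience would have produced in other machine states.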


2021 ◽  
Vol 18 (4(Suppl.)) ◽  
pp. 1350
Author(s):  
Tho Nguyen Duc ◽  
Chanh Minh Tran ◽  
Phan Xuan Tan ◽  
Eiji Kamioka

Imitation learning is an effective method for training an autonomous agent to accomplish a task by imitating expert behaviors in their demonstrations. However, traditional imitation learning methods require a large number of expert demonstrations in order to learn a complex behavior. This disadvantage has limited the potential of imitation learning in complex tasks where expert demonstrations are insufficient. To address the problem, we propose a Generative Adversarial Network-based model designed to learn optimal policies using only a single demonstration. The proposed model is evaluated on two simulated tasks in comparison with other methods. The results show that our proposed model is capable of completing the considered tasks despite the limited number of expert demonstrations, which clearly indicates the potential of our model.


Author(s):  
Eren Gürer

This study explores the implications of rising markups for optimal Mirrleesian income and profit taxation. Using a stylized model with two individuals, the main forces shaping welfare-optimal policies are analytically characterized. Although a higher profit tax has redistributive benefits, it adversely affects market competition, leading to a greater equilibrium cost-of-living. Rising markups directly contribute to a decline in optimal marginal taxes on labor income. The optimal policy response to higher markups includes increasingly relying on the profit tax to fund redistribution. Declining optimal marginal income taxes assist the redistributive function of the profit tax by contributing to the expansion of the profit tax base. This response alone considerably increases the equilibrium cost-of-living. Nevertheless, a majority of the individuals become better off with the optimal policy. If it is not possible to tax profits optimally, due, for example, to profit shifting, increasing redistribution via income taxes is not optimal; every individual is worse off relative to the scenario with optimal profit taxation.


Author(s):  
Akihiro Yamane ◽  
Kodo Ito ◽  
Yoshiyuki Higuchi

Social infrastructures such as roads and bridges are indispensable to our lives. They have to be maintained continuously, and such maintenance has become a major issue in Japan. Social infrastructures are maintained under strict constraints such as decreasing local fiscal revenue and a scarcity of skilled engineers. Various factors, such as inspection periods, maintenance costs, and degradation levels, must be considered in establishing efficient maintenance plans for social infrastructures. Furthermore, the special circumstances of social infrastructures, such as construction delays caused by budget shortages, must be discussed for an efficient maintenance plan. For such a discussion, a stochastic cost model that includes preventive and corrective maintenance is useful. Although such models have been studied for mechanical and electronic systems, the unique characteristics of social infrastructures, such as their enormous scale and delays due to maintenance budget restrictions, must be considered when infrastructure models are discussed. In this paper, we establish maintenance models of infrastructures in which some preventive maintenance must be delayed. The expected maintenance cost rate is derived using the cumulative damage model, and optimal policies that minimize it are considered. Three basic models and their extensions, which account for natural disasters, are discussed.


2021 ◽  
Vol 3 (4) ◽  
pp. 487-502
Author(s):  
Daron Acemoglu ◽  
Victor Chernozhukov ◽  
Iván Werning ◽  
Michael D. Whinston

We study targeted lockdowns in a multigroup SIR model where infection, hospitalization, and fatality rates vary between groups—in particular between the “young,” the “middle-aged,” and the “old.” Our model enables a tractable quantitative analysis of optimal policy. For baseline parameter values for the COVID-19 pandemic applied to the US, we find that optimal policies differentially targeting risk/age groups significantly outperform optimal uniform policies and most of the gains can be realized by having stricter protective measures such as lockdowns on the more vulnerable, old group. Intuitively, a strict and long lockdown for the old both reduces infections and enables less strict lockdowns for the lower-risk groups. (JEL H51, I12, I18, J13, J14)
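As a rough illustration of what group-targeted lockdowns mean in a multigroup SIR setting, the sketch below steps a simple discrete-time model with three groups. It is not the authors' model or calibration: the function name `simulate`, the parameter values (`beta`, `gamma`, `fatality`), the homogeneous-mixing infection term, and the treatment of a lockdown as a fraction in [0, 1] that scales down a group's contacts are all assumptions made for the example.

```python
def simulate(S, I, lockdown, beta=0.3, gamma=0.1,
             fatality=(0.001, 0.01, 0.06), days=200):
    """Euler-step a toy multigroup SIR model with per-group lockdowns.

    S, I      -- initial susceptible / infected shares per group
    lockdown  -- per-group lockdown intensity in [0, 1] (hypothetical)
    Returns final (S, I, R, D) lists, one entry per group.
    """
    n = len(S)
    S, I = list(S), list(I)
    R, D = [0.0] * n, [0.0] * n
    N = sum(S) + sum(I)
    for _ in range(days):
        # Infectious pressure from infected people who are not locked down
        # (assumes all groups mix homogeneously).
        pressure = sum((1 - lockdown[j]) * I[j] for j in range(n)) / N
        for i in range(n):
            new_infections = beta * (1 - lockdown[i]) * S[i] * pressure
            resolved = gamma * I[i]
            S[i] -= new_infections
            I[i] += new_infections - resolved
            D[i] += fatality[i] * resolved        # group-specific fatality
            R[i] += (1 - fatality[i]) * resolved
    return S, I, R, D


# Illustrative population shares for young, middle-aged, and old groups.
S0 = [0.52, 0.30, 0.17]
I0 = [0.005, 0.003, 0.002]

uniform = simulate(S0, I0, lockdown=[0.4, 0.4, 0.4])
targeted = simulate(S0, I0, lockdown=[0.2, 0.2, 0.9])
```

Comparing the death totals in `uniform[3]` and `targeted[3]` gives a feel for the trade-off described above, but any numbers this toy produces depend entirely on the assumed parameters and should not be read as results from the paper.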


2021 ◽  
Author(s):  
Guodong Lyu ◽  
Mabel C. Chou ◽  
Chung-Piaw Teo ◽  
Zhichao Zheng ◽  
Yuanguang Zhong

A key challenge in the resource allocation problem is to find near-optimal policies to serve different customers with random demands/revenues, using a fixed pool of capacity (properly configured). Three classes of allocation policies, responsive (with perfect hindsight), adaptive (with information updates), and anticipative (with forecast information) policies, are widely used in practice. We analyze and compare the performances of these policies for both capacity minimization and revenue maximization models. In both models, the performance gaps between optimal anticipative policies and adaptive policies are shown to be bounded when the demand and revenue of each item are independently generated. In contrast, the gaps between the optimal adaptive policies and responsive policies can be arbitrarily large. More importantly, we show that the techniques developed and the persistency values obtained from the optimal responsive policies can be used to design good adaptive and anticipative policies for the other two variants of resource allocation problems.


Author(s):  
A. Tsoularis ◽  
J. Wallace

This article considers the deterministic optimal control problem of profit maximization for inventory that is replenished at a variable rate and depleted by demand, which is assumed to vary with price and stock availability. Optimal policies for the inventory, product order rate, and price are derived using the maximum principle. Bounds on the maximum possible price are also derived.


2021 ◽  
Author(s):  
Siddhartha Banerjee ◽  
Daniel Freund ◽  
Thodoris Lykouris

The optimal management of shared vehicle systems, such as bike-, scooter-, car-, or ride-sharing, is more challenging compared with traditional resource allocation settings because of the presence of spatial externalities—changes in the demand/supply at any location affect future supply throughout the system within short timescales. These externalities are well captured by steady-state Markovian models, which are therefore widely used to analyze such systems. However, using Markovian models to design pricing and other control policies is computationally difficult because the resulting optimization problems are high dimensional and nonconvex. In our work, we design a framework that provides near-optimal policies, for a range of possible controls, that are based on applying the possible controls to achieve spatial balance on average. The optimality gap of these policies improves as the ratio between supply and the number of locations increases and asymptotically goes to zero.


2021 ◽  
Author(s):  
Matheus Guedes de Andrade ◽  
Wenhan Dai ◽  
Saikat Guha ◽  
Don Towsley
