Batch Reinforcement Learning
Recently Published Documents


TOTAL DOCUMENTS: 31 (five years: 11)
H-INDEX: 10 (five years: 2)

Author(s): Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushova, Scott Reed, ...

Author(s): Vincent Francois-Lavet, Guillaume Rabusseau, Joelle Pineau, Damien Ernst, Raphael Fonteneau

When an agent has only limited information about its environment, the suboptimality of an RL algorithm can be decomposed into the sum of two terms: a term due to asymptotic bias (suboptimality even with unlimited data) and a term due to overfitting (additional suboptimality caused by limited data). In the context of reinforcement learning with partial observability, this paper analyzes the tradeoff between these two error sources. In particular, our theoretical analysis formally characterizes how a smaller state representation increases the asymptotic bias while decreasing the risk of overfitting.
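A schematic way to write this decomposition (illustrative shorthand, not the paper's exact bound): let $\pi_{\phi,D}$ denote the policy learned from a finite dataset $D$ under state representation $\phi$, and $\pi_{\phi,\infty}$ the policy learned with unlimited data. Then
\[
\underbrace{\mathbb{E}_{D}\!\left[V^{*} - V^{\pi_{\phi,D}}\right]}_{\text{suboptimality}}
=
\underbrace{\left(V^{*} - V^{\pi_{\phi,\infty}}\right)}_{\text{asymptotic bias (unlimited data)}}
+
\underbrace{\mathbb{E}_{D}\!\left[V^{\pi_{\phi,\infty}} - V^{\pi_{\phi,D}}\right]}_{\text{overfitting (finite data)}}
\]
A coarser representation $\phi$ (e.g., a shorter observation history) typically shrinks the second term at the cost of enlarging the first.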


Author(s): Sungryull Sohn, Yinlam Chow, Jayden Ooi, Ofir Nachum, Honglak Lee, ...

In batch reinforcement learning (RL), one often constrains a learned policy to be close to the behavior (data-generating) policy, e.g., by constraining the learned action distribution to differ from the behavior policy by some maximum degree that is the same at every state. This can make batch RL overly conservative: it cannot exploit large policy changes at frequently-visited, high-confidence states without risking poor performance at sparsely-visited states. To remedy this, we propose residual policies, in which the allowable deviation of the learned policy is state-action-dependent. We derive a new batch RL method, BRPO, which learns both the policy and the allowable deviation so that they jointly maximize a lower bound on policy performance. We show that BRPO achieves state-of-the-art performance on a number of tasks.
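To make the idea of a state-dependent deviation budget concrete, here is a minimal Python sketch. Everything in it, including the count-based budget, the KL penalty, and the penalty weight, is a hypothetical illustration of the general idea and not BRPO's actual parameterization or objective.

```python
# Hypothetical sketch of a state-dependent deviation budget for batch RL.
# This is NOT the BRPO algorithm; it only illustrates the idea that the
# allowed divergence from the behavior policy can vary with how well a
# state is covered by the batch data.
import numpy as np

def kl_categorical(p, q, eps=1e-8):
    """KL(p || q) between two categorical action distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def deviation_budget(visit_count, base=0.05, scale=0.5):
    """Allow larger policy changes at frequently-visited (high-confidence) states."""
    return base + scale * np.log1p(visit_count)

def penalized_value(q_values, learned_pi, behavior_pi, visit_count, penalty_weight=10.0):
    """Expected Q-value under the learned policy, penalized only when its KL
    divergence from the behavior policy exceeds the state's deviation budget."""
    expected_q = float(np.dot(learned_pi, q_values))
    excess = max(0.0, kl_categorical(learned_pi, behavior_pi) - deviation_budget(visit_count))
    return expected_q - penalty_weight * excess

# Toy example: the same greedy shift away from the behavior policy is cheap at a
# well-covered state (visit_count=500) but costly at a rarely-visited one (visit_count=2).
q = np.array([1.0, 0.2, 0.0])
behavior_pi = np.array([0.2, 0.5, 0.3])
greedy_pi = np.array([0.9, 0.05, 0.05])
print(penalized_value(q, greedy_pi, behavior_pi, visit_count=500))
print(penalized_value(q, greedy_pi, behavior_pi, visit_count=2))
```

In the toy example, the identical policy shift is tolerated where the batch provides high confidence and penalized where it does not, which is the behavior a uniform per-state constraint cannot express.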


2020, Vol. 35(3), pp. 1990-2001
Author(s): Hanchen Xu, Alejandro D. Dominguez-Garcia, Peter W. Sauer

2019, Vol. 65, pp. 1-30
Author(s): Vincent Francois-Lavet, Guillaume Rabusseau, Joelle Pineau, Damien Ernst, Raphael Fonteneau

This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally characterizes how a smaller state representation decreases the risk of overfitting while potentially increasing the asymptotic bias. The analysis relies on expressing the quality of a state representation by bounding $L_1$ error terms of the associated belief states. The theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, similarly to known results in the fully observable setting, we briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting in the partially observable context.
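As a concrete illustration of the truncated-history state representation mentioned above, here is a minimal Python sketch under assumed details (discrete observations, a hypothetical helper function); it is not the paper's code, and only shows how the history length acts as the knob between asymptotic bias and overfitting.

```python
# Minimal sketch (assumed details, not the paper's code): a truncated-history
# state representation for a POMDP. The history length h controls the
# bias-overfitting tradeoff: a small h yields fewer distinct states (less
# overfitting, more asymptotic bias); a large h does the reverse.
from collections import deque

def truncated_history_states(observations, h, pad=None):
    """Map an observation sequence to states given by the last h observations."""
    window = deque([pad] * h, maxlen=h)
    states = []
    for obs in observations:
        window.append(obs)
        states.append(tuple(window))  # hashable, so usable as a tabular state
    return states

# Example: with h=2 the learned policy conditions on the last two observations only.
obs_seq = ["o1", "o2", "o3", "o2", "o1"]
print(truncated_history_states(obs_seq, h=2))
# [(None, 'o1'), ('o1', 'o2'), ('o2', 'o3'), ('o3', 'o2'), ('o2', 'o1')]
```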

