SDRL: Interpretable and Data-Efficient Deep Reinforcement Learning Leveraging Symbolic Planning

Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Steven Gustafson

Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet it is notorious for its lack of interpretability. Interpretability of subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a Symbolic Deep Reinforcement Learning (SDRL) framework that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner – controller – meta-controller architecture, in which the three components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the long-term planning capability of symbolic knowledge and end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.
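
To make the planner – controller – meta-controller division concrete, here is a minimal Python sketch of how the three components might interact. The class names, the tabular value update, and the random stand-in for option execution are hypothetical simplifications for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a planner-controller-meta-controller loop in the
# spirit of SDRL. Everything here is a toy stand-in, not the paper's code.
import random

class Planner:
    """Schedules subtasks: proposes an ordering of symbolic actions (options)."""
    def __init__(self, symbolic_actions):
        self.symbolic_actions = symbolic_actions
        self.values = {a: 0.0 for a in symbolic_actions}  # learned subtask values

    def plan(self):
        # Stand-in for a symbolic planner: order subtasks by value estimate.
        return sorted(self.symbolic_actions, key=lambda a: -self.values[a])

class Controller:
    """Learns one subtask (option) from raw observations, e.g. with a DQN."""
    def execute(self, subtask):
        # Placeholder for option execution; returns (success, intrinsic_return).
        return random.random() < 0.7, random.random()

class MetaController:
    """Evaluates completed subtasks and feeds values back to the planner."""
    def evaluate(self, subtask, success, ret):
        return ret if success else -1.0

def run_episode(planner, controller, meta):
    for subtask in planner.plan():                    # subtask scheduling
        success, ret = controller.execute(subtask)    # data-driven subtask learning
        value = meta.evaluate(subtask, success, ret)  # subtask evaluation
        # Cross-fertilization: evaluation reshapes the next symbolic plan.
        planner.values[subtask] += 0.1 * (value - planner.values[subtask])

planner = Planner(["get_key", "open_door", "reach_goal"])
run_episode(planner, Controller(), MetaController())
```

The key design point the sketch tries to show is the feedback loop: subtask values learned from experience flow back into the symbolic planner, so planning and learning converge together.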

Author(s):  
Daoming Lyu ◽  
Fangkai Yang ◽  
Bo Liu ◽  
Daesub Yoon

Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet it is notorious for its lack of interpretability. Interpretability of subtasks is critical in hierarchical decision-making, as it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a Symbolic Deep Reinforcement Learning (SDRL) framework that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner – controller – meta-controller architecture, in which the three components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the long-term planning capability of symbolic knowledge and end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.


2019 ◽  
Vol 109 (3) ◽  
pp. 493-512 ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function, called curiosity, that drives it to explore its environment in the quest for new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need to directly predict the future, and can operate in sequential decision scenarios. We formulate curiosity as the agent's ability to predict its own knowledge about the task. We base the prediction on the idea of skill learning, so as to incentivize the discovery of new skills and guide exploration towards promising solutions. To further improve the data efficiency and generalization of the agent, we propose to learn a latent representation of the skills. We evaluate on a variety of sparse-reward tasks in MiniGrid, MuJoCo, and Atari games, comparing the performance of an agent augmented with our curiosity reward against state-of-the-art learners. Experimental evaluation shows higher performance than reinforcement learning models that learn only by maximizing extrinsic rewards.
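
As a rough illustration of curiosity formulated as self-knowledge prediction, the following sketch adds a prediction-error bonus to a sparse extrinsic reward. The linear predictor, the toy skill embedding, and the 0.1 bonus weight are assumptions made for illustration, not the paper's architecture.

```python
# Hypothetical sketch: an intrinsic curiosity bonus from the error of a model
# that predicts the agent's own knowledge (here: a toy skill embedding).
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4)) * 0.1  # toy linear "knowledge predictor"

def curiosity_bonus(state, actual_knowledge, lr=1e-2):
    global W
    pred = state @ W                       # predict the agent's knowledge
    error = actual_knowledge - pred
    W += lr * np.outer(state, error)       # online update of the predictor
    return float(np.mean(error ** 2))      # bonus = prediction error

state = rng.normal(size=8)
skill_embedding = rng.normal(size=4)       # stand-in latent skill representation
extrinsic = 0.0                            # sparse reward: usually zero
total_reward = extrinsic + 0.1 * curiosity_bonus(state, skill_embedding)
print(total_reward)
```

The bonus is large where the predictor is still wrong (unfamiliar skills) and shrinks as the skill becomes known, which is what steers exploration toward new skills.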


eLife ◽  
2014 ◽  
Vol 3 ◽  
Author(s):  
Nixon M Abraham ◽  
Roberto Vincis ◽  
Samuel Lagier ◽  
Ivan Rodriguez ◽  
Alan Carleton

Sensory inputs are remarkably organized along all sensory pathways. While sensory representations are known to undergo plasticity at higher levels of sensory pathways following peripheral lesions or sensory experience, less is known about learning-induced functional plasticity of peripheral inputs. We addressed this question in the adult mouse olfactory system by combining odor discrimination studies with functional imaging of sensory input activity in awake mice. Here we show that associative learning, but not passive odor exposure, potentiates the strength of sensory inputs for up to several weeks after the end of training. We conclude that experience-dependent plasticity can occur in the periphery of the adult mouse olfactory system, which should improve odor detection and contribute to fast and accurate odor discrimination.


Author(s):  
Rundong Wang ◽  
Runsheng Yu ◽  
Bo An ◽  
Zinovi Rabinovich

Hierarchical reinforcement learning (HRL) is a promising approach for solving tasks with long time horizons and sparse rewards. It is often implemented as a high-level policy that assigns subgoals to a low-level policy. However, it suffers from high-level non-stationarity, since the low-level policy is constantly changing. The non-stationarity also causes a data-efficiency problem: policies need more data at non-stationary states to stabilize training. To address these issues, we propose a novel HRL method: Interactive Influence-based Hierarchical Reinforcement Learning (I^2HRL). First, inspired by agent modeling, we enable interaction between the low-level and high-level policies to stabilize high-level policy training: the high-level policy makes decisions conditioned on a representation of the low-level policy as well as the state of the environment. Second, we further stabilize the high-level policy via an information-theoretic regularization that encourages minimal dependence on the changing low-level policy. Third, we propose influence-based exploration to visit more frequently the non-stationary states where more transition data is needed. We experimentally validate the effectiveness of the proposed solution on several tasks in MuJoCo domains, demonstrating that our approach significantly boosts learning performance and accelerates learning compared with state-of-the-art HRL methods.
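
The first idea, conditioning the high-level policy on a representation of the current low-level policy, can be sketched as below. The network sizes and the random stand-in for the low-level policy embedding are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical sketch (not the paper's code): a high-level policy whose
# subgoal output is conditioned on both the environment state and an
# embedding of the (changing) low-level policy.
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    def __init__(self, state_dim, low_repr_dim, subgoal_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + low_repr_dim, 64),
            nn.ReLU(),
            nn.Linear(64, subgoal_dim),
        )

    def forward(self, state, low_policy_repr):
        # Conditioning on the low-level representation lets subgoal choices
        # track low-level learning progress, mitigating non-stationarity.
        return self.net(torch.cat([state, low_policy_repr], dim=-1))

high = HighLevelPolicy(state_dim=10, low_repr_dim=8, subgoal_dim=4)
state = torch.randn(1, 10)
low_repr = torch.randn(1, 8)  # e.g. an embedding of low-level parameters
subgoal = high(state, low_repr)
print(subgoal.shape)  # torch.Size([1, 4])
```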


2006 ◽  
Vol 27 (4) ◽  
pp. 218-228 ◽  
Author(s):  
Paul Rodway ◽  
Karen Gillies ◽  
Astrid Schepman

This study examined whether individual differences in the vividness of visual imagery influence performance on a novel long-term change detection task. Participants were presented with a sequence of pictures, with each picture and its title displayed for 17 s, and were then presented with changed or unchanged versions of those pictures and asked to detect whether each picture had been changed. Cuing the retrieval of the picture's image, by presenting the picture's title before the arrival of the changed picture, facilitated change detection accuracy. This suggests that retrieving the picture's representation immunizes it against overwriting by the arrival of the changed picture. High and low vividness participants did not differ in overall levels of change detection accuracy. However, in replication of Gur and Hilgard (1975), high vividness participants were significantly more accurate than low vividness participants at detecting salient changes to pictures. The results suggest that vivid images are not characterised by a high level of detail and that vivid imagery enhances memory for the salient aspects of a scene but not all of its details. Possible causes of this difference, and how they may lead to an understanding of individual differences in change detection, are considered.


Author(s):  
Ray Guillery

My thesis studies had stimulated an interest in the mamillothalamic pathways, but also some puzzlement, because we knew nothing about the nature of the messages passing along these pathways. Several laboratories were studying the thalamic relay of sensory pathways with great success during my post-doctoral years. Each sensory relay could be understood in terms of the appropriate sensory input, but we had no way of knowing the meaning of the mamillothalamic messages. I introduce these nuclei as an example of the many thalamic nuclei about whose input functions we still know little or nothing. Early clinical studies of mamillary lesions had suggested a role in memory formation, whereas evidence from cortical lesions suggested a role in emotional experiences. Studies of the smallest of the three nuclei forming these pathways then showed it to be concerned with sensing head direction, which is relevant but not sufficient for defining an animal's position in space. More recent studies of cortical activity or cortical damage have provided a plethora of suggestions: as so often, the answers reported depend on the questions asked. That simple conclusion is relevant for all transthalamic pathways. The evidence introduced in Chapter 1, that thalamocortical messages have dual meanings, suggests that we need to rethink our questions. It may prove useful to look at the motor outputs of relevant cortical areas to get clues about some appropriate questions.

