Reward Functions
Recently Published Documents

TOTAL DOCUMENTS: 157 (five years: 74)
H-INDEX: 16 (five years: 4)

2022 ◽  
Vol 73 ◽  
pp. 173-208
Author(s):  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

Reinforcement learning (RL) methods usually treat reward functions as black boxes. As such, these methods must extensively interact with the environment in order to discover rewards and optimal policies. In most RL applications, however, users have to program the reward function and, hence, there is the opportunity to make the reward function visible – to show the reward function’s code to the RL agent so it can exploit the function’s internal structure to learn optimal policies in a more sample-efficient manner. In this paper, we show how to accomplish this idea in two steps. First, we propose reward machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure. We then describe different methodologies to exploit this structure to support learning, including automated reward shaping, task decomposition, and counterfactual reasoning with off-policy learning. Experiments on tabular and continuous domains, across different tasks and RL agents, show the benefits of exploiting reward structure with respect to sample efficiency and the quality of resultant policies. Finally, by virtue of being a form of finite state machine, reward machines have the expressive power of regular languages and as such support loops, sequences, and conditionals, as well as the expression of temporally extended properties typical of linear temporal logic and non-Markovian reward specification.
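
As an illustration of the idea, here is a minimal reward-machine sketch in Python, assuming a simple event-triggered transition interface; the state names, event labels, and the two-step delivery example are illustrative and not taken from the paper's implementation.

class RewardMachine:
    def __init__(self, transitions, initial_state, terminal_states):
        # transitions: dict mapping (state, event) -> (next_state, reward)
        self.transitions = transitions
        self.initial_state = initial_state
        self.terminal_states = terminal_states
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state
        return self.state

    def step(self, event):
        """Advance the machine on an observed event; unknown events self-loop with zero reward."""
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        done = next_state in self.terminal_states
        return next_state, reward, done


# Example: "get coffee, then deliver it to the office" as a two-step task.
rm = RewardMachine(
    transitions={
        ("u0", "coffee"): ("u1", 0.0),   # picked up coffee, no reward yet
        ("u1", "office"): ("u2", 1.0),   # delivered coffee, task reward
    },
    initial_state="u0",
    terminal_states={"u2"},
)

Because the machine's structure is visible to the agent, methods such as reward shaping or task decomposition can operate on its states and transitions rather than on an opaque reward signal.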


Author(s):  
Hao Ji ◽  
Yan Jin

Self-organizing systems (SOS) are developed to perform complex tasks in unforeseen situations with adaptability. Predefining rules for self-organizing agents can be challenging, especially in tasks with high complexity and changing environments. Our previous work introduced a multiagent reinforcement learning (RL) model as a design approach to solving the rule generation problem of SOS, and a deep multiagent RL algorithm was devised to train agents to acquire the task and self-organizing knowledge. However, the simulation was based on one specific task environment; the sensitivity of SOS to reward functions and the systematic evaluation of SOS designed with multiagent RL remain open issues. In this paper, we introduce a rotation reward function to regulate agent behaviors during training and test different weights of this reward on SOS performance in two case studies: box-pushing and T-shape assembly. Additionally, we propose three metrics to evaluate the SOS: learning stability, quality of learned knowledge, and scalability. Results show that, depending on the type of task, designers can choose appropriate weights of the rotation reward to realize the full potential of the agents’ learning capability. Good learning stability and quality of knowledge can be achieved with an optimal range of team sizes, and scaling up to larger team sizes yields better performance than scaling down.
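
A hedged sketch of how a weighted rotation reward might be combined with the task reward; the penalty form, the weight values, and the function names are illustrative assumptions, not the paper's exact design.

import math

def shaped_reward(task_reward, heading_change_rad, w_rotation=0.1):
    """Regulate turning behavior; w_rotation sets the task/rotation trade-off."""
    rotation_term = w_rotation * abs(heading_change_rad)
    return task_reward - rotation_term

# Sweeping the weight, as in the box-pushing and T-shape assembly case studies,
# lets a designer pick the value best suited to the task:
for w in (0.0, 0.05, 0.1, 0.5):
    print(w, shaped_reward(task_reward=1.0, heading_change_rad=math.pi / 4, w_rotation=w))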


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 17
Author(s):  
Peixiao Fan ◽  
Song Ke ◽  
Salah Mohamed Kamel Mohamed Hassan ◽  
Jun Yang ◽  
Yonghui Li ◽  
...  

Frequency and voltage deviation are important indicators of power quality, so it is important for microgrids to maintain voltage and frequency (VF) stability. To address the VF regulation of a microgrid subject to wind disturbances and load fluctuations, a comprehensive VF control strategy for an islanded microgrid with electric vehicles (EVs) based on the Deep Deterministic Policy Gradient (DDPG) is proposed in this paper. First, the SOC constraints of EVs are added to construct a cluster-EV charging model that accounts for the randomness of users’ travel demand and charging behavior. In addition, a four-quadrant bidirectional charger capacity model is introduced to build a microgrid VF control model including load, micro gas turbine (MT), EVs, and their random power increment constraints. Second, the structure of the DDPG controller is designed according to the two control goals of microgrid frequency and voltage. Then, the state and action spaces are defined, global and local reward functions are designed, and optimal hyperparameters are selected. Finally, different scenarios are set up in an islanded microgrid with EVs, and the simulation results are compared with traditional PI control and R(λ) control. The results show that the proposed DDPG controller can quickly and efficiently suppress the VF fluctuations caused by wind disturbances and load fluctuations simultaneously.
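
A hedged sketch of how global and local reward terms for such a DDPG controller could be expressed; the quadratic forms, the weights, and the blend coefficient are illustrative assumptions, not the paper's exact reward design.

def global_reward(freq_dev_hz, volt_dev_pu, w_f=1.0, w_v=1.0):
    """System-level reward: penalize squared frequency and voltage deviations."""
    return -(w_f * freq_dev_hz ** 2 + w_v * volt_dev_pu ** 2)

def local_reward(ev_power_increment_kw, soc, soc_min=0.2, soc_max=0.9, w_p=0.01):
    """Unit-level reward: discourage large EV power increments and SOC limit violations."""
    soc_penalty = 1.0 if (soc < soc_min or soc > soc_max) else 0.0
    return -(w_p * abs(ev_power_increment_kw) + soc_penalty)

def total_reward(freq_dev_hz, volt_dev_pu, ev_power_increment_kw, soc, alpha=0.7):
    """Reward passed to the DDPG agent: a weighted blend of global and local terms."""
    return alpha * global_reward(freq_dev_hz, volt_dev_pu) + \
           (1 - alpha) * local_reward(ev_power_increment_kw, soc)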


Author(s):  
Zhenhai Gao ◽  
Xiangtong Yan ◽  
Fei Gao ◽  
Lei He

Decision-making is a key component of research on longitudinal autonomous driving, and accounting for the behavior of human drivers when designing decision-making strategies is a current research hotspot. In longitudinal autonomous driving, traditional rule-based decision-making strategies are difficult to apply to complex scenarios. Current methods that use reinforcement learning and deep reinforcement learning construct reward functions based on safety, comfort, and economy, yet the resulting decision strategies still differ considerably from those of human drivers. To address these problems, this paper uses driver behavior data to design the reward function of a deep reinforcement learning algorithm through BP neural network fitting, and uses the DQN and DDPG algorithms to establish two driver-like longitudinal autonomous driving decision-making models. Simulation experiments compare the decisions of the two models with the driver curve. The results show that both algorithms can realize driver-like decision-making, and that the DDPG algorithm is more consistent with human driver behavior and performs better than the DQN algorithm.
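
A sketch of fitting a reward model to driver behavior data with a backpropagation neural network, here using scikit-learn's MLPRegressor as a stand-in for the paper's BP network; the feature set (gap, relative speed, ego acceleration) and the regressed driver-likeness score are assumptions for illustration only.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                 # columns: gap [m], relative speed [m/s], acceleration [m/s^2]
y = -np.abs(X[:, 2]) - 0.1 * np.abs(X[:, 1])   # placeholder driver-likeness score for the sketch

reward_model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
reward_model.fit(X, y)

def reward(state):
    """Reward used by the DQN/DDPG agent: predicted driver-likeness of the state."""
    return float(reward_model.predict(np.asarray(state).reshape(1, -1))[0])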


2021 ◽  
Vol 13 (12) ◽  
pp. 168781402110670
Author(s):  
Xusheng Wang ◽  
Jiexin Xie ◽  
Shijie Guo ◽  
Yue Li ◽  
Pengfei Sun ◽  
...  

Deep reinforcement learning (DRL) provides a new solution for rehabilitation robot trajectory planning in unstructured working environments, which can bring great convenience to patients. Previous research mainly focused on optimization strategies but ignored the construction of reward functions, which leads to low efficiency. In contrast to the traditional sparse reward function, this paper proposes two dense reward functions. First, an azimuth reward function provides global guidance and reasonable constraints during exploration. To further improve efficiency, a process-oriented aspiration reward function is proposed; it accelerates the exploration process and avoids locally optimal solutions. Experiments show that the proposed reward functions accelerate the convergence rate by 38.4% on average with mainstream DRL methods. The convergence mean also increases by 9.5%, and the percentage of standard deviation decreases by 21.2%–23.3%. These results show that the proposed reward functions can significantly improve the learning efficiency of DRL methods and thus make automatic trajectory planning of rehabilitation robots practically feasible.
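
A hedged sketch contrasting a sparse reward with an azimuth-style dense reward that scores how well each motion step points toward the goal; the cosine form, the tolerance, and the weight are assumptions, not the paper's exact functions.

import numpy as np

def sparse_reward(position, goal, tol=0.01):
    """Traditional sparse reward: pays only when the goal is reached."""
    return 1.0 if np.linalg.norm(goal - position) < tol else 0.0

def azimuth_reward(position, previous_position, goal, w=1.0):
    """Dense reward: rewards motion whose direction aligns with the direction to the goal."""
    motion = position - previous_position
    to_goal = goal - previous_position
    denom = np.linalg.norm(motion) * np.linalg.norm(to_goal)
    if denom < 1e-8:
        return 0.0
    cos_angle = float(np.dot(motion, to_goal) / denom)   # 1 when heading straight at the goal
    return w * cos_angle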


2021 ◽  
Author(s):  
Kenway Louie

Learning is widely modeled in psychology, neuroscience, and computer science by prediction error-guided reinforcement learning (RL) algorithms. While standard RL assumes linear reward functions, reward-related neural activity is a saturating, nonlinear function of reward; however, the computational and behavioral implications of nonlinear RL are unknown. Here, we show that nonlinear RL incorporating the canonical divisive normalization computation introduces an intrinsic and tunable asymmetry in prediction error coding. At the behavioral level, this asymmetry explains empirical variability in risk preferences typically attributed to asymmetric learning rates. At the neural level, diversity in asymmetries provides a computational mechanism for recently proposed theories of distributional RL, allowing the brain to learn the full probability distribution of future rewards. This behavioral and computational flexibility argues for an incorporation of biologically valid value functions in computational models of learning and decision-making.
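
A small sketch of this idea, assuming a divisively normalized value function u(r) = r / (sigma + r) and a standard prediction-error update; sigma and the learning rate are illustrative. It shows how the saturating nonlinearity produces asymmetric prediction errors for symmetric reward deviations.

def normalized_value(reward, sigma=10.0):
    """Saturating, divisively normalized transform of reward."""
    return reward / (sigma + reward)

def td_update(value_estimate, reward, alpha=0.1, sigma=10.0):
    """One prediction-error update on the normalized reward scale."""
    prediction_error = normalized_value(reward, sigma) - value_estimate
    return value_estimate + alpha * prediction_error, prediction_error

v = normalized_value(10.0)            # expectation set at reward = 10
_, pe_gain = td_update(v, 15.0)       # reward 5 above expectation
_, pe_loss = td_update(v, 5.0)        # reward 5 below expectation
print(pe_gain, pe_loss)               # |pe_loss| > |pe_gain|: concavity induces the asymmetry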


2021 ◽  
Author(s):  
Ozsel Kilinc ◽  
Giovanni Montana

Mastering robotic manipulation skills through reinforcement learning (RL) typically requires the design of shaped reward functions. Recent developments in this area have demonstrated that using sparse rewards, i.e. rewarding the agent only when the task has been successfully completed, can lead to better policies. However, state-action space exploration is more difficult in this case. Recent RL approaches to learning with sparse rewards have leveraged high-quality human demonstrations for the task, but these can be costly, time-consuming, or even impossible to obtain. In this paper, we propose a novel and effective approach that does not require human demonstrations. We observe that every robotic manipulation task can be seen as involving a locomotion task from the perspective of the object being manipulated, i.e. the object could learn how to reach a target state on its own. To exploit this idea, we introduce a framework in which an object locomotion policy is initially obtained using a realistic physics simulator. This policy is then used to generate auxiliary rewards, called simulated locomotion demonstration rewards (SLDRs), which enable us to learn the robot manipulation policy. The proposed approach has been evaluated on 13 tasks of increasing complexity and achieves higher success rates and faster learning than alternative algorithms. SLDRs are especially beneficial for tasks like multi-object stacking and non-rigid object manipulation.
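
A heavily hedged sketch of the SLDR idea: a pretrained object-locomotion policy proposes where the object would move on its own, and the manipulation agent receives an auxiliary reward for making the object follow that proposal. The similarity measure, the scaling, and the function signature are assumptions for illustration, not the authors' formulation.

import numpy as np

def sldr_reward(object_state, object_next_state, locomotion_policy, scale=1.0):
    """Auxiliary reward: negative distance between the observed object motion and
    the motion suggested by the object locomotion policy."""
    suggested_next_state = locomotion_policy(object_state)   # where the object "wants" to go
    error = np.linalg.norm(np.asarray(object_next_state) - np.asarray(suggested_next_state))
    return -scale * error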


2021 ◽  
Vol 13 (3) ◽  
pp. 42-53
Author(s):  
Lucas Pereira Cotrim ◽  
Marcos Menon José ◽  
Eduardo Lobo Lustosa Cabral

Since the establishment of robotics in industrial applications, industrial robot programming has involved the repetitive and time-consuming process of manually specifying a fixed trajectory, which results in machine idle time in production and the need to completely reprogram the robot for different tasks. The increasing number of robotics applications in unstructured environments requires controllers that are not only intelligent but also reactive, due to the unpredictability of the environment and to safety requirements, respectively. This paper presents a comparative analysis of two classes of reinforcement learning algorithms, value iteration (Q-Learning/DQN) and policy iteration (REINFORCE), applied to the discretized task of positioning a robotic manipulator in an obstacle-filled simulated environment, with no previous knowledge of the obstacles’ positions or of the robot arm dynamics. The agent’s performance and algorithm convergence are analyzed under different reward functions and on four increasingly complex test projects: a 1-degree-of-freedom (DOF) robot, a 2-DOF robot, a Kuka KR16 industrial robot, and a Kuka KR16 industrial robot with random setpoint/obstacle placement. The DQN algorithm presented significantly better performance and reduced training time across all test projects, and the third reward function generated better agents for both algorithms.
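
A minimal tabular Q-learning sketch for a discretized positioning task, in the spirit of the comparison above; the toy 1-DOF environment, the reward values, and the hyperparameters are assumptions, not the paper's setup.

import numpy as np

N_JOINT_BINS = 21                  # discretized joint positions of a toy 1-DOF arm
ACTIONS = (-1, 0, +1)              # move one bin down, stay, move one bin up
GOAL = 15                          # target joint bin

def step(state, action_idx):
    next_state = int(np.clip(state + ACTIONS[action_idx], 0, N_JOINT_BINS - 1))
    reward = 1.0 if next_state == GOAL else -0.01   # sparse goal reward plus a small step cost
    return next_state, reward, next_state == GOAL

q = np.zeros((N_JOINT_BINS, len(ACTIONS)))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    for _ in range(200):                            # step cap per episode
        a = rng.integers(len(ACTIONS)) if rng.random() < epsilon else int(np.argmax(q[state]))
        next_state, reward, done = step(state, a)
        q[state, a] += alpha * (reward + gamma * np.max(q[next_state]) * (not done) - q[state, a])
        state = next_state
        if done:
            break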


2021 ◽  
Author(s):  
Antonio Santos de Sousa ◽  
Rubens Fernandes Nunes ◽  
Creto Augusto Vidal ◽  
Joaquim Bento Cavalcante-Neto ◽  
Danilo Borges da Silva

2021 ◽  
Author(s):  
Renan Lima Baima ◽  
Esther Luna Colombini
