Adaptive Reinforcement Learning and Its Application to Robot Compliance Learning
A new learning algorithm for connectionist networks that solves a class of optimal control problems is presented. The algorithm, called the Adaptive Reinforcement Learning Algorithm, employs a second network to model the immediate reinforcement provided by the task environment and adaptively identifies it through repeated experience. Output perturbation and correlation techniques are used to translate mere critic signals into useful learning signals for the connectionist controller. Compared with direct approaches to reinforcement learning, this algorithm shows faster and guaranteed improvement in control performance. Robustness against inaccuracy of the model is also discussed. Simulation demonstrates that the adaptive reinforcement learning method is efficient and useful for learning a compliance control law in a class of robotic assembly tasks. A simple box palletizing task is used as an example, in which a robot is required to move a rectangular part to the corner of a box. In the simulation, the robot is initially provided with only a predetermined velocity command to follow the nominal trajectory. At each attempt, the box is randomly located and the part is randomly oriented within the grasp of the end-effector. Compliant motion control is therefore necessary to guide the part to the corner of the box while avoiding excessive reaction forces caused by collisions with a wall. After repeated failures in performing the task, the robot successfully learns force feedback gains that modify its nominal motion. Our results show that the new learning method can be used to learn a compliance control law effectively.
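
The idea of identifying a reinforcement model and then perturbing the controller output to obtain a learning signal can be illustrated with a small sketch. This is not the paper's implementation: it assumes a scalar toy task in place of the palletizing simulation, a one-gain linear controller u = k * x in place of the connectionist controller, and a least-squares model on quadratic features in place of the second network. The names `features`, `environment_reinforcement`, and `train`, the target gain `a`, and all step sizes are hypothetical choices made for illustration.

```python
import numpy as np

def features(x, u):
    """Quadratic features of state x and control output u (scalars or arrays)."""
    x, u = np.broadcast_arrays(np.asarray(x, dtype=float),
                               np.asarray(u, dtype=float))
    return np.stack([np.ones_like(x), x, u, x * x, x * u, u * u], axis=-1)

def environment_reinforcement(x, u, a=0.8):
    """Hypothetical task: reinforcement is highest when u == a * x."""
    return -(u - a * x) ** 2

def train(steps=400, lr=0.1, sigma=0.1, n_probe=16, seed=0):
    rng = np.random.default_rng(seed)
    k = 0.0            # controller gain, u = k * x (stand-in for the controller network)
    w = np.zeros(6)    # weights of the reinforcement model (stand-in for the second network)
    X, R = [], []      # experience: feature rows and observed reinforcement
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)
        u = k * x
        # Act with a perturbed output and record the observed reinforcement.
        u_applied = u + sigma * rng.standard_normal()
        X.append(features(x, u_applied))
        R.append(environment_reinforcement(x, u_applied))
        # Identify the reinforcement model from all experience so far
        # (batch least squares is an assumption made for brevity).
        w, *_ = np.linalg.lstsq(np.array(X), np.array(R), rcond=None)
        # Perturb the output on the model and correlate the predicted change
        # in reinforcement with the perturbation to estimate d r_hat / d u.
        d = sigma * rng.standard_normal(n_probe)
        delta = features(x, u + d) @ w - features(x, u) @ w
        grad_u = np.mean(delta * d) / sigma**2
        # Chain rule through the controller: d u / d k = x.
        k += lr * grad_u * x
    return k, w

if __name__ == "__main__":
    k, w = train()
    print("learned gain:", k)
    print("model weights:", w)
```

In this sketch the controller never sees a gradient from the environment. It receives only the correlation between its output perturbations and the change in modeled reinforcement, so the learned gain should approach the target `a` once the model fits the observed reinforcement.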