Graphical Minimax Game and Off-Policy Reinforcement Learning for Heterogeneous MASs with Spanning Tree Condition

Guidance, Navigation and Control ◽ 10.1142/s2737480721500114 ◽ 2021 ◽ pp. 2150011

Author(s): Wei Dong ◽ Jianan Wang ◽ Chunyan Wang ◽ Zhenqiang Qi ◽ Zhengtao Ding

Keyword(s): Reinforcement Learning ◽ Spanning Tree ◽ Learning Algorithm ◽ Control Policy ◽ Game Problem ◽ Algebraic Riccati Equation ◽ Multi Agent Systems ◽ Rank Condition ◽ Minimax Game ◽ Tree Condition

In this paper, the optimal consensus control problem is investigated for heterogeneous linear multi-agent systems (MASs) under a spanning tree condition, based on game theory and reinforcement learning. First, the graphical minimax game algebraic Riccati equation (ARE) is derived by converting the consensus problem into a zero-sum game problem between each agent and its neighbors. The asymptotic stability and minimax validation of the closed-loop systems are proved theoretically. Then, a data-driven off-policy reinforcement learning algorithm is proposed to learn the optimal control policy online without information about the system dynamics. A certain rank condition is established to guarantee the convergence of the proposed algorithm to the unique solution of the ARE. Finally, the effectiveness of the proposed method is demonstrated through a numerical simulation.
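For orientation, the zero-sum game ARE behind minimax formulations of this kind can be written for a single linear system as below. This is the standard textbook form, not the paper's graphical ARE: the dynamics $\dot{x} = Ax + Bu + Dw$, the weights $Q$ and $R$, the attenuation level $\gamma$, and the solution $P$ are all notation assumed here for illustration.

```latex
% Assumed cost: J = \int_0^\infty ( x^\top Q x + u^\top R u - \gamma^2 w^\top w ) \, dt
A^\top P + P A + Q - P B R^{-1} B^\top P + \gamma^{-2} P D D^\top P = 0,
\qquad
u^* = -R^{-1} B^\top P x,
\qquad
w^* = \gamma^{-2} D^\top P x .
```

In the abstract's setting, the opposing player in the game is formed by an agent's neighbors, so the paper's equation would carry graph-dependent terms that this generic form omits.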


Improvement on Supporting Machine Learning Algorithm for Solving Problem in Immediate Decision Making

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.566.572 ◽ 2012 ◽ Vol 566 ◽ pp. 572-579

Author(s): Abdolkarim Niazi ◽ Norizah Redzuan ◽ Raja Ishak Raja Hamzah ◽ Sara Esfandiari

Keyword(s): Reinforcement Learning ◽ Learning Algorithm ◽ Multi Agent Systems ◽ Combined Model ◽ Q Learning ◽ Agent Systems ◽ Multi Agent ◽ Case Base ◽ Case Base Reasoning ◽ Robotic Tool

In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of reinforcement learning algorithms. RL algorithms are very useful for solving a wide variety of decision problems in which no model is available and the correct decision must be made in every state of the system, such as multi-agent systems, artificial control systems, robotics, and tool condition monitoring. In the proposed method, we investigate how to improve action selection in RL algorithms. A new combined model, using case-based reasoning systems and a new optimized function, is proposed to select the action, which led to an increase in the convergence rate of algorithms based on Q-learning. The algorithm was used to solve the problem of cooperative Markov games as one of the models of Markov-based multi-agent systems. The experimental results indicated that the proposed algorithms perform better than the existing algorithms in terms of speed and accuracy of reaching the optimal policy.
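As a point of reference for the action-selection step this abstract targets, here is a minimal sketch of plain tabular Q-learning with epsilon-greedy selection. It is the standard baseline, not the paper's case-based method; the state names, action names, and the values of `alpha`, `gamma`, and `epsilon` are illustrative assumptions.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem, used only to exercise the functions.
q = defaultdict(float)          # unseen (state, action) pairs start at 0.0
actions = ["left", "right"]
rng = random.Random(0)

new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
greedy_action = epsilon_greedy(q, "s0", actions, epsilon=0.0, rng=rng)
```

A method such as the one described would replace or augment `epsilon_greedy` with its own selection rule while leaving the update step largely unchanged.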


Multiagent reinforcement learning using Non-Parametric Approximation

Respuestas ◽ 10.22463/0122820x.1738 ◽ 2018 ◽ Vol 23 (2) ◽ pp. 53-61

Author(s): David Luviano Cruz ◽ Francesco José García Luna ◽ Luis Asunción Pérez Domínguez

Keyword(s): Reinforcement Learning ◽ Hybrid Control ◽ Learning Algorithm ◽ Multi Agent Systems ◽ Generation Task ◽ Q Learning ◽ Agent Systems ◽ Multi Agent ◽ Optimal Set ◽ Parametric Approximation

This paper presents a hybrid control proposal for multi-agent systems that exploits the advantages of reinforcement learning and nonparametric functions. A modified version of the Q-learning algorithm is used to provide training data for a kernel; this approach yields a suboptimal set of actions to be used by the agents. The proposed algorithm is experimentally tested in a path generation task in an unknown environment for mobile robots.


A Pre-Trained Fuzzy Reinforcement Learning Method for the Pursuing Satellite in a One-to-One Game in Space

Sensors ◽ 10.3390/s20082253 ◽ 2020 ◽ Vol 20 (8) ◽ pp. 2253

Author(s): Xiao Wang ◽ Peng Shi ◽ Yushan Zhao ◽ Yue Sun

Keyword(s): Reinforcement Learning ◽ Gradient Descent ◽ Fuzzy Inference ◽ Learning Algorithm ◽ Control Policy ◽ Descent Method ◽ Gradient Descent Method ◽ One To One ◽ Inference Systems ◽ First Time

To help the pursuer find an advantageous control policy in a one-to-one game in space, this paper proposes an innovative pre-trained fuzzy reinforcement learning algorithm, which is conducted in the x, y, and z channels separately. Compared with the previous algorithms applied in ground games, this is the first time reinforcement learning has been introduced to help the pursuer in space optimize its control policy. The known part of the environment is utilized to help the pursuer pre-train its consequent set before learning. An actor-critic framework is built in each moving channel of the pursuer. The consequent set of the pursuer is updated through the gradient descent method in fuzzy inference systems. The numerical experimental results validate the effectiveness of the proposed algorithm in improving the game ability of the pursuer.


Scheduling and Power Control for Wireless Multicast Systems via Deep Reinforcement Learning

Entropy ◽ 10.3390/e23121555 ◽ 2021 ◽ Vol 23 (12) ◽ pp. 1555

Author(s): Ramkumar Raghu ◽ Mahadesh Panju ◽ Vaneet Aggarwal ◽ Vinod Sharma

Keyword(s): Reinforcement Learning ◽ Power Control ◽ Time Scale ◽ Learning Algorithm ◽ Average Power ◽ Control Policy ◽ Slight Modification ◽ Wireless Multicast ◽ Large State ◽ Content Centric Network

Multicasting in wireless systems is a natural way to exploit the redundancy in user requests in a content centric network. Power control and optimal scheduling can significantly improve the wireless multicast network’s performance under fading. However, the model-based approaches for power control and scheduling studied earlier are not scalable to large state spaces or changing system dynamics. In this paper, we use deep reinforcement learning, approximating the Q-function with a deep neural network, to obtain a power control policy that matches the optimal policy for a small network. We show that a power control policy can be learned for reasonably large systems via this approach. Further, we use multi-timescale stochastic optimization to maintain the average power constraint. We demonstrate that a slight modification of the learning algorithm allows tracking of time-varying system statistics. Finally, we extend the multi-timescale approach to simultaneously learn the optimal queuing strategy along with power control. We demonstrate the scalability, tracking and cross-layer optimization capabilities of our algorithms via simulations. The proposed multi-timescale approach can be used in general large state-space dynamical systems with multiple objectives and constraints, and may be of independent interest.
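One common way to hold an average power constraint alongside a learned policy is a Lagrange multiplier updated on a slower timescale than the policy itself. The sketch below illustrates that general idea only; it is an assumption about how such a scheme can look, not the paper's algorithm, and the function names, step size, and power values are hypothetical.

```python
def update_multiplier(lam, power_used, power_budget, step):
    """Projected dual ascent: raise lam when over budget, lower it otherwise, never below 0."""
    return max(0.0, lam + step * (power_used - power_budget))


def penalized_reward(reward, power_used, lam):
    """Reward seen by the learner: the task reward minus the priced power usage."""
    return reward - lam * power_used


# Hypothetical trace: two slots over a budget of 1.0, then one slot under it.
lam = 0.0
for power_used in [2.0, 2.0, 0.5]:
    lam = update_multiplier(lam, power_used, power_budget=1.0, step=0.1)

shaped = penalized_reward(reward=1.0, power_used=2.0, lam=lam)
```

Keeping `step` much smaller than the learner's own step size is what makes the multiplier the slow timescale: the policy sees an almost fixed price for power while it adapts.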


Developing End-to-End Control Policies for Robotic Swarms Using Deep Q-learning

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽ 10.20965/jaciii.2019.p0920 ◽ 2019 ◽ Vol 23 (5) ◽ pp. 920-927 ◽ Cited By ~ 3

Author(s): Yufei Wei ◽ Xiaotong Nie ◽ Motoaki Hiraga ◽ Kazuhiro Ohkura ◽ Zlatan Car ◽ ...

Keyword(s): Reinforcement Learning ◽ Learning Algorithm ◽ Evolutionary Robotics ◽ Control Policy ◽ Control Policies ◽ Q Learning ◽ Robotic Swarms ◽ Learning Techniques ◽ End To End ◽ Large Parameter Space

In this study, the use of a popular deep reinforcement learning algorithm – deep Q-learning – in developing end-to-end control policies for robotic swarms is explored. Robots only have limited local sensory capabilities; however, in a swarm, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computation resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as much as possible. Simulation results show that the proposed approach can learn control policies directly from high-dimensional raw camera pixel inputs for robotic swarms.


Training Champion-level Race Car Drivers Using Deep Reinforcement Learning

10.21203/rs.3.rs-795954/v1 ◽ 2021

Author(s): Peter Wurman ◽ Samuel Barrett ◽ Kenta Kawamoto ◽ James MacGlashan ◽ Kaushik Subramanian ◽ ...

Keyword(s): Reinforcement Learning ◽ Real Time ◽ Learning Algorithm ◽ Integrated Control ◽ Extreme Case ◽ Control Policy ◽ Reward Function ◽ Race Car ◽ Model Free ◽ Race Car Drivers

Many potential applications of artificial intelligence involve making real-time decisions in physical systems. Automobile racing represents an extreme case of real-time decision making in close proximity to other highly-skilled drivers while near the limits of vehicular control. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the nonlinear control challenges of real race cars while also encapsulating the complex multi-agent interactions. We attack, and solve for the first time, the simulated racing challenge using model-free deep reinforcement learning. We introduce a novel reinforcement learning algorithm and enhance the learning process with mixed scenario training to encourage the agent to incorporate racing tactics into an integrated control policy. In addition, we construct a reward function that enables the agent to adhere to the sport's under-specified racing etiquette rules. We demonstrate the capabilities of our agent, GT Sophy, by winning two of three races against four of the world's best Gran Turismo drivers and being competitive in the overall team score. By showing that these techniques can be successfully used to train championship-level race car drivers, we open up the possibility of their use in other complex dynamical systems and real-world applications.


Decentralized Incremental Fuzzy Reinforcement Learning for Multi-Agent Systems

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽ 10.1142/s021848852050004x ◽ 2020 ◽ Vol 28 (01) ◽ pp. 79-98

Author(s): Sam Hamzeloo ◽ Mansoor Zolghadri Jahromi

Keyword(s): Reinforcement Learning ◽ Learning Algorithm ◽ Infinite Horizon ◽ Fuzzy Rule ◽ Point Of View ◽ Benchmark Problems ◽ Rule Base ◽ Multi Agent Systems ◽ Fuzzy Rule Base ◽ Markov Decision

We present a new incremental fuzzy reinforcement learning algorithm to find a sub-optimal policy for infinite-horizon Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). The algorithm addresses the high computational complexity of solving large Dec-POMDPs by generating a compact fuzzy rule-base for each agent. In our method, each agent uses its own fuzzy rule-base to make the decisions. The fuzzy rules in these rule-bases are incrementally created and tuned according to experiences of the agents. Reinforcement learning is used to tune the behavior of each agent in such a way that maximum global reward is achieved. In addition, we propose a method to construct the initial rule-base for each agent using the solution of the underlying MDP. This drastically improves the performance of the algorithm in comparison with random initialization of the rule-base. We assess the performance of our proposed method using several benchmark problems in comparison with some state-of-the-art methods. Experimental results show that our algorithm achieves better or similar reward when compared with other methods. However, from the runtime point of view, our method is superior to all previous methods. Using a compact fuzzy rule-base not only decreases the amount of memory used but also significantly speeds up the learning phase.


Bi-Level Actor-Critic for Multi-Agent Coordination

Proceedings of the AAAI Conference on Artificial Intelligence ◽ 10.1609/aaai.v34i05.6226 ◽ 2020 ◽ Vol 34 (05) ◽ pp. 7325-7332

Author(s): Haifeng Zhang ◽ Weizhe Chen ◽ Zeren Huang ◽ Minne Li ◽ Yaodong Yang ◽ ...

Keyword(s): Reinforcement Learning ◽ Nash Equilibrium ◽ Learning Algorithm ◽ Stackelberg Equilibrium ◽ Multi Agent Systems ◽ Matrix Games ◽ Markov Games ◽ The Arts ◽ Convergence Point ◽ Multi Agent

Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally, and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider Stackelberg equilibrium as a potentially better convergence point than Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding a Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (thus intelligent), while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against the state of the art. We found that the proposed bi-level actor-critic algorithm successfully converged to the Stackelberg equilibria in matrix games and found an asymmetric solution in a highway merge environment.
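The equilibrium-selection point can be seen in a small matrix game. The sketch below enumerates pure-strategy Nash equilibria and computes a pure-strategy Stackelberg outcome by brute force. It is an illustration of the two solution concepts, not the paper's actor-critic method; the payoff matrix is a made-up cooperative example, and breaking follower ties by lowest column index is an assumption.

```python
def pure_nash(leader_payoff, follower_payoff):
    """Return all pure-strategy Nash equilibria (row, col) of a bimatrix game."""
    rows, cols = len(leader_payoff), len(leader_payoff[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # Neither player may gain by deviating unilaterally.
            leader_ok = all(leader_payoff[i][j] >= leader_payoff[k][j] for k in range(rows))
            follower_ok = all(follower_payoff[i][j] >= follower_payoff[i][k] for k in range(cols))
            if leader_ok and follower_ok:
                equilibria.append((i, j))
    return equilibria


def pure_stackelberg(leader_payoff, follower_payoff):
    """Leader commits to a row, follower best-responds, leader keeps the best resulting row."""
    best = None
    for i, follower_row in enumerate(follower_payoff):
        # Follower's best response to row i (ties go to the lowest column index).
        j = max(range(len(follower_row)), key=lambda c: follower_row[c])
        if best is None or leader_payoff[i][j] > leader_payoff[best[0]][best[1]]:
            best = (i, j)
    return best


# Hypothetical cooperative game: both players receive the same payoff.
payoff = [[2, 0],
          [0, 1]]
nash = pure_nash(payoff, payoff)
stackelberg = pure_stackelberg(payoff, payoff)
```

In this example there are two Nash equilibria, one of which is Pareto-inferior, while the leader-follower ordering picks out the better one.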


Active Length Control of Shape Memory Alloy Wires Via Reinforcement Learning

Volume 1: Active Materials, Mechanics and Behavior; Modeling, Simulation and Control ◽ 10.1115/smasis2009-1430 ◽ 2009

Author(s): Kenton Kirkpatrick ◽ John Valasek ◽ Dimitris Lagoudas

Keyword(s): Numerical Simulation ◽ Reinforcement Learning ◽ Shape Memory Alloy ◽ Shape Memory ◽ Learning Algorithm ◽ Control Policy ◽ Active Length ◽ Strain Relationship ◽ Temperature Strain ◽ Reinforcement Learning Algorithm

The ability to actively control the shape of aerospace structures has initiated research regarding the use of Shape Memory Alloy actuators. These actuators can be used for morphing or shape change by controlling their temperature, which is effectively done by applying a voltage difference across their length. The ability to characterize this temperature-strain relationship using Reinforcement Learning has been previously accomplished, but in order to control Shape Memory Alloy wires it is more beneficial to learn the voltage-position relationship. Numerical simulation using Reinforcement Learning has been used for determining the temperature-strain relationship for characterizing the major and minor hysteresis loops, and determining a limited control policy relating applied temperature to desired strain. Since Reinforcement Learning creates a non-parametric control policy, and there is not currently a general parametric model for this control policy, determining the voltage-position relationship for a Shape Memory Alloy is done separately. This paper extends earlier numerical simulation results and experimental results in temperature-strain space by applying a similar Reinforcement Learning algorithm to voltage-position space using an experimental hardware apparatus. Results presented in the paper show the ability to converge on a near-optimal control policy for Shape Memory Alloy length control by means of an improved Reinforcement Learning algorithm. These results demonstrate the power of Reinforcement Learning as a method of constructing a policy capable of controlling Shape Memory Alloy wire length.


An Artificial Intelligence Approach for Online Optimization of Flexible Manufacturing Systems

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.882.96 ◽ 2018 ◽ Vol 882 ◽ pp. 96-108 ◽ Cited By ~ 3

Author(s): Jupiter Bakakeu ◽ Schirin Tolksdorf ◽ Jochen Bauer ◽ Hans-Henning Klos ◽ Jörn Peschke ◽ ...

Keyword(s): Reinforcement Learning ◽ Flexible Manufacturing ◽ Manufacturing Systems ◽ Learning Algorithm ◽ Electricity Consumption ◽ Control Policy ◽ Online Optimization ◽ Energy Prices ◽ Sequential Decision ◽ Time Step

This paper addresses the problem of efficiently operating a flexible manufacturing machine in an electricity micro-grid featuring a high volatility of electricity prices. The problem of finding the optimal control policy is formulated as a sequential decision-making problem under uncertainty where, at every time step, the uncertainty comes from the lack of knowledge about future electricity consumption and future weather-dependent energy prices. We propose to address this problem using deep reinforcement learning. To this end, we designed a deep learning architecture to forecast the load profile of the future manufacturing schedule from past production time series. Combined with the forecast of future energy prices, the reinforcement learning algorithm is trained to perform an online optimization of the production machine in order to reduce the long-term energy costs. The concept is empirically validated on a flexible production machine, where the machine speed can be optimized during production.
