Graphical Minimax Game and Off-Policy Reinforcement Learning for Heterogeneous MASs with Spanning Tree Condition

2021 ◽  
pp. 2150011
Author(s):  
Wei Dong ◽  
Jianan Wang ◽  
Chunyan Wang ◽  
Zhenqiang Qi ◽  
Zhengtao Ding

In this paper, the optimal consensus control problem is investigated for heterogeneous linear multi-agent systems (MASs) under a spanning tree condition, based on game theory and reinforcement learning. First, the graphical minimax game algebraic Riccati equation (ARE) is derived by converting the consensus problem into a zero-sum game problem between each agent and its neighbors. The asymptotic stability and minimax validation of the closed-loop systems are proved theoretically. Then, a data-driven off-policy reinforcement learning algorithm is proposed to learn the optimal control policy online without information about the system dynamics. A certain rank condition is established to guarantee the convergence of the proposed algorithm to the unique solution of the ARE. Finally, the effectiveness of the proposed method is demonstrated through a numerical simulation.

2012 ◽  
Vol 566 ◽  
pp. 572-579
Author(s):  
Abdolkarim Niazi ◽  
Norizah Redzuan ◽  
Raja Ishak Raja Hamzah ◽  
Sara Esfandiari

In this paper, a new algorithm based on case-based reasoning and reinforcement learning (RL) is proposed to increase the convergence rate of reinforcement learning algorithms. RL algorithms are very useful for solving a wide variety of decision problems when models are not available and correct decisions must be made in every state of the system, such as in multi-agent systems, artificial control systems, robotics, and tool condition monitoring. In the proposed method, we investigate how to improve action selection in RL algorithms. A new combined model using case-based reasoning systems and a new optimized function is proposed to select the action, which led to an increase in the convergence rate of algorithms based on Q-learning. The algorithm was used to solve the problem of cooperative Markov games, one of the models of Markov-based multi-agent systems. The results of experiments indicated that the proposed algorithms perform better than existing algorithms in terms of speed and accuracy of reaching the optimal policy.
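
For orientation, the baseline that such action-selection improvements build on can be sketched as follows. This is a minimal tabular Q-learning step with epsilon-greedy selection, not the case-based method of the paper; the function names, the dictionary-based table, and the default `alpha` and `gamma` values are all assumptions made for illustration.

```python
import random


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the new Q(state, action).

    q is a dict keyed by (state, action); missing entries count as 0.0.
    alpha (learning rate) and gamma (discount) are illustrative defaults.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    target = reward + gamma * best_next
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))


if __name__ == "__main__":
    table = {}
    rng = random.Random(0)
    q_update(table, 0, "left", 1.0, 1, ["left", "right"])
    print(epsilon_greedy(table, 0, ["left", "right"], 0.0, rng))
```

A method like the one described would replace `epsilon_greedy` with a selection rule informed by stored cases, leaving the update step unchanged.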


Respuestas ◽  
2018 ◽  
Vol 23 (2) ◽  
pp. 53-61
Author(s):  
David Luviano Cruz ◽  
Francesco José García Luna ◽  
Luis Asunción Pérez Domínguez

This paper presents a hybrid control proposal for multi-agent systems, in which the advantages of reinforcement learning and nonparametric functions are exploited. A modified version of the Q-learning algorithm is used to provide training data for a kernel; this approach provides a suboptimal set of actions to be used by the agents. The proposed algorithm is experimentally tested in a path generation task in an unknown environment for mobile robots.


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2253
Author(s):  
Xiao Wang ◽  
Peng Shi ◽  
Yushan Zhao ◽  
Yue Sun

In order to help the pursuer find an advantageous control policy in a one-to-one game in space, this paper proposes an innovative pre-trained fuzzy reinforcement learning algorithm, which is conducted in the x, y, and z channels separately. Compared with the previous algorithms applied in ground games, this is the first time reinforcement learning has been introduced to help the pursuer in space optimize its control policy. The known part of the environment is utilized to help the pursuer pre-train its consequent set before learning. An actor-critic framework is built in each moving channel of the pursuer. The consequent set of the pursuer is updated through the gradient descent method in fuzzy inference systems. The numerical experimental results validate the effectiveness of the proposed algorithm in improving the game ability of the pursuer.


Entropy ◽  
2021 ◽  
Vol 23 (12) ◽  
pp. 1555
Author(s):  
Ramkumar Raghu ◽  
Mahadesh Panju ◽  
Vaneet Aggarwal ◽  
Vinod Sharma

Multicasting in wireless systems is a natural way to exploit the redundancy in user requests in a content-centric network. Power control and optimal scheduling can significantly improve the wireless multicast network’s performance under fading. However, the model-based approaches for power control and scheduling studied earlier are not scalable to large state spaces or changing system dynamics. In this paper, we use deep reinforcement learning, where we use function approximation of the Q-function via a deep neural network to obtain a power control policy that matches the optimal policy for a small network. We show that a power control policy can be learned for reasonably large systems via this approach. Further, we use multi-timescale stochastic optimization to maintain the average power constraint. We demonstrate that a slight modification of the learning algorithm allows tracking of time-varying system statistics. Finally, we extend the multi-timescale approach to simultaneously learn the optimal queuing strategy along with power control. We demonstrate the scalability, tracking and cross-layer optimization capabilities of our algorithms via simulations. The proposed multi-timescale approach can be used in general large state-space dynamical systems with multiple objectives and constraints, and may be of independent interest.
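
One common way to hold an average-power constraint while learning is a Lagrangian relaxation, in which the policy learns on a fast timescale from a penalized reward and a multiplier is adjusted on a slower timescale. The sketch below assumes that formulation; the abstract does not state the exact update, and the function names and step size are hypothetical.

```python
def penalized_reward(reward, power_used, lam):
    """Reward seen by the fast-timescale learner: raw reward minus the power price."""
    return reward - lam * power_used


def update_multiplier(lam, power_used, power_budget, step):
    """Slow-timescale projected dual ascent on the Lagrange multiplier.

    The multiplier rises when the power used exceeds the budget and falls
    otherwise; it is clipped at zero so the penalty never becomes a bonus.
    """
    return max(0.0, lam + step * (power_used - power_budget))


if __name__ == "__main__":
    lam = 0.0
    # Hypothetical per-slot power draws against a budget of 1.0.
    for power in [2.0, 2.0, 0.5, 1.5]:
        lam = update_multiplier(lam, power, power_budget=1.0, step=0.1)
    print(lam)
```

In a two-timescale scheme the `step` used here would decay faster than the learning rate of the policy update, so the multiplier looks nearly constant to the learner.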


Author(s):  
Yufei Wei ◽  
Xiaotong Nie ◽  
Motoaki Hiraga ◽  
Kazuhiro Ohkura ◽  
Zlatan Car ◽  
...  

In this study, the use of a popular deep reinforcement learning algorithm – deep Q-learning – in developing end-to-end control policies for robotic swarms is explored. Robots only have limited local sensory capabilities; however, in a swarm, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computation resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as much as possible. Simulation results show that the proposed approach can learn control policies directly from high-dimensional raw camera pixel inputs for robotic swarms.


2021 ◽  
Author(s):  
Peter Wurman ◽  
Samuel Barrett ◽  
Kenta Kawamoto ◽  
James MacGlashan ◽  
Kaushik Subramanian ◽  
...  

Many potential applications of artificial intelligence involve making real-time decisions in physical systems. Automobile racing represents an extreme case of real-time decision making in close proximity to other highly-skilled drivers while near the limits of vehicular control. Racing simulations, such as the PlayStation game Gran Turismo, faithfully reproduce the nonlinear control challenges of real race cars while also encapsulating the complex multi-agent interactions. We attack, and solve for the first time, the simulated racing challenge using model-free deep reinforcement learning. We introduce a novel reinforcement learning algorithm and enhance the learning process with mixed scenario training to encourage the agent to incorporate racing tactics into an integrated control policy. In addition, we construct a reward function that enables the agent to adhere to the sport's under-specified racing etiquette rules. We demonstrate the capabilities of our agent, GT Sophy, by winning two of three races against four of the world's best Gran Turismo drivers and being competitive in the overall team score. By showing that these techniques can be successfully used to train championship-level race car drivers, we open up the possibility of their use in other complex dynamical systems and real-world applications.


Author(s):  
Sam Hamzeloo ◽  
Mansoor Zolghadri Jahromi

We present a new incremental fuzzy reinforcement learning algorithm to find a sub-optimal policy for infinite-horizon Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). The algorithm addresses the high computational complexity of solving large Dec-POMDPs by generating a compact fuzzy rule-base for each agent. In our method, each agent uses its own fuzzy rule-base to make the decisions. The fuzzy rules in these rule-bases are incrementally created and tuned according to experiences of the agents. Reinforcement learning is used to tune the behavior of each agent in such a way that maximum global reward is achieved. In addition, we propose a method to construct the initial rule-base for each agent using the solution of the underlying MDP. This drastically improves the performance of the algorithm in comparison with random initialization of the rule-base. We assess the performance of our proposed method using several benchmark problems in comparison with some state-of-the-art methods. Experimental results show that our algorithm achieves better or similar reward when compared with other methods. However, from the runtime point of view, our method is superior to all previous methods. Using a compact fuzzy rule-base not only decreases the amount of memory used but also significantly speeds up the learning phase.


2020 ◽  
Vol 34 (05) ◽  
pp. 7325-7332
Author(s):  
Haifeng Zhang ◽  
Weizhe Chen ◽  
Zeren Huang ◽  
Minne Li ◽  
Yaodong Yang ◽  
...  

Coordination is one of the essential problems in multi-agent systems. Typically, multi-agent reinforcement learning (MARL) methods treat agents equally, and the goal is to solve the Markov game to an arbitrary Nash equilibrium (NE) when multiple equilibria exist, thus lacking a solution for NE selection. In this paper, we treat agents unequally and consider Stackelberg equilibrium as a potentially better convergence point than Nash equilibrium in terms of Pareto superiority, especially in cooperative environments. Under Markov games, we formally define the bi-level reinforcement learning problem of finding Stackelberg equilibrium. We propose a novel bi-level actor-critic learning method that allows agents to have different knowledge bases (thus intelligent), while their actions can still be executed simultaneously and in a distributed manner. A convergence proof is given, and the resulting learning algorithm is tested against the state of the art. We found that the proposed bi-level actor-critic algorithm successfully converged to the Stackelberg equilibria in matrix games and found an asymmetric solution in a highway merge environment.
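
To make the equilibrium-selection point concrete, the sketch below computes a pure-strategy Stackelberg solution of a small two-player matrix game by enumeration: the follower best-responds to each leader action, and the leader picks the action with the best induced payoff. This is an illustrative toy, not the bi-level actor-critic method; the function name, the tie-breaking rule, and the payoff matrices are assumptions.

```python
def stackelberg_pure(leader_payoff, follower_payoff):
    """Return (leader_action, follower_action) for a pure-strategy Stackelberg solution.

    Both arguments are lists of rows indexed as [leader_action][follower_action].
    Ties are broken toward the lowest index for both players (an assumption).
    """
    best = None
    for i, row in enumerate(follower_payoff):
        # Follower's best response to leader action i.
        j = max(range(len(row)), key=lambda k: row[k])
        value = leader_payoff[i][j]
        if best is None or value > best[0]:
            best = (value, i, j)
    return best[1], best[2]


if __name__ == "__main__":
    # Coordination game with two pure Nash equilibria, (0, 0) and (1, 1).
    leader = [[2, 0], [0, 1]]
    follower = [[1, 0], [0, 2]]
    print(stackelberg_pure(leader, follower))
```

In this example both (0, 0) and (1, 1) are Nash equilibria, and committing first lets the leader select the one it prefers.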


Author(s):  
Kenton Kirkpatrick ◽  
John Valasek ◽  
Dimitris Lagoudas

The ability to actively control the shape of aerospace structures has initiated research regarding the use of Shape Memory Alloy actuators. These actuators can be used for morphing or shape change by controlling their temperature, which is effectively done by applying a voltage difference across their length. The ability to characterize this temperature-strain relationship using Reinforcement Learning has been previously accomplished, but in order to control Shape Memory Alloy wires it is more beneficial to learn the voltage-position relationship. Numerical simulation using Reinforcement Learning has been used for determining the temperature-strain relationship for characterizing the major and minor hysteresis loops, and determining a limited control policy relating applied temperature to desired strain. Since Reinforcement Learning creates a non-parametric control policy, and there is not currently a general parametric model for this control policy, determining the voltage-position relationship for a Shape Memory Alloy is done separately. This paper extends earlier numerical simulation results and experimental results in temperature-strain space by applying a similar Reinforcement Learning algorithm to voltage-position space using an experimental hardware apparatus. Results presented in the paper show the ability to converge on a near-optimal control policy for Shape Memory Alloy length control by means of an improved Reinforcement Learning algorithm. These results demonstrate the power of Reinforcement Learning as a method of constructing a policy capable of controlling Shape Memory Alloy wire length.


2018 ◽  
Vol 882 ◽  
pp. 96-108
Author(s):  
Jupiter Bakakeu ◽  
Schirin Tolksdorf ◽  
Jochen Bauer ◽  
Hans-Henning Klos ◽  
Jörn Peschke ◽  
...  

This paper addresses the problem of efficiently operating a flexible manufacturing machine in an electricity micro-grid featuring a high volatility of electricity prices. The problem of finding the optimal control policy is formulated as a sequential decision making problem under uncertainty where, at every time step, the uncertainty comes from the lack of knowledge about future electricity consumption and future weather dependent energy prices. We propose to address this problem using deep reinforcement learning. To this purpose, we designed a deep learning architecture to forecast the load profile of future manufacturing schedule from past production time series. Combined with the forecast of future energy prices, the reinforcement-learning algorithm is trained to perform an online optimization of the production machine in order to reduce the long-term energy costs. The concept is empirically validated on a flexible production machine, where the machine speed can be optimized during the production.

