Multi Agent Deep Learning with Cooperative Communication

2020 ◽  
Vol 10 (3) ◽  
pp. 189-207
Author(s):  
David Simões ◽  
Nuno Lau ◽  
Luís Paulo Reis

We consider the problem of multiple agents cooperating in a partially-observable environment. Agents must learn to coordinate and share relevant information to solve the tasks successfully. This article describes Asynchronous Advantage Actor-Critic with Communication (A3C2), an end-to-end differentiable approach in which agents learn policies and communication protocols simultaneously. A3C2 follows a centralized-learning, distributed-execution paradigm and supports independent agents, dynamic team sizes, partially-observable environments, and noisy communications. We compare A3C2 against other state-of-the-art proposals in multiple environments and show that it outperforms them.
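A minimal sketch of the idea, assuming a simple feed-forward encoder and continuous messages (the layer sizes, message dimensionality, and interface below are illustrative, not the authors' architecture): an actor-critic agent outputs an action policy, a value estimate, and a message vector for its teammates, and because the messages stay differentiable the communication protocol can be learned end-to-end together with the policy.

```python
import torch
import torch.nn as nn

class CommAgent(nn.Module):
    """Actor-critic agent that also emits a message for its teammates."""
    def __init__(self, obs_dim, n_actions, msg_dim, n_teammates, hidden=64):
        super().__init__()
        # The agent conditions on its own observation plus incoming messages.
        in_dim = obs_dim + msg_dim * n_teammates
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # action logits
        self.value_head = nn.Linear(hidden, 1)           # critic value
        self.message_head = nn.Linear(hidden, msg_dim)   # outgoing message

    def forward(self, obs, incoming_msgs):
        h = self.encoder(torch.cat([obs, incoming_msgs], dim=-1))
        # Keeping the message continuous lets gradients from the receivers'
        # losses flow back into the sender during centralized learning.
        return (self.policy_head(h), self.value_head(h),
                torch.tanh(self.message_head(h)))

# A team of three: each agent receives two 8-dimensional messages.
agent = CommAgent(obs_dim=10, n_actions=4, msg_dim=8, n_teammates=2)
logits, value, out_msg = agent(torch.randn(1, 10), torch.zeros(1, 16))
```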

2020 ◽  
Vol 27 (4) ◽  
pp. 333-351
Author(s):  
David Simões ◽  
Nuno Lau ◽  
Luís Paulo Reis

Tackling multi-agent environments where each agent has a local, limited observation of the global state is a non-trivial task that often requires hand-tuned solutions. A team of agents coordinating in such scenarios must handle the complex underlying environment while each agent has only partial knowledge of it. Deep reinforcement learning has been shown to achieve super-human performance in single-agent environments and has since been adapted to the multi-agent paradigm. This paper proposes A3C3, a multi-agent deep learning algorithm in which agents are evaluated by a centralized referee during the learning phase but remain independent from each other in actual execution. The referee's neural network is augmented with a permutation-invariant architecture to increase its scalability to large teams. A3C3 also allows agents to learn communication protocols with which they share relevant information with their team members, allowing them to overcome their limited knowledge and achieve coordination. A3C3 and its permutation-invariant augmentation are evaluated in multiple multi-agent test-beds, which include partially-observable scenarios, swarm environments, and complex 3D soccer simulations.
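A sketch of the permutation-invariance idea for the centralized referee, under the assumption of a shared per-agent encoder followed by mean pooling (the actual A3C3 referee may differ): because every agent passes through the same encoder and the results are pooled with an order-independent operation, the referee's output is unaffected by agent ordering and the same weights handle any team size.

```python
import torch
import torch.nn as nn

class PermutationInvariantReferee(nn.Module):
    """Centralized critic whose output does not depend on agent ordering."""
    def __init__(self, per_agent_dim, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(per_agent_dim, hidden), nn.ReLU())
        self.value_head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                        nn.Linear(hidden, 1))

    def forward(self, agent_features):
        # agent_features: (batch, n_agents, per_agent_dim); n_agents may vary.
        encoded = self.encoder(agent_features)  # same weights for every agent
        pooled = encoded.mean(dim=1)            # order-independent pooling
        return self.value_head(pooled)          # team value estimate

referee = PermutationInvariantReferee(per_agent_dim=12)
team_of_5, team_of_9 = torch.randn(4, 5, 12), torch.randn(4, 9, 12)
assert referee(team_of_5).shape == referee(team_of_9).shape == (4, 1)
```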


Author(s):  
Yanlin Han ◽  
Piotr Gmytrasiewicz

This paper introduces the IPOMDP-net, a neural network architecture for multi-agent planning under partial observability. It embeds an interactive partially observable Markov decision process (I-POMDP) model and a QMDP planning algorithm that solves the model in a neural network architecture. The IPOMDP-net is fully differentiable and allows for end-to-end training. In the learning phase, we train an IPOMDP-net on various fixed and randomly generated environments in a reinforcement learning setting, assuming observable reinforcements and unknown (randomly initialized) model functions. In the planning phase, we test the trained network on new, unseen variants of the environments under the planning setting, using the trained model to plan without reinforcements. Empirical results show that our model-based IPOMDP-net outperforms the other state-of-the-art model-free network and generalizes better to larger, unseen environments. Our approach provides a general neural computing architecture for multi-agent planning using I-POMDPs. It suggests that, in a multi-agent setting, having a model of other agents benefits our decision-making, resulting in a policy of higher quality and better generalizability.
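For reference, the QMDP approximation that the network embeds can be sketched as a stand-alone computation with fixed transition and reward tensors (the IPOMDP-net learns these as network weights, and the interactive nesting over other agents' models is omitted here):

```python
import torch

def qmdp_action_values(belief, T, R, gamma=0.95, iters=50):
    """belief: (S,), T: (A, S, S) transition probabilities, R: (S, A) rewards."""
    S, A = R.shape
    V = torch.zeros(S)
    for _ in range(iters):
        # Q(s, a) = R(s, a) + gamma * sum_{s'} T(a, s, s') V(s')
        Q = R + gamma * torch.einsum('ast,t->sa', T, V)
        V = Q.max(dim=1).values  # value iteration on the underlying MDP
    # QMDP: weight the fully-observable Q-values by the current belief.
    return belief @ Q  # expected value of each action, shape (A,)

# Tiny two-state, two-action example with a uniform belief.
T = torch.tensor([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.5, 0.5]]])
R = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
print(qmdp_action_values(torch.tensor([0.5, 0.5]), T, R))
```

Every operation above is differentiable, which is what makes it possible to unroll the same computation inside a trainable network.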


Fractals ◽  
2020 ◽  
Vol 28 (02) ◽  
pp. 2050045
Author(s):  
R. CARREÑO AGUILERA ◽  
M. A. ACEVEDO MOSQUEDA ◽  
M. E. ACEVEDO MOSQUEDA ◽  
S. L. GOMEZ CORONEL ◽  
I. ALGREDO BADILLO ◽  
...  

In spite of the advances in the state of the art in semantic artificial intelligence applications, there is still a long way to go before they reach mass adoption. To contribute to the advancement of this topic, this study develops a feasible, potentially scalable model for the mass adoption of semantic applications, specifically for identifying the attribute of a news or statement cluster as positive, negative, or neutral. This paper proposes a disruptive, Blockchain-based system using a Semantic Browser Expert System Bot with artificial intelligence, called the Blockchain Semantic Browser Expert System (BSBES), to search for and analyze relevant information that significantly represents cryptocurrency adoption patterns. The artificial intelligence in this study consists of a deep learning neural network that processes the input information to identify news patterns semantically, based on two aspects of the news: the technical aspect and the adoption aspect of the cryptocurrencies. BSBES performance is achieved with deep learning tools, and scalability is supported by a blockchain system; a stability study is included.
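As a rough illustration only (the tokenization, layer sizes, and three-way labelling below are assumptions; the blockchain and semantic-browser components of BSBES are not modelled), a minimal classifier of the kind described could look like this:

```python
import torch
import torch.nn as nn

class NewsAttributeClassifier(nn.Module):
    """Maps a tokenized news statement to a positive/negative/neutral label."""
    def __init__(self, vocab_size=20000, embed_dim=128, hidden=64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, embed_dim)         # bag of tokens
        self.mlp = nn.Sequential(nn.Linear(embed_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))               # 3 attributes

    def forward(self, token_ids, offsets):
        return self.mlp(self.embed(token_ids, offsets))

model = NewsAttributeClassifier()
tokens = torch.tensor([3, 57, 912, 7, 120])  # token ids of one statement
offsets = torch.tensor([0])                  # one document in the batch
logits = model(tokens, offsets)              # shape (1, 3)
```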


Author(s):  
Cheng Li ◽  
Levi Fussell ◽  
Taku Komura

Simultaneous control of multiple characters has been extensively pursued for computer games and computer animation, with applications such as crowd simulation, controlling two characters carrying objects or fighting with one another, and controlling a team of characters playing collective sports. With the advances in deep learning and reinforcement learning, there is growing interest in applying multi-agent reinforcement learning (MARL) to intelligently control characters and produce realistic movements. In this paper we survey the state-of-the-art MARL techniques that are applicable to character control. We then survey papers that make use of MARL for multi-character control and discuss possible future directions of research.


2020 ◽  
Vol 34 (05) ◽  
pp. 7187-7194
Author(s):  
Adam Lerer ◽  
Hengyuan Hu ◽  
Jakob Foerster ◽  
Noam Brown

Recent superhuman results in games have largely been achieved in a variety of zero-sum settings, such as Go and Poker, in which agents need to compete against others. However, just like humans, real-world AI systems have to coordinate and communicate with other agents in cooperative partially observable environments as well. These settings commonly require participants both to interpret the actions of others and to act in a way that is informative when being interpreted. Those abilities are typically summarized as theory of mind and are seen as crucial for social interactions. In this paper we propose two different search techniques that can be applied to improve an arbitrary agreed-upon policy in a cooperative partially observable game. The first one, single-agent search, effectively converts the problem into a single-agent setting by making all but one of the agents play according to the agreed-upon policy. In contrast, in multi-agent search all agents carry out the same common-knowledge search procedure whenever doing so is computationally feasible, and fall back to playing according to the agreed-upon policy otherwise. We prove that these search procedures are theoretically guaranteed to at least maintain the original performance of the agreed-upon policy (up to a bounded approximation error). In the benchmark challenge problem of Hanabi, our search technique greatly improves the performance of every agent we tested and, when applied to a policy trained using RL, achieves a new state-of-the-art score of 24.61/25 in the game, compared to a previous best of 24.08/25.
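The single-agent variant can be sketched as a Monte-Carlo search over an agreed-upon ("blueprint") policy. The environment and policy interfaces below are illustrative assumptions rather than the paper's Hanabi implementation, but the control flow is the same: evaluate each legal action by rollouts in which every agent subsequently follows the blueprint, then play the best one.

```python
import random

def single_agent_search(state, blueprint, legal_actions, n_rollouts=100):
    """Pick the action with the best average return when all agents
    follow the agreed-upon blueprint policy afterwards (illustrative API)."""
    def rollout(s):
        total = 0.0
        while not s.terminal():
            a = blueprint(s.current_player(), s.observation())
            s, r = s.step(a)
            total += r
        return total

    best_action, best_value = None, float("-inf")
    for action in legal_actions:
        returns = []
        for _ in range(n_rollouts):
            # Sample hidden information consistent with what the searcher
            # knows (e.g. a deal of the unseen cards), then try the action.
            world = state.sample_consistent_world(rng=random.Random())
            world, r = world.step(action)
            returns.append(r + rollout(world))
        value = sum(returns) / n_rollouts
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```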


10.29007/4t8s ◽  
2018 ◽  
Author(s):  
Sagi Bazinin ◽  
Guy Shani

QDec-POMDPs are a qualitative alternative to stochastic Dec-POMDPs for goal-oriented planning in cooperative, partially observable multi-agent environments. Although QDec-POMDPs share the same worst-case complexity as Dec-POMDPs, previous research has shown an ability to scale up to larger domains while producing high-quality plan trees. A key difficulty in distributed execution is the need to construct a joint plan tree branching on the combinations of observations of all agents. In this work, we suggest an iterative algorithm, IMAP, that plans for one agent at a time, taking into consideration collaboration constraints about the action execution of previous agents and generating new constraints for the next agents. We explain how these constraints are generated and handled, and describe a backtracking mechanism for changing constraints that cannot be met. We provide experimental results on multi-agent planning domains, showing that our methods scale to much larger problems with several collaborating agents and huge state spaces.
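The control flow of the iterative scheme can be sketched as follows; the single-agent planner, constraint extraction, and constraint relaxation are placeholders (assumptions) that stand in for the components the paper describes:

```python
def imap_plan(agents, plan_single_agent, extract_constraints, relax):
    """Plan agent-by-agent under accumulated collaboration constraints,
    backtracking to revise a previous agent's constraints when needed."""
    constraints = []            # collaboration constraints accumulated so far
    plans, history = {}, []
    i = 0
    while i < len(agents):
        plan = plan_single_agent(agents[i], constraints)
        if plan is not None:
            plans[agents[i]] = plan
            history.append(list(constraints))          # remember for backtracking
            constraints = constraints + extract_constraints(plan)
            i += 1
        elif i == 0:
            return None          # no joint plan under any constraint revision
        else:
            # Backtrack: the current constraints cannot be met, so revise the
            # constraints under which the previous agent planned and replan.
            i -= 1
            constraints = relax(history.pop())
    return plans
```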


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call "Levenshtein augmentation", which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. Levenshtein augmentation demonstrated increased performance over non-augmented and conventionally SMILES-randomization-augmented data when used for training the baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain, an enhancement in the underlying network's ability to recognize molecular motifs.
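Based only on the abstract, one way to realise the idea is to score candidate reactant/product SMILES pairs by edit distance and keep the most similar ones; the helper below is an assumption about the mechanism, not the authors' exact procedure (the randomized SMILES variants would normally come from a cheminformatics toolkit such as RDKit).

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def make_training_pair(reactant_variants, product_smiles):
    """Among randomized reactant SMILES, keep the one closest to the product
    string, so training pairs share local sub-sequences."""
    best = min(reactant_variants, key=lambda s: levenshtein(s, product_smiles))
    return best, product_smiles

# Toy usage with hypothetical SMILES strings.
pair = make_training_pair(["CCO", "OCC", "C(O)C"], "CC(=O)O")
```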


2020 ◽  
Author(s):  
Saeed Nosratabadi ◽  
Amir Mosavi ◽  
Puhong Duan ◽  
Pedram Ghamisi ◽  
Ferdinand Filip ◽  
...  

This paper provides a state-of-the-art investigation of advances in data science in emerging economic applications. The analysis was performed on novel data science methods in four individual classes: deep learning models, hybrid deep learning models, hybrid machine learning models, and ensemble models. Application domains include a wide and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, was used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which, based on the accuracy metric, outperform other learning algorithms. It is further expected that the trends will converge toward the advancement of sophisticated hybrid deep learning models.

