Development of Reinforcement Learning Methods in Control and Decision Making in the Large Scale Dynamic Game Environments

In this paper, deep reinforcement learning (DRL) and knowledge transfer are used to control a learning agent effectively in multi-agent confrontation. First, a multi-agent Deep Deterministic Policy Gradient (DDPG) algorithm with parameter sharing is proposed for multi-agent confrontation decision-making. During training, information about the other agents is fed to the critic network to improve the confrontation strategy. The parameter sharing mechanism can reduce the loss of experience storage. In the DDPG algorithm, four neural networks generate the real-time actions and the Q-value function, and a momentum mechanism optimizes training to accelerate the convergence of the networks. Second, an auxiliary controller based on a policy-based reinforcement learning (RL) method provides assistant decision-making for the game agent. In addition, an effective reward function helps agents balance enemy losses against their own. Furthermore, knowledge transfer is used to extend the learning model to more complex scenes and to improve the generalization of the proposed confrontation model. Two confrontation decision-making experiments are designed to verify the effectiveness of the proposed method. In a small-scale task scenario, the trained agent successfully learns to fight its competitors and achieves a good winning rate. In large-scale confrontation scenarios, the knowledge transfer method gradually improves the decision-making level of the learning agent.
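
The three mechanisms named in this abstract (a critic that also sees other agents' information, the target networks that make up two of DDPG's four networks, and a momentum update) can be illustrated with a minimal sketch. The paper's actual architecture and hyperparameters are not given here, so every function name, the plain-list parameter representation, and the values of `tau`, `lr` and `beta` are assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def critic_input(own_obs: Sequence[float],
                 own_action: Sequence[float],
                 others_info: Sequence[Sequence[float]]) -> Vector:
    """Build the critic's input by appending other agents' information
    to the agent's own observation and action (hypothetical layout)."""
    flat_others = [x for info in others_info for x in info]
    return list(own_obs) + list(own_action) + flat_others


def soft_update(target: Vector, online: Vector, tau: float = 0.01) -> Vector:
    """Standard DDPG target-network update (Polyak averaging):
    target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def momentum_step(params: Vector, grads: Vector, velocity: Vector,
                  lr: float = 0.01, beta: float = 0.9) -> Tuple[Vector, Vector]:
    """Classical momentum SGD, one assumed reading of 'momentum mechanism':
    v <- beta * v + g, then p <- p - lr * v."""
    new_velocity = [beta * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


if __name__ == "__main__":
    # Toy values only; they do not come from the paper.
    print(critic_input([1.0, 2.0], [0.5], [[3.0], [4.0, 5.0]]))
    print(soft_update([0.0, 0.0], [1.0, 2.0], tau=0.5))
    print(momentum_step([1.0], [2.0], [0.0], lr=0.1, beta=0.9))
```

With parameter sharing, all agents would reuse one such set of actor and critic parameters, so each helper above would be called once per update rather than once per agent.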

Multiagent Decision Making and Learning in Urban Environments

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/895 ◽

2019 ◽

Author(s):

Akshat Kumar

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Strategies ◽

Multiagent Systems ◽

Intelligent Agents ◽

Large Scale ◽

Urban Environments ◽

Aggregate Level ◽

Domain Models ◽

Self Driving Cars

Our increasingly interconnected urban environments provide several opportunities to deploy intelligent agents (from self-driving cars and ships to aerial drones) that promise to radically improve productivity and safety. Achieving coordination among agents in such urban settings presents several algorithmic challenges: scaling to thousands of agents, and handling uncertainty and partial observability in the environment. In addition, accurate domain models need to be learned from data that is often noisy and available only at an aggregate level. In this paper, I give an overview of some of our recent contributions towards developing planning and reinforcement learning strategies that address several of these challenges in large-scale urban multiagent systems.

Behavioural and neural limits in competitive decision making: The roles of outcome, opponency and observation

10.1101/571257 ◽

2019 ◽

Cited By ~ 1

Author(s):

Benjamin James Dyson ◽

Ben Albert Steward ◽

Tea Meneghetti ◽

Lewis Forder

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Visual Attention ◽

Large Scale ◽

Environmental Responsibility ◽

Positive Outcomes ◽

Outcome Valence ◽

Future Events ◽

Neural Gain ◽

Random Behaviour

To understand the boundaries we set for ourselves in terms of environmental responsibility during competition, we examined a neural index of outcome valence (feedback-related negativity; FRN) in relation to earlier indices of visual attention (N1), later indices of motivational significance (P3), and eventual behaviour. In Experiment 1 (n=36), participants either were (play) or were not (observe) responsible for action selection. In Experiment 2 (n=36), opponents additionally either could (exploitable) or could not (unexploitable) be beaten. Various failures in reinforcement learning expression were revealed, including large-scale approximations of random behaviour. Against unexploitable opponents, N1 determined the extent to which negative and positive outcomes were perceived as distinct categories by FRN. Against exploitable opponents, FRN determined the extent to which P3 generated neural gain for future events. Differential activation of the N1 – FRN – P3 processing chain provides a framework for understanding the behavioural dynamism observed during competitive decision making.

To hop or not, that is the question: Towards effective multi-hop reasoning over knowledge graphs

World Wide Web ◽

10.1007/s11280-021-00911-5 ◽

2021 ◽

Author(s):

Jinzhi Liao ◽

Xiang Zhao ◽

Jiuyang Tang ◽

Weixin Zeng ◽

Zhen Tan

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Supervised Learning ◽

Large Scale ◽

State Of The Art ◽

False Negative ◽

Stop Signal ◽

Knowledge Graph ◽

Overall Performance ◽

Knowledge Graphs

With the proliferation of large-scale knowledge graphs (KGs), multi-hop knowledge graph reasoning has become a capstone that enables machines to handle intelligent tasks, especially where an explicit reasoning path is valued for decision making. To train a KG reasoner, supervised learning-based methods suffer from false-negative issues, i.e., paths unseen during training cannot be found in prediction; in contrast, reinforcement learning (RL)-based methods do not require labeled paths and can explore to cover many appropriate reasoning paths. Accordingly, efforts have been dedicated to investigating several RL formulations for multi-hop KG reasoning. In particular, current RL-based methods generate rewards only at the very end of the reasoning process, so short paths with fewer hops than a given threshold are likely to be overlooked, and the overall performance is impaired. To address the problem, we propose a revised RL formulation of multi-hop KG reasoning that is characterized by two novel designs: the stop signal and the worth-trying signal. The stop signal instructs the RL agent to stay at the entity after finding the answer, preventing it from hopping further even if the threshold is not reached; meanwhile, the worth-trying signal encourages the agent to learn partial patterns from the paths that fail to lead to the answer. To validate the design of our model, comprehensive experiments are carried out on three benchmark knowledge graphs, and the results and analysis suggest the superiority of our model over state-of-the-art methods.
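
The two signals described above can be sketched as follows. This is not the authors' formulation: the abstract does not say how either signal is computed, so the function names, the dictionary graph representation, the caller-supplied `partial_score`, and the `bonus` weight are all hypothetical.

```python
from typing import Callable, Dict, List


def rollout(start: str, answer: str,
            policy: Callable[[str, List[str]], str],
            graph: Dict[str, List[str]],
            max_hops: int) -> List[str]:
    """Walk from `start` for at most `max_hops` hops.
    Stop signal (assumed form): once the answer entity is reached,
    the agent stays there instead of using its remaining hops."""
    path = [start]
    current = start
    for _ in range(max_hops):
        if current == answer:
            break  # stop signal: do not hop past the answer
        neighbours = graph.get(current, [])
        if not neighbours:
            break  # dead end: nothing left to explore
        current = policy(current, neighbours)
        path.append(current)
    return path


def path_reward(path: List[str], answer: str,
                partial_score: Callable[[List[str]], float],
                bonus: float = 0.1) -> float:
    """Full reward for reaching the answer; otherwise a small
    worth-trying reward (assumed form) scaled by `partial_score`,
    a caller-supplied estimate in [0, 1] of how promising the path was."""
    if path[-1] == answer:
        return 1.0
    return bonus * partial_score(path)


if __name__ == "__main__":
    # Toy graph and a policy that always picks the first neighbour.
    toy_graph = {"a": ["b"], "b": ["c"], "c": ["d"]}
    first = lambda entity, neighbours: neighbours[0]
    walked = rollout("a", "b", first, toy_graph, max_hops=3)
    print(walked, path_reward(walked, "b", lambda p: 0.5))
```

Without the early `break`, the toy agent would continue from `b` to `c` and `d` and end away from the answer, which is the short-path failure the abstract attributes to end-of-episode rewards.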

Expertise Based Cooperative Reinforcement Learning Methods (ECRLM) for Dynamic Decision Making in Retail Shop Application

Information and Communication Technology for Intelligent Systems (ICTIS 2017) - Volume 2 - Smart Innovation, Systems and Technologies ◽

10.1007/978-3-319-63645-0_39 ◽

2017 ◽

pp. 350-360 ◽

Cited By ~ 2

Author(s):

Deepak A. Vidhate ◽

Parag Kulkarni

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Dynamic Decision Making ◽

Learning Methods ◽

Retail Shop

Autonomous Lane Change Decision Making Using Different Deep Reinforcement Learning Methods

CICTP 2019 ◽

10.1061/9780784482292.479 ◽

2019 ◽

Author(s):

Xidong Feng ◽

Jianming Hu ◽

Yusen Huo ◽

Yi Zhang

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Lane Change ◽

Learning Methods

Leadership and Decision-Making Styles in Large-Scale Sporting Events

Event Management ◽

10.3727/152599518x15299559876162 ◽

2018 ◽

Vol 22 (5) ◽

pp. 785-801 ◽

Cited By ~ 1

Author(s):

Majd Megheirkouni

Keyword(s):

Decision Making ◽

Large Scale ◽

Sporting Events ◽

Decision Making Styles

A Drug Target Interaction Prediction Based on LINE-RF Learning

Current Bioinformatics ◽

10.2174/1574893615666191227092453 ◽

2020 ◽

Vol 15 (7) ◽

pp. 750-757

Author(s):

Jihong Wang ◽

Yue Shi ◽

Xiaodan Wang ◽

Huiyou Chang

Keyword(s):

Network Topology ◽

Drug Target ◽

Large Scale ◽

Representation Learning ◽

New Drugs ◽

Combination Method ◽

Learning Methods ◽

Network Representation ◽

On Line ◽

Clinical Experiments

Background: At present, using computational methods to predict drug-target interactions (DTIs) is a very important step in the discovery of new drugs and in drug relocation. The potential DTIs identified by machine learning methods can provide guidance for biochemical or clinical experiments. Objective: The goal of this article is to apply the latest network representation learning methods to drug-target prediction, improve model prediction capabilities, and promote new drug development. Methods: We use the large-scale information network embedding (LINE) method to extract network topology features of drugs, targets, diseases, etc., integrate the features obtained from heterogeneous networks, construct binary classification samples, and use the random forest (RF) method to predict DTIs. Results: The experiments in this paper compare the common classifiers RF, LR, and SVM, as well as the typical network representation learning methods LINE, Node2Vec, and DeepWalk. The combined method LINE-RF achieves the best results, reaching an AUC of 0.9349 and an AUPR of 0.9016. Conclusion: The learning method based on LINE can effectively learn hidden features of drugs, targets, diseases, etc. from the network topology. Combining features learned from multiple networks can enhance the expressive ability. RF is an effective supervised learning method. Therefore, the LINE-RF combination method is a widely applicable method.
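
As a rough illustration of the sample construction and the AUC metric mentioned above, the sketch below concatenates a drug vector with a target vector to form one binary-classification sample and computes ROC AUC from classifier scores. The embeddings and scores are toy numbers, not LINE or random forest output; the function names and the concatenation layout are assumptions, since the abstract does not specify how features are integrated.

```python
from typing import List, Sequence


def pair_features(drug_vec: Sequence[float],
                  target_vec: Sequence[float]) -> List[float]:
    """One classification sample for a (drug, target) pair, built by
    concatenating the two embedding vectors (assumed layout)."""
    return list(drug_vec) + list(target_vec)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive pair
    is scored above a randomly chosen negative pair (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy embeddings and scores, for illustration only.
    print(pair_features([0.1, 0.2], [0.3]))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.5, 0.1]))
```

In a full pipeline the scores would come from a trained classifier such as a random forest; the quadratic pairwise loop is kept here for readability rather than speed.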

Applications of blockchain in healthcare (Preprint)

10.2196/preprints.17777 ◽

2020 ◽

Author(s):

Pranav C

Keyword(s):

Decision Making ◽

Large Scale ◽

Review Paper ◽

New Technology ◽

Personal Identification ◽

Healthcare Industry ◽

Healthcare Organizations ◽

Paper Briefly ◽

Blockchain Technology ◽

Audit Trails

The word blockchain elicits thoughts of cryptocurrency much of the time, which does a disservice to this disruptive new technology. Admittedly, Bitcoin, launched in 2009, was the first large-scale implementation of blockchain technology, and Bitcoin's success has triggered the establishment of nearly 1000 new cryptocurrencies. This in turn led to the misconception that the only application of blockchain technology is the creation of cryptocurrency. However, blockchain technology is capable of a lot more than cryptocurrency creation and may support such things as transactions that require personal identification, peer review, elections and other types of democratic decision-making, and audit trails. Blockchain has real-world implementations beyond cryptocurrencies, and these solutions deliver powerful benefits to healthcare organizations, bankers, retailers and consumers, among others. One of the areas where blockchain technology can be used effectively is the healthcare industry. Proper application of this technology in healthcare will not only save billions but will also contribute to the growth of research. This review paper briefly defines blockchain and covers in detail its applications in various areas, particularly in the healthcare industry.
