Experience Weighted Learning in Multiagent Systems

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yi Zou ◽  
Jijuan Zhong ◽  
Zhihao Jiang ◽  
Hong Zhang ◽  
Xuyu Pu

Agents face challenges in achieving adaptability and stability when interacting with dynamic counterparts in a complex multiagent system (MAS). To strike a balance between these two goals, this paper proposes a learning algorithm for heterogeneous agents with bounded rationality, referred to as experience weighted learning (EWL). It integrates reinforcement learning and fictitious play to evaluate historical information and adopts mechanisms from evolutionary games to adapt to uncertainty. We conducted multiagent simulations to test the performance of EWL in various games. The results demonstrate that the average payoff of EWL exceeds that of the baseline in all four games. In addition, we find that most of the EWL agents converge to a pure strategy and eventually become stable. Furthermore, we test the impact of two important parameters, respectively. The results show that the performance of EWL is quite stable and that there is potential to improve it through parameter optimization.
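
The combination of reinforcement learning and fictitious play that the abstract describes resembles Camerer and Ho's experience-weighted attraction (EWA) rule, which interpolates between the two via a single parameter. A minimal sketch under that assumption (parameter names and values are illustrative, not the paper's):

```python
import math

def ewa_update(attractions, N, chosen, payoffs, phi=0.9, delta=0.5, rho=0.9):
    """One experience-weighted attraction (EWA) update.

    attractions: attraction value A_j for each strategy j
    N: scalar experience weight carried between rounds
    chosen: index of the strategy actually played this round
    payoffs: payoff each strategy would have earned this round
    delta weights forgone payoffs: delta=0 recovers reinforcement
    learning, delta=1 recovers fictitious play; phi and rho decay
    old attractions and the experience count, respectively.
    """
    new_N = rho * N + 1.0
    new_A = []
    for j, (A, pi) in enumerate(zip(attractions, payoffs)):
        weight = delta + (1.0 - delta) * (1.0 if j == chosen else 0.0)
        new_A.append((phi * N * A + weight * pi) / new_N)
    return new_A, new_N

def choice_probs(attractions, lam=1.0):
    """Logit (softmax) choice rule over attractions."""
    exps = [math.exp(lam * a) for a in attractions]
    total = sum(exps)
    return [e / total for e in exps]
```

Strategies with higher attractions are chosen more often, while the decayed experience count keeps early rounds from dominating later play.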

2020 ◽  
Vol 309 ◽  
pp. 03026
Author(s):  
Xia Gao ◽  
Fangqin Xu

With the rapid development of Internet technology and mobile terminals, users’ demand for high-speed networks is increasing. Mobile edge computing proposes a distributed caching approach to deal with the impact of massive data traffic on communication networks, in order to reduce network latency and improve the quality of user service. In this paper, a deep reinforcement learning algorithm is proposed to solve the task offloading problem of multi-service nodes. Experiments were carried out on the simulation platform iFogSim with the Google Cluster Trace data set. The final results show that the task offloading strategy based on the DDQN algorithm performs well in terms of energy consumption and cost, verifying the application prospects of deep reinforcement learning algorithms in mobile edge computing.
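
The DDQN (Double DQN) algorithm named in the abstract differs from plain DQN only in how the bootstrap target is built: the online network selects the next action and the target network evaluates it, which reduces overestimation bias. A sketch of that target computation (the reward shaping from energy and cost is an assumption, not the paper's formula):

```python
import numpy as np

def ddqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.99):
    """Double DQN bootstrap targets for a batch of transitions.

    q_online_next, q_target_next: (batch, n_actions) Q-values at next states
    rewards: (batch,) immediate rewards, e.g. negative energy plus cost
    dones: (batch,) 1.0 where the episode ended, else 0.0
    """
    best_actions = np.argmax(q_online_next, axis=1)          # online net selects
    batch_idx = np.arange(len(rewards))
    evaluated = q_target_next[batch_idx, best_actions]       # target net evaluates
    return rewards + gamma * (1.0 - dones) * evaluated
```

The returned targets are regressed against the online network's Q-values for the actions actually taken.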


2020 ◽  
pp. 377-386
Author(s):  
Samuel Obadan ◽  
Zenghui Wang

This paper introduces novel concepts for accelerating learning in an off-policy reinforcement learning algorithm for Partially Observable Markov Decision Processes (POMDPs) by leveraging a multiple-agent framework. Reinforcement learning (RL) is a slow but elegant approach to learning in an unknown environment. Although the action-value method (Q-learning) is faster than the state-value method, the rate of convergence to an optimal policy or maximum cumulative reward remains a constraint. Consequently, in an attempt to optimize the learning phase of an RL problem within a POMDP environment, we present two multi-agent learning paradigms: multi-agent off-policy reinforcement learning and an ingenious genetic algorithm (GA) approach for multi-agent offline learning using feedforward neural networks. At the end of training (episodes for reinforcement learning and epochs for the genetic algorithm, respectively), we compare the convergence rates of both algorithms with respect to creating the underlying MDPs for POMDP problems. Finally, we demonstrate the impact of layered resampling of the Monte Carlo particle filter for improving belief state estimation accuracy with respect to ground truth within POMDP domains. Initial empirical results suggest practicable solutions.
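
The off-policy action-value learning the abstract refers to is, at its core, the tabular Q-learning update; a minimal sketch (the state/action encoding is illustrative, not the paper's POMDP setup):

```python
def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One off-policy Q-learning update on a tabular Q function.

    Q: dict mapping (state, action) -> value; missing entries default to 0.
    The max over next-state actions makes the update off-policy: it
    bootstraps from the greedy action regardless of what was played.
    """
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    td_target = r + gamma * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
    return Q
```

In a POMDP setting this table would be indexed by belief states (or their discretization) rather than true environment states.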


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yunmei Yuan ◽  
Hongyu Li ◽  
Lili Ji

Nowadays, finding the optimal route for vehicles through online vehicle path planning is one of the main problems that the logistics industry needs to solve. Due to the uncertainty of the transportation system, especially in the last-mile delivery of small packages under uncertain logistics transportation, the computation of logistics vehicle routing plans has become more complex than before. Most existing solutions make little use of new technologies such as machine learning and instead rely on heuristic algorithms, which not only require many constraints to be set but also demand substantial computation time in logistics networks with high demand density. To design uncertain logistics transportation paths with minimum time, this paper proposes a new optimization strategy based on deep reinforcement learning that converts uncertain online logistics routing problems into vehicle path planning problems and designs an embedded pointer network to obtain the optimal solution. Given the long time needed to solve the neural network, it is unrealistic to train the parameters on supervised data, so this article trains them with an unsupervised method. Because parameter training happens offline, this strategy avoids high delays. The simulation results show that the proposed strategy effectively solves the uncertain logistics scheduling problem within limited computing time and is significantly better than other strategies. Compared with traditional mathematical procedures, the proposed algorithm reduces the driving distance by 60.71%. In addition, this paper also studies the impact of some key parameters on the effectiveness of the approach.
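
A pointer network builds a route by repeatedly attending over the encoded customer nodes and "pointing" at the next one to visit, masking nodes already on the route. A minimal sketch of one decoding step in that style (the additive-attention parameterization and all weight shapes are assumptions, not the paper's architecture):

```python
import numpy as np

def pointer_step(decoder_state, encoder_states, W1, W2, v, visited):
    """One pointer-network decoding step.

    decoder_state: (d,) current decoder hidden state
    encoder_states: (n, d) encoded customer nodes
    W1: (h, d), W2: (h, d), v: (h,) attention parameters
    visited: (n,) boolean mask of nodes already routed
    Returns a probability distribution over which node to visit next.
    """
    # Additive (Bahdanau-style) attention scores over the n nodes
    scores = v @ np.tanh(W1 @ encoder_states.T + (W2 @ decoder_state)[:, None])
    scores = np.where(visited, -np.inf, scores)   # forbid revisits
    exps = np.exp(scores - np.max(scores))        # stable softmax
    return exps / exps.sum()
```

Sampling from this distribution at each step yields a full tour, whose negative length can serve as the reward for the unsupervised (REINFORCE-style) training the abstract mentions.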


Author(s):  
Qiaoling Zhou

Purpose: English original movies play an important role in English learning and communication. In order to find the required movies from a large number of English original movies and reviews, this paper proposes an improved deep reinforcement learning algorithm for movie recommendation. Although conventional movie recommendation algorithms have addressed the problem of information overload, they still have limitations in the case of cold start and sparse data.

Design/methodology/approach: To solve the aforementioned problems of conventional movie recommendation algorithms, this paper proposes a recommendation algorithm based on deep reinforcement learning, which uses the deep deterministic policy gradient (DDPG) algorithm to address the cold start and sparse data problems and uses Item2vec to transform the discrete action space into a continuous one. Meanwhile, a reward function combining cosine distance and Euclidean distance is proposed to ensure that the neural network does not converge to a local optimum prematurely.

Findings: To verify the feasibility and validity of the proposed algorithm, it is compared with state-of-the-art algorithms in terms of RMSE, recall rate and accuracy on the MovieLens English original movie data set. Experimental results show that the proposed algorithm is superior to the conventional algorithms on all indicators.

Originality/value: When applied to the recommendation of English original movies, the DDPG policy produces better recommendation results and alleviates the impact of cold start and sparse data.
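
One plausible reading of the combined reward is a blend of cosine similarity (direction of the embedding match) with a term derived from Euclidean distance (magnitude of the match). A sketch under that assumption; the blending weight `alpha` and the `1/(1+d)` transform are illustrative, not the paper's formula:

```python
import numpy as np

def combined_reward(action_vec, item_vec, alpha=0.5):
    """Reward blending cosine similarity and a Euclidean-distance term.

    action_vec: continuous action produced by the DDPG actor
    item_vec: Item2vec embedding of the recommended movie
    Mixing two geometric signals gives the critic a smoother reward
    surface, which can discourage premature convergence to a local optimum.
    """
    cos = action_vec @ item_vec / (
        np.linalg.norm(action_vec) * np.linalg.norm(item_vec)
    )
    euc = np.linalg.norm(action_vec - item_vec)
    return alpha * cos + (1.0 - alpha) * (1.0 / (1.0 + euc))
```

A perfect match (identical vectors) yields the maximum reward of 1; mismatched directions and larger distances both pull the reward down.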


2019 ◽  
Author(s):  
Jennifer R Sadler ◽  
Grace Elisabeth Shearrer ◽  
Nichollette Acosta ◽  
Kyle Stanley Burger

BACKGROUND: Dietary restraint represents an individual’s intent to limit their food intake and has been associated with impaired passive food reinforcement learning. However, the impact of dietary restraint on active, response-dependent learning is poorly understood. In this study, we tested the relationship between dietary restraint and food reinforcement learning using an active, instrumental conditioning task.

METHODS: A sample of ninety adults completed a response-dependent instrumental conditioning task with reward and punishment using sweet and bitter tastes. Brain response was measured via functional MRI during the task. Participants also completed anthropometric measures, reward/motivation-related questionnaires, and a working memory task. Dietary restraint was assessed via the Dutch Restrained Eating Scale.

RESULTS: Two groups were selected from the sample: high restraint (n=29, score >2.5) and low restraint (n=30, score <1.85). High restraint was associated with significantly higher BMI (p=0.003) and lower N-back accuracy (p=0.045). The high restraint group was also marginally better at the instrumental conditioning task (p=0.066, r=0.37). High restraint was also associated with significantly greater brain response in the intracalcarine cortex (MNI: 15, -69, 12; k=35, p_FWE < 0.05) to bitter taste, compared to neutral taste.

CONCLUSIONS: High restraint was associated with improved performance on an instrumental task testing how individuals learn from reward and punishment. This may be mediated by greater brain response in the primary visual cortex, which has been associated with mental representation. Results suggest that dietary restraint does not impair response-dependent reinforcement learning.


Author(s):  
Eyal Zamir ◽  
Doron Teichman

In the past few decades, economic analysis of law has been challenged by a growing body of experimental and empirical studies that attest to prevalent and systematic deviations from the assumptions of economic rationality. While the findings on bounded rationality and heuristics and biases were initially perceived as antithetical to standard economic and legal-economic analysis, over time they have been largely integrated into mainstream economic analysis, including economic analysis of law. Moreover, the impact of behavioral insights has long since transcended purely economic analysis of law: in recent years, the behavioral movement has become one of the most influential developments in legal scholarship in general. Behavioral Law and Economics offers a state-of-the-art overview of the field. The book surveys the entire body of psychological research underpinning behavioral analysis of law, and critically evaluates the core methodological questions of this area of research. The book then discusses the fundamental normative questions stemming from the psychological findings on bounded rationality, and explores their implications for establishing the aims of legislation, and the means of attaining them. This is followed by a systematic and critical examination of the contributions of behavioral studies to all major fields of law—property, contracts, consumer protection, torts, corporate, securities regulation, antitrust, administrative, constitutional, international, criminal, and evidence law—as well as to the behavior of key players in the legal arena: litigants and judicial decision-makers.


2019 ◽  
Vol 2019 ◽  
pp. 1-17
Author(s):  
Zhu Bai ◽  
Mingxia Huang ◽  
Shuai Bian ◽  
Huandong Wu

The emergence of online car-hailing services provides an innovative approach to vehicle booking but has negatively influenced the taxi industry in China. This paper models taxi service mode choice based on evolutionary game theory (EGT); the modes considered are the dispatching mode and the online car-hailing mode. We constructed an EGT framework, including determining the strategies and the payoff matrix, and introduced different behaviors, including taxi company management, driver operation, and passenger choice. This allowed us to model the impact of these behaviors on the evolution of service mode choice. The results show that adjustments in taxi company, driver, and passenger behaviors affect the evolutionary path and convergence speed of the game model. However, they also reveal that, regardless of adjustments, the stable states of the game model remain unchanged. These conclusions provide a basis for studying taxi system operation and management.
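
The evolutionary paths and stable states the abstract analyzes are typically traced with replicator dynamics: strategies whose payoff exceeds the population average grow in share. A minimal discrete-time sketch (the payoff matrix below is a placeholder, not the paper's calibrated values):

```python
def replicator_step(x, payoff_matrix, dt=0.01):
    """One Euler step of replicator dynamics.

    x: population shares of each strategy (sums to 1), e.g. the
       fraction of drivers choosing dispatching vs. online car-hailing
    payoff_matrix[i][j]: payoff to strategy i against strategy j
    """
    n = len(x)
    fitness = [sum(payoff_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
    avg = sum(xi * fi for xi, fi in zip(x, fitness))
    # Shares grow (shrink) in proportion to their payoff advantage (deficit)
    return [xi + dt * xi * (fi - avg) for xi, fi in zip(x, fitness)]
```

Iterating this step from different initial shares traces the evolutionary paths; rest points where no share changes are the stable states the abstract says persist under behavioral adjustments.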

