The Agent Web Model: modeling web hacking for reinforcement learning

Author(s):  
László Erdődi ◽  
Fabio Massimo Zennaro

Website hacking is a frequent attack type used by malicious actors to obtain confidential information, modify the integrity of web pages or make websites unavailable. The tools used by attackers are becoming more and more automated and sophisticated, and malicious machine learning agents seem to be the next development in this line. In order to provide ethical hackers with similar tools, and to understand the impact and the limitations of artificial agents, we present in this paper a model that formalizes web hacking tasks for reinforcement learning agents. Our model, named Agent Web Model, considers web hacking as a capture-the-flag style challenge, and it defines reinforcement learning problems at seven different levels of abstraction. We discuss the complexity of these problems in terms of the actions and states an agent has to deal with, and we show that such a model allows us to represent most of the relevant web vulnerabilities. Aware that the driver of advances in reinforcement learning is the availability of standardized challenges, we provide an implementation for the first three abstraction layers, in the hope that the community will take up these challenges in order to develop intelligent web hacking agents.
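For illustration, the lowest abstraction layer can be framed as a standard episodic reinforcement learning environment. The Python sketch below assumes a Gym-style interface; the class name, action encoding, and reward values are illustrative choices for a capture-the-flag task in which the agent must find the file holding the flag, not the authors' implementation.

```python
import random

class WebCTFEnv:
    """Minimal sketch of a capture-the-flag web hacking task as an RL
    environment. Each action requests one file from a simulated web
    server; exactly one file contains the flag."""

    def __init__(self, num_files=10, seed=None):
        self.num_files = num_files        # size of the discrete action space
        self.rng = random.Random(seed)
        self.flag_file = None
        self.visited = None

    def reset(self):
        self.flag_file = self.rng.randrange(self.num_files)
        self.visited = set()
        return tuple()                    # observation: files probed so far

    def step(self, action):
        """action: index of the file to request."""
        self.visited.add(action)
        done = (action == self.flag_file)
        reward = 10.0 if done else -1.0   # per-request cost, bonus for the flag
        obs = tuple(sorted(self.visited))
        return obs, reward, done, {}
```

Higher abstraction layers would enrich the action space, for instance with parameters and payloads as well as files, which is where the growth in the number of actions and states discussed in the abstract comes from.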

Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve learning agents' performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
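As a rough illustration of what such a simulated user could look like, the Python sketch below models a trainer whose availability and accuracy can be varied between experiments; the class name and interface are assumptions for illustration, not the authors' implementation.

```python
import random

class SimulatedUser:
    """Sketch of a simulated trainer for interactive RL.

    availability: probability of offering advice on a given step
    accuracy:     probability the advice matches the optimal action
    """

    def __init__(self, optimal_policy, availability=0.5, accuracy=0.8, seed=None):
        self.optimal_policy = optimal_policy   # callable: state -> best action
        self.availability = availability
        self.accuracy = accuracy
        self.rng = random.Random(seed)

    def advise(self, state, action_space):
        """Return an advised action from action_space, or None if silent."""
        if self.rng.random() > self.availability:
            return None                        # user gives no advice this step
        if self.rng.random() < self.accuracy:
            return self.optimal_policy(state)  # correct advice
        return self.rng.choice(action_space)   # noisy or biased advice
```

Sweeping `availability` and `accuracy`, or drifting them over time to mimic a human learning the task, is one way to emulate the different types of trainers referred to in the abstract.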


2021 ◽  
Vol 11 (4) ◽  
pp. 1514 ◽  
Author(s):  
Quang-Duy Tran ◽  
Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of autonomous vehicles. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning approach that investigates how leading autonomous vehicles affect the urban network in a mixed-traffic environment. We also suggest a set of hyperparameters for achieving better performance. Firstly, we feed a set of hyperparameters into our deep reinforcement learning agents. Secondly, we investigate the leading autonomous vehicle experiment in the urban network with different autonomous vehicle penetration rates. Thirdly, the advantage of leading autonomous vehicles is evaluated against all-manual-vehicle and leading-manual-vehicle experiments. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameters. We demonstrate that fully automated traffic increased the average speed by a factor of 1.27 compared with the all-manual-vehicle experiment. Our proposed method becomes significantly more effective at higher autonomous vehicle penetration rates. Furthermore, leading autonomous vehicles can help to mitigate traffic congestion.
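The comparison at the end of the abstract is between the two standard PPO surrogate objectives. Below is a minimal NumPy sketch of the clipped variant; the clip range `epsilon` is one of the tunable hyperparameters, and the value shown is only a common default, not necessarily the setting proposed in the paper.

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, epsilon=0.2):
    """Clipped surrogate objective of PPO.

    ratio:     pi_new(a|s) / pi_old(a|s) for the sampled actions
    advantage: estimated advantages for those actions
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    # PPO maximizes the elementwise minimum, which bounds each policy update
    return np.mean(np.minimum(unclipped, clipped))
```

The Kullback–Leibler variant instead subtracts a penalty proportional to the KL divergence between the old and new policies from the unclipped objective and adapts the penalty coefficient between updates.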


AI Magazine ◽  
2014 ◽  
Vol 35 (3) ◽  
pp. 61-65 ◽  
Author(s):  
Christos Dimitrakakis ◽  
Guangliang Li ◽  
Nikolaos Tziortziotis

Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.
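What makes the problem so general is that every such domain reduces to the same agent-environment interaction loop, which is also what a competition agent must handle without domain-specific tuning. The sketch below assumes a Gym-style environment and a hypothetical agent object exposing `act` and `learn` methods.

```python
def run_episode(env, agent, max_steps=1000):
    """Generic agent-environment interaction loop (Gym-style interface)."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(obs)                     # agent picks an action
        obs, reward, done, info = env.step(action)  # environment responds
        agent.learn(obs, reward, done)              # agent updates from feedback
        total_reward += reward
        if done:
            break
    return total_reward
```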


Author(s):  
Hsiu-Feng Wang ◽  
Pei-Yu Wang ◽  
Ching-Chih Liao ◽  
Yu-Yin Lin

This chapter examines children's aesthetic preferences for learning Web pages designed for them. It applies Berlyne's theory of aesthetic preference to these Web pages: a theory that suggests that people prefer a medium level of stimuli to a low or high level of stimuli. The experiment employs a 3 x 2 x 2 between-subject design; it explores perceived visual complexity, gender, cognitive style, and aesthetic preference. A total of 120 children (60 boys and 60 girls) aged between 11 and 12 years old take part in the experiment. The children are asked to rate learning Web pages of different levels of perceived visual complexity for aesthetic preference. These Web pages have been created by the authors. The results of the experiment show that, overall, the children prefer Web pages that display a medium level of perceived visual complexity to those that display a high or low level of perceived visual complexity. Thus, the results support Berlyne's theory. However, when aesthetic preference is analysed with respect to gender, it is found that different levels of perceived visual complexity have an impact on boys' aesthetic preferences but not girls'. In other words, Berlyne's theory is only partly supported. Likewise, Berlyne's theory is only partly supported when aesthetic preference is analysed with respect to cognitive style. Here, imagers prefer a high level of perceived visual complexity and verbalisers prefer a medium level of perceived visual complexity. This chapter should be of interest to anyone who designs learning Web pages for children.


2021 ◽  
Vol 12 ◽  
Author(s):  
Lillian M. Rigoli ◽  
Gaurav Patil ◽  
Hamish F. Stening ◽  
Rachel W. Kallen ◽  
Michael J. Richardson

Rapid advances in the field of Deep Reinforcement Learning (DRL) over the past several years have led to artificial agents (AAs) capable of producing behavior that meets or exceeds human-level performance in a wide variety of tasks. However, research on DRL frequently lacks adequate discussion of the low-level dynamics of the behavior itself and instead focuses on meta-level or global-level performance metrics. In doing so, the current literature lacks perspective on the qualitative nature of AA behavior, leaving questions regarding the spatiotemporal patterning of their behavior largely unanswered. The current study explored the degree to which the navigation and route selection trajectories of DRL agents (i.e., AAs trained using DRL) through simple obstacle ridden virtual environments were equivalent (and/or different) from those produced by human agents. The second and related aim was to determine whether a task-dynamical model of human route navigation could not only be used to capture both human and DRL navigational behavior, but also to help identify whether any observed differences in the navigational trajectories of humans and DRL agents were a function of differences in the dynamical environmental couplings.
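Although the abstract does not give the model's equations, task-dynamical accounts of route navigation typically describe steering as goal attraction competing with obstacle repulsion. The Python sketch below is a generic example of that class of model, not the authors' formulation; the parameter names and default values are purely illustrative.

```python
import numpy as np

def heading_dynamics(phi, dphi, goal_angle, obstacle_angles, obstacle_dists,
                     b=3.0, k_g=7.5, k_o=200.0, c1=6.5, c2=0.8):
    """Angular acceleration of heading phi: damped attraction toward the goal
    direction plus obstacle repulsion that decays with angular offset and
    distance (illustrative parameter values)."""
    ddphi = -b * dphi - k_g * (phi - goal_angle)
    for psi, d in zip(obstacle_angles, obstacle_dists):
        ddphi += k_o * (phi - psi) * np.exp(-c1 * abs(phi - psi)) * np.exp(-c2 * d)
    return ddphi
```

Fitting such a model separately to human and DRL trajectories is one way to ask whether observed differences reflect different couplings to the environment rather than different goals.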


2019 ◽  
Vol 11 (2) ◽  
pp. 1-12 ◽  
Author(s):  
Reshu Agarwal ◽  
Mandeep Mittal

Popular data mining methods support knowledge discovery from patterns that hold in relations. For many applications, it is difficult to find strong associations among data items at low or primitive levels of abstraction. Mining association rules at multiple levels may lead to more informative and refined knowledge from data. Multi-level association rule mining is a variation of association rule mining that finds relationships between items at each level by applying different thresholds at different levels. In this study, an inventory classification policy is provided. At each level, the loss profit of frequent items is determined. The obtained loss profit is used to rank frequent items at each level with respect to their category, content and brand. This helps the inventory manager determine the most profitable items with respect to their category, content and brand. An example is illustrated to validate the results. Further, to assess the impact of the above approach in a real scenario, experiments are conducted on an existing dataset.
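As a minimal sketch of the multi-level idea, the Python snippet below assumes items are encoded as hierarchical category/content/brand paths and applies a different minimum support at each level; the function and variable names are illustrative, not the paper's notation.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets_at_level(transactions, level, min_support):
    """Find frequent 1- and 2-itemsets after generalising each item to the
    first `level` components of its (category, content, brand) path."""
    generalised = [{item[:level] for item in t} for t in transactions]
    counts = Counter()
    for basket in generalised:
        for size in (1, 2):
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

# Stricter support at the coarse category level, looser at the brand level
transactions = [
    [("Dairy", "Milk", "BrandX"), ("Bakery", "Bread", "BrandY")],
    [("Dairy", "Milk", "BrandZ"), ("Dairy", "Cheese", "BrandX")],
]
category_level = frequent_itemsets_at_level(transactions, 1, min_support=0.5)
brand_level = frequent_itemsets_at_level(transactions, 3, min_support=0.3)
```

The frequent items found at each level would then be ranked by their loss profit to flag the most profitable category, content and brand, as described above.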


Author(s):  
Maria Giulia Ballatore ◽  
Ettore Felisatti ◽  
Laura Montanaro ◽  
Anita Tabacco

This paper aims to describe and critically analyze the so-called "TEACHPOT" experience (POT: Provide Opportunities in Teaching) performed during the last few years at Politecnico di Torino. Due to career criteria, the effort and time lecturers spend on teaching have recently undergone a significant reduction. In order to support and meet each lecturer's expectations towards an improvement in their ability to teach, a mix of training opportunities has been provided. This consists of an extremely wide variety of experiences, tools, and relationships, from which everyone can draw inspiration to increase the effectiveness of their teaching and the participation of their students. The provided activities are designed around three main components: methodological training, teaching technologies, and methodological experiences. A discussion of the findings, based on data collected through a survey, is included. The impact of the overall experience can be evaluated on two different levels: the real effect on redesigning lessons, and the discussion on the matter within the entire academic community.


2019 ◽  
Author(s):  
Jennifer R Sadler ◽  
Grace Elisabeth Shearrer ◽  
Nichollette Acosta ◽  
Kyle Stanley Burger

BACKGROUND: Dietary restraint represents an individual's intent to limit their food intake and has been associated with impaired passive food reinforcement learning. However, the impact of dietary restraint on active, response-dependent learning is poorly understood. In this study, we tested the relationship between dietary restraint and food reinforcement learning using an active, instrumental conditioning task. METHODS: A sample of ninety adults completed a response-dependent instrumental conditioning task with reward and punishment using sweet and bitter tastes. Brain response was measured via functional MRI during the task. Participants also completed anthropometric measures, reward/motivation-related questionnaires, and a working memory task. Dietary restraint was assessed via the Dutch Restrained Eating Scale. RESULTS: Two groups were selected from the sample: high restraint (n = 29, score > 2.5) and low restraint (n = 30, score < 1.85). High restraint was associated with significantly higher BMI (p = 0.003) and lower N-back accuracy (p = 0.045). The high-restraint group was also marginally better at the instrumental conditioning task (p = 0.066, r = 0.37). High restraint was also associated with significantly greater brain response in the intracalcarine cortex (MNI: 15, -69, 12; k = 35, p_FWE < 0.05) to bitter taste compared to neutral taste. CONCLUSIONS: High restraint was associated with improved performance on an instrumental task testing how individuals learn from reward and punishment. This may be mediated by greater brain response in the primary visual cortex, which has been associated with mental representation. The results suggest that dietary restraint does not impair response-dependent reinforcement learning.


Author(s):  
Ivan Herreros

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, between reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback versus anticipatory and adaptive control. Finally, it argues that this framework of translating knowledge between formal and biological disciplines can serve not only to structure and advance our understanding of brain function but also to enrich engineering solutions at the level of robot learning and control with insights from biology.
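The feedback/feed-forward distinction the chapter opens with can be stated compactly in code. The sketch below is a generic illustration, assuming a simple proportional feedback law and, for the feed-forward case, access to an inverse model of the plant.

```python
def feedback_control(target, measurement, gain=1.0):
    """Feedback (reactive) control: the command depends on the measured error,
    so disturbances are corrected only after they affect the output."""
    error = target - measurement
    return gain * error

def feedforward_control(target, inverse_model):
    """Feed-forward (anticipatory) control: the command is computed from a
    model of the plant before any error is observed; its quality depends
    on how accurate (or how well adapted) that model is."""
    return inverse_model(target)
```

In the chapter's mapping, the reactive law corresponds to innate corrective behaviour, while acquiring or tuning the inverse model corresponds to the adaptive, anticipatory control learned from experience.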

