Selective auditory attention detection using dynamic learning systems: The study of RNN and reinforcement learning


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Masoud Geravanchizadeh ◽  
Hossein Roushan

Abstract: The cocktail party phenomenon describes the ability of the human brain to focus auditory attention on a particular stimulus while ignoring other acoustic events. Selective auditory attention detection (SAAD) is an important issue in the development of brain-computer interface systems and cocktail party processors. This paper proposes a new dynamic attention detection system to process the temporal evolution of the input signal. The proposed dynamic SAAD is modeled as a sequential decision-making problem, which is solved with a recurrent neural network (RNN) and the reinforcement learning methods of Q-learning and deep Q-learning. Among the different dynamic learning approaches, the evaluation results show that the deep Q-learning approach with an RNN as the agent provides the highest classification accuracy (94.2%) with the least detection delay. The proposed SAAD system is advantageous in the sense that the detection of attention is performed dynamically for the sequential inputs. Also, the system has the potential to be used in scenarios where the listener's attention may switch over time in the presence of various acoustic events.
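The abstract frames attention detection as a sequential decision problem solved with deep Q-learning and an RNN agent. As a rough, illustrative sketch only (not the authors' implementation), such an agent might be a recurrent Q-network that maps a window of preprocessed feature frames to Q-values over the two candidate speakers and is trained with a standard temporal-difference target; the feature dimension, network sizes, and reward definition below are assumptions.

# Minimal recurrent deep Q-learning sketch in PyTorch (illustrative names and sizes).
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, feat_dim, hidden_dim=64, n_actions=2):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)   # Q-values: attend speaker A / B

    def forward(self, x):                  # x: (batch, time, feat_dim) feature frames
        _, h = self.rnn(x)                 # final hidden state summarizes the window
        return self.head(h.squeeze(0))     # (batch, n_actions)

q_net = RecurrentQNet(feat_dim=16)
target_net = RecurrentQNet(feat_dim=16)
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.9

def td_update(obs, action, reward, next_obs):
    # One deep Q-learning step on a batch of (window, chosen speaker, reward, next window).
    q = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * target_net(next_obs).max(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()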


Author(s):  
Abdelghafour Harraz ◽  
Mostapha Zbakh

Artificial Intelligence makes it possible to create engines that can explore and learn their environments and thus derive policies for controlling them in real time without human intervention. Through its Reinforcement Learning component, using frameworks such as temporal-difference learning, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be perceived as a Markov Decision Process. This opens the door to applying Reinforcement Learning to cloud load balancing, so that load can be dispatched dynamically to a given cloud system. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
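As a concrete, purely illustrative example of the SARSA framework mentioned above applied to load balancing, the sketch below dispatches each incoming job to one of several virtual machines; the state encoding, reward, and hyperparameters are hypothetical and not taken from the chapter.

# Tabular SARSA for load dispatching: state = discretized load profile of the VMs,
# action = index of the VM that receives the next job, reward = e.g. negative response time.
import random
from collections import defaultdict

n_vms = 4
alpha, gamma, eps = 0.1, 0.95, 0.1
Q = defaultdict(lambda: [0.0] * n_vms)      # state -> Q-value per target VM

def choose(state):
    # Epsilon-greedy selection of the VM to dispatch to.
    if random.random() < eps:
        return random.randrange(n_vms)
    q = Q[state]
    return q.index(max(q))

def sarsa_step(state, action, reward, next_state):
    # On-policy update: the returned next_action must actually be executed next.
    next_action = choose(next_state)
    td_target = reward + gamma * Q[next_state][next_action]
    Q[state][action] += alpha * (td_target - Q[state][action])
    return next_action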


1995 ◽  
Vol 4 (1) ◽  
pp. 3-28 ◽  
Author(s):  
Mance E. Harmon ◽  
Leemon C. Baird ◽  
A. Harry Klopf

An application of reinforcement learning to a linear-quadratic, differential game is presented. The reinforcement learning system uses a recently developed algorithm, the residual-gradient form of advantage updating. The game is a Markov decision process with continuous time, states, and actions, linear dynamics, and a quadratic cost function. The game consists of two players, a missile and a plane; the missile pursues the plane and the plane evades the missile. Although a missile and plane scenario was the chosen test bed, the reinforcement learning approach presented here is equally applicable to biologically based systems, such as a predator pursuing prey. The reinforcement learning algorithm for optimal control is modified for differential games to find the minimax point rather than the maximum. Simulation results are compared to the analytical solution, demonstrating that the simulated reinforcement learning system converges to the optimal answer. The performance of both the residual-gradient and non-residual-gradient forms of advantage updating and Q-learning is compared, demonstrating that advantage updating converges faster than Q-learning in all simulations. Advantage updating is also demonstrated to converge regardless of the time step duration, whereas Q-learning is unable to converge as the time step duration grows small.
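The claim that Q-learning fails as the time step shrinks can be seen directly from the discrete-time update. The snippet below is only an illustration of that scaling argument, not the residual-gradient advantage-updating algorithm itself: with step duration dt, the discount becomes gamma**dt and the per-step reward scales with dt, so as dt approaches zero every action's Q-value approaches the state value and action preferences become numerically indistinguishable.

# Discrete-time Q-learning target with an explicit step duration dt (illustrative).
def q_target(r, dt, gamma, q_next_max):
    # As dt -> 0: r * dt -> 0 and gamma ** dt -> 1, so the target approaches q_next_max
    # for every action, washing out the differences Q(s, a1) - Q(s, a2).
    return r * dt + (gamma ** dt) * q_next_max

def q_update(q_sa, r, dt, gamma, q_next_max, alpha=0.1):
    return q_sa + alpha * (q_target(r, dt, gamma, q_next_max) - q_sa)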


Author(s):  
Victor Gallego ◽  
Roi Naveiro ◽  
David Rios Insua

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. However, in such non-stationary environments, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretical approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we face the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.
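One way to picture the TMDP idea (a sketch under stated assumptions, not the authors' exact formulation) is a Q-function that conditions on both the decision maker's action and the adversary's action, with action selection averaging over a running estimate of the opponent's policy; the level-k opponent modeling scheme itself is not reproduced here.

# "Threatened" Q-learning sketch: Q(s, a, b) with an estimated opponent model p_hat(b | s).
import numpy as np

n_states, n_a, n_b = 10, 3, 3
Q = np.zeros((n_states, n_a, n_b))
p_hat = np.full((n_states, n_b), 1.0 / n_b)   # running estimate of the adversary's policy
alpha, gamma, decay = 0.1, 0.95, 0.99

def dm_action(s):
    expected_q = Q[s] @ p_hat[s]              # expected value over adversary actions
    return int(np.argmax(expected_q))

def update(s, a, b, r, s_next):
    # After observing the adversary's action b, refresh the opponent model ...
    p_hat[s] *= decay
    p_hat[s][b] += 1.0 - decay
    # ... and perform the usual TD update on the augmented Q-table.
    best_next = np.max(Q[s_next] @ p_hat[s_next])
    Q[s, a, b] += alpha * (r + gamma * best_next - Q[s, a, b])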


2018 ◽  
Author(s):  
Stefan Niculae

Penetration testing is the practice of performing a simulated attack on a computer system in order to reveal its vulnerabilities. The most common approach is to gain information and then plan and execute the attack manually, by a security expert. This manual method cannot meet the speed and frequency required for efficient, large-scale security solutions development. To address this, we formalize penetration testing as a security game between an attacker who tries to compromise a network and a defending adversary actively protecting it. We compare multiple algorithms for finding the attacker’s strategy, from fixed-strategy to Reinforcement Learning, namely Q-Learning (QL), Extended Classifier Systems (XCS) and Deep Q-Networks (DQN). The attacker’s strength is measured in terms of speed and stealthiness, in the specific environment used in our simulations. The results show that QL surpasses human performance, XCS yields worse than human performance but is more stable, and the slow convergence of DQN keeps it from achieving exceptional performance. In addition, we find that all of these Machine Learning approaches outperform fixed-strategy attackers.
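To make the game formulation concrete, a toy environment skeleton is sketched below; the hosts, success probabilities, and reward shaping (progress minus time and alert penalties, reflecting speed and stealthiness) are invented for exposition and are not the paper's simulation environment.

# Hypothetical attacker-vs-defender environment usable by tabular or deep RL agents.
import random

class PentestEnv:
    def __init__(self, n_hosts=5):
        self.n_hosts = n_hosts
        self.reset()

    def reset(self):
        self.compromised = {0}                  # attacker starts with one foothold
        self.alerts = 0                         # alerts raised so far (stealthiness)
        return (frozenset(self.compromised), self.alerts)

    def step(self, target_host):
        reward = -0.1                           # time cost: faster attacks score higher
        if random.random() < 0.6:               # exploit succeeds with some probability
            self.compromised.add(target_host)
            reward += 1.0
        if random.random() < 0.3:               # defender notices and raises an alert
            self.alerts += 1
            reward -= 0.5
        done = len(self.compromised) == self.n_hosts
        return (frozenset(self.compromised), self.alerts), reward, done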


Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1310
Author(s):  
Xiaowei Wang ◽  
Xin Wang

Conventional optimization-based relay selection for multihop networks cannot resolve the conflict between performance and cost. The optimal selection policy is centralized and requires local channel state information (CSI) of all hops, leading to high computational complexity and signaling overhead. Other optimization-based decentralized policies cause non-negligible performance loss. In this paper, we exploit the benefits of reinforcement learning in relay selection for multihop clustered networks and aim to achieve high performance with limited costs. The multihop relay selection problem is modeled as a Markov decision process (MDP) and solved by a decentralized Q-learning scheme with a rectified update function. Simulation results show that this scheme achieves a near-optimal average end-to-end (E2E) rate. Cost analysis reveals that it also reduces computational complexity and signaling overhead compared with the optimal scheme.
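A minimal sketch of the decentralized idea, assuming each cluster runs its own learner over locally observed (quantized) CSI and is rewarded with, say, the achieved per-hop rate; the paper's rectified update function is not specified in the abstract, so a plain Q-learning update stands in for it here.

# Per-cluster relay selection with local Q-learning (illustrative placeholder).
import random
from collections import defaultdict

class ClusterRelaySelector:
    def __init__(self, n_relays, alpha=0.1, gamma=0.9, eps=0.1):
        self.n_relays = n_relays
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.Q = defaultdict(lambda: [0.0] * n_relays)   # local CSI state -> Q per relay

    def select(self, csi_state):
        if random.random() < self.eps:
            return random.randrange(self.n_relays)
        q = self.Q[csi_state]
        return q.index(max(q))

    def update(self, csi_state, relay, reward, next_csi_state):
        best_next = max(self.Q[next_csi_state])
        td_error = reward + self.gamma * best_next - self.Q[csi_state][relay]
        self.Q[csi_state][relay] += self.alpha * td_error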


2020 ◽  
Vol 34 (04) ◽  
pp. 3980-3987
Author(s):  
Maor Gaon ◽  
Ronen Brafman

The standard RL world model is that of a Markov Decision Process (MDP). A basic premise of MDPs is that the rewards depend on the last state and action only. Yet, many real-world rewards are non-Markovian. For example, a reward for bringing coffee only if requested earlier and not yet served is non-Markovian if the state only records current requests and deliveries. Past work considered the problem of modeling and solving MDPs with non-Markovian rewards (NMR), but we know of no principled approaches for RL with NMR. Here, we address the problem of policy learning from experience with such rewards. We describe and empirically evaluate four combinations of the classical RL algorithms Q-learning and R-max with automata learning algorithms to obtain new RL algorithms for domains with NMR. We also prove that some of these variants converge to an optimal policy in the limit.
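The core construction behind such algorithms can be sketched as follows: pair the environment state with the state of an automaton that tracks the reward-relevant history, so the reward becomes Markovian on the product, and run ordinary Q-learning there. The coffee-task automaton below is hand-written purely for illustration; in the paper the automaton is learned from experience.

# Q-learning on the product of the environment state and a reward automaton.
from collections import defaultdict

# Automaton states for the coffee example: 0 = no pending request, 1 = requested, 2 = served.
def automaton_step(u, event):
    if u == 0 and event == "request":
        return 1
    if u == 1 and event == "deliver":
        return 2
    return u

def product_reward(u, event):
    # Reward only for delivering coffee that was requested and not yet served.
    return 1.0 if (u == 1 and event == "deliver") else 0.0

alpha, gamma = 0.1, 0.95
Q = defaultdict(float)   # ((env_state, automaton_state), action) -> value

def q_step(env_s, u, action, event, env_s_next, actions_next):
    u_next = automaton_step(u, event)
    r = product_reward(u, event)
    best_next = max(Q[((env_s_next, u_next), a)] for a in actions_next)
    Q[((env_s, u), action)] += alpha * (r + gamma * best_next - Q[((env_s, u), action)])
    return u_next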


Author(s):  
Wasswa Shafik ◽  
Mojtaba Matinkhah ◽  
Parisa Etemadinejad ◽  
Mammann Nur Sanda

Reinforcement learning (RL) is a promising research area that is now well known in the Internet of Things (IoT), where media and social sensing computing address broad and pertinent tasks by making decisions sequentially under deterministic and stochastic dynamics. The IoT extends connectivity to physical devices, such as networks of electronic devices, which interconnect with one another over the Internet and can be remotely supervised and controlled. In this paper, we present a comprehensive, in-depth survey of RL techniques in IoT systems, focusing on the main known techniques such as artificial neural networks (ANN), Q-learning, Markov Decision Processes (MDP), and Learning Automata (LA). The study examines and analyses each learning technique, focusing on challenges, model performance, and the similarities and differences among the most closely related state-of-the-art models in IoT. The results can serve as a foundation for designing and implementing models that address the bottlenecks assessed here, together with an evaluation of the most popular practical uses of current reinforcement learning methods.


Author(s):  
Qian-Kun Hu ◽  
Yong-Ping Zhao

In this paper, the conventional aero-engine acceleration control task is formulated as a Markov Decision Process (MDP) problem. Then, a novel phase-based reward function is proposed to enhance the performance of deep reinforcement learning (DRL) in solving feedback control tasks. With that reward function, an aero-engine controller based on Trust Region Policy Optimization (TRPO) is developed to improve aero-engine acceleration performance. Four comparison simulations were conducted to verify the effectiveness of the proposed methods. The simulation results show that the phase-based reward function helps to eliminate the oscillation problem of the aero-engine control system, which is caused by the traditional goal-based reward function when DRL is applied to aero-engine control. The TRPO controller also outperforms the deep Q-learning (DQN) and proportional-integral-derivative (PID) controllers in the aero-engine acceleration control task. Compared to the DQN and PID controllers, the acceleration time of the aero-engine is decreased by 0.6 s and 2.58 s, respectively, and the acceleration performance is improved by 16.8% and 46.4%, respectively.
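The abstract does not give the phase-based reward itself, so the following is only one plausible shape, written to show the intent: use a different reward term while the engine is still far from the target speed than once it has entered a settling band, instead of a single goal-based term that can induce oscillation. All thresholds and weights are invented for illustration.

# Hypothetical phase-based reward for the acceleration task (illustrative only).
def phase_based_reward(speed, target, prev_speed=None, band=0.02):
    err = abs(speed - target) / target
    if err > band:
        # Acceleration phase: reward fast, monotone progress toward the target speed.
        progress = 0.0 if prev_speed is None else (speed - prev_speed) / target
        return -err + 5.0 * max(progress, 0.0)
    # Settling phase: penalize residual error and overshoot to damp oscillation.
    return 1.0 - 10.0 * err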


Author(s):  
Shahzaib Hamid ◽  
Ali Nasir ◽  
Yasir Saleem

The field of robotics has been under the limelight because of recent advances in Artificial Intelligence (AI). Due to the increased diversity of multi-agent systems, new models are being developed to handle the complexity of such systems. However, most of these models do not address problems such as uncertainty handling, efficient learning, agent coordination, and fault detection. This paper presents a novel approach to implementing Reinforcement Learning (RL) in hierarchical robotic search teams. The proposed algorithm handles uncertainties in the system by implementing Q-learning and achieves higher efficiency and shorter completion times than prior models, because each agent can act on its own and thus depends less on the leader agent for the RL policy. The performance of the algorithm is measured by placing agents in an unknown environment with both Markov Decision Process (MDP) and RL policies at their disposal. A simulation-based comparison of agent motion under the MDP and RL policies is presented, together with a qualitative comparison of the proposed model with prior models.

