Automatic Inside Point Localization with Deep Reinforcement Learning for Interactive Object Segmentation

Guoqing Li; Guoping Zhang; Chanchan Qin

doi:10.3390/s21186100

Automatic Inside Point Localization with Deep Reinforcement Learning for Interactive Object Segmentation

Sensors ◽

10.3390/s21186100 ◽

2021 ◽

Vol 21 (18) ◽

pp. 6100

Author(s):

Guoqing Li ◽

Guoping Zhang ◽

Chanchan Qin

Keyword(s):

Reinforcement Learning ◽

Object Segmentation ◽

Target Object ◽

Segmentation Method ◽

Interactive Segmentation ◽

Excellent Performance ◽

Learning Framework ◽

Markov Decision ◽

Significant Performance ◽

Guidance Information

In the task of interactive image segmentation, the Inside-Outside Guidance (IOG) algorithm has demonstrated superior segmentation performance leveraging Inside-Outside Guidance information. Nevertheless, we observe that the inconsistent input between training and testing when selecting the inside point will result in significant performance degradation. In this paper, a deep reinforcement learning framework, named Inside Point Localization Network (IPL-Net), is proposed to infer the suitable position for the inside point to help the IOG algorithm. Concretely, when a user first clicks two outside points at the symmetrical corner locations of the target object, our proposed system automatically generates the sequence of movement to localize the inside point. We then perform the IOG interactive segmentation method for precisely segmenting the target object of interest. The inside point localization problem is difficult to define as a supervised learning framework because it is expensive to collect image and their corresponding inside points. Therefore, we formulate this problem as Markov Decision Process (MDP) and then optimize it with Dueling Double Deep Q-Network (D3QN). We train our network on the PASCAL dataset and demonstrate that the network achieves excellent performance.

Download Full-text

Contracts for Difference: A Reinforcement Learning Approach

Journal of Risk and Financial Management ◽

10.3390/jrfm13040078 ◽

2020 ◽

Vol 13 (4) ◽

pp. 78

Author(s):

Nico Zengeler ◽

Uwe Handmann

Keyword(s):

Reinforcement Learning ◽

Short Term Memory ◽

Learning Agents ◽

Learning Framework ◽

Learning Agent ◽

Markov Decision ◽

Economic Trends ◽

Model Size ◽

Contracts For Difference ◽

Partially Observable

We present a deep reinforcement learning framework for an automatic trading of contracts for difference (CfD) on indices at a high frequency. Our contribution proves that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market. Usually, these approaches depend on a low latency. In a real-world example, we show that an increased model size may compensate for a higher latency. As the noisy nature of economic trends complicates predictions, especially in speculative assets, our approach does not predict courses but instead uses a reinforcement learning agent to learn an overall lucrative trading policy. Therefore, we simulate a virtual market environment, based on historical trading data. Our environment provides a partially observable Markov decision process (POMDP) to reinforcement learners and allows the training of various strategies.

Download Full-text

Reinforcement Learning under Threats

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33019939 ◽

2019 ◽

Vol 33 ◽

pp. 9939-9940 ◽

Cited By ~ 1

Author(s):

Victor Gallego ◽

Roi Naveiro ◽

David Rios Insua

Keyword(s):

Reinforcement Learning ◽

Single Agent ◽

Potential Threat ◽

Q Learning ◽

Learning Framework ◽

Opponent Modeling ◽

Theoretical Approaches ◽

New Learning ◽

Markov Decision ◽

Multi Agent

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. However, when non-stationary environments as such are considered, Q-learning leads to suboptimal results (Busoniu, Babuska, and De Schutter 2010). Previous game-theoretical approaches to this problem have focused on modeling the whole multi-agent system as a game. Instead, we shall face the problem of prescribing decisions to a single agent (the supported decision maker, DM) against a potential threat model (the adversary). We augment the MDP to account for this threat, introducing Threatened Markov Decision Processes (TMDPs). Furthermore, we propose a level-k thinking scheme resulting in a new learning framework to deal with TMDPs. We empirically test our framework, showing the benefits of opponent modeling.

Download Full-text

UAV Autonomous Tracking and Landing Based on Deep Reinforcement Learning Strategy

Sensors ◽

10.3390/s20195630 ◽

2020 ◽

Vol 20 (19) ◽

pp. 5630

Author(s):

Jingyi Xie ◽

Xiaodong Peng ◽

Haijiao Wang ◽

Wenlong Niu ◽

Xiao Zheng

Keyword(s):

Reinforcement Learning ◽

Learning Strategy ◽

Control Method ◽

Heuristic Rules ◽

Learning Framework ◽

Model Free ◽

Simulation Engine ◽

Markov Decision ◽

Moving Platform ◽

Partially Observable

Unmanned aerial vehicle (UAV) autonomous tracking and landing is playing an increasingly important role in military and civil applications. In particular, machine learning has been successfully introduced to robotics-related tasks. A novel UAV autonomous tracking and landing approach based on a deep reinforcement learning strategy is presented in this paper, with the aim of dealing with the UAV motion control problem in an unpredictable and harsh environment. Instead of building a prior model and inferring the landing actions based on heuristic rules, a model-free method based on a partially observable Markov decision process (POMDP) is proposed. In the POMDP model, the UAV automatically learns the landing maneuver by an end-to-end neural network, which combines the Deep Deterministic Policy Gradients (DDPG) algorithm and heuristic rules. A Modular Open Robots Simulation Engine (MORSE)-based reinforcement learning framework is designed and validated with a continuous UAV tracking and landing task on a randomly moving platform in high sensor noise and intermittent measurements. The simulation results show that when the moving platform is moving in different trajectories, the average landing success rate of the proposed algorithm is about 10% higher than that of the Proportional-Integral-Derivative (PID) method. As an indirect result, a state-of-the-art deep reinforcement learning-based UAV control method is validated, where the UAV can learn the optimal strategy of a continuously autonomous landing and perform properly in a simulation environment.

Download Full-text

Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

AbstractWe consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.

Download Full-text

Hierarchical Reinforcement Learning Framework for Secure UAV Communication in the Presence of Multiple UAV Adaptive Eavesdroppers

2020 IEEE 6th International Conference on Computer and Communications (ICCC) ◽

10.1109/iccc51575.2020.9344970 ◽

2020 ◽

Author(s):

Liu Jue ◽

Yang Weiwei

Keyword(s):

Reinforcement Learning ◽

Hierarchical Reinforcement Learning ◽

Learning Framework

Download Full-text

A Reinforcement Learning Framework for Spiking Networks with Dynamic Synapses

Computational Intelligence and Neuroscience ◽

10.1155/2011/869348 ◽

2011 ◽

Vol 2011 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Karim El-Laithy ◽

Martin Bogdan

Keyword(s):

Reinforcement Learning ◽

Spike Timing ◽

Neural Representation ◽

Model Parameters ◽

Learning Framework ◽

Reference Target ◽

Wide Range ◽

Spiking Network ◽

Dynamic Synapses ◽

Exclusive Or

An integration of both the Hebbian-based and reinforcement learning (RL) rules is presented for dynamic synapses. The proposed framework permits the Hebbian rule to update the hidden synaptic model parameters regulating the synaptic response rather than the synaptic weights. This is performed using both the value and the sign of the temporal difference in the reward signal after each trial. Applying this framework, a spiking network with spike-timing-dependent synapses is tested to learn the exclusive-OR computation on a temporally coded basis. Reward values are calculated with the distance between the output spike train of the network and a reference target one. Results show that the network is able to capture the required dynamics and that the proposed framework can reveal indeed an integrated version of Hebbian and RL. The proposed framework is tractable and less computationally expensive. The framework is applicable to a wide class of synaptic models and is not restricted to the used neural representation. This generality, along with the reported results, supports adopting the introduced approach to benefit from the biologically plausible synaptic models in a wide range of intuitive signal processing.

Download Full-text

Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing ◽

10.1007/s11633-021-1278-z ◽

2021 ◽

Author(s):

Ming-Sheng Ying ◽

Yuan Feng ◽

Sheng-Gang Ying

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Quantum Systems ◽

Sequential Decision Making ◽

Mathematical Framework ◽

Sequential Decision ◽

Learning Techniques ◽

Optimal Policies ◽

Markov Decision ◽

Programming Algorithms

AbstractMarkov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and finding optimal policies for qMDPs in the case of finite-horizon. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.

Download Full-text

Adaptive Reinforcement Learning Framework for NOMA-UAV Networks

IEEE Communications Letters ◽

10.1109/lcomm.2021.3093385 ◽

2021 ◽

pp. 1-1

Author(s):

Syed Khurram Mahmud ◽

Yuanwei Liu ◽

Yue Chen ◽

Kok Keong Chai

Keyword(s):

Reinforcement Learning ◽

Learning Framework

Download Full-text

Statistically Model Checking PCTL Specifications on Markov Decision Processes via Reinforcement Learning

2020 59th IEEE Conference on Decision and Control (CDC) ◽

10.1109/cdc42340.2020.9303982 ◽

2020 ◽

Author(s):

Yu Wang ◽

Nima Roohi ◽

Matthew West ◽

Mahesh Viswanathan ◽

Geir E. Dullerud

Keyword(s):

Reinforcement Learning ◽

Model Checking ◽

Markov Decision Processes ◽

Decision Processes ◽

Markov Decision

Download Full-text

A Dual-Critic Reinforcement Learning Framework for Frame-Level Bit Allocation in HEVC/H.265

2021 Data Compression Conference (DCC) ◽

10.1109/dcc50243.2021.00009 ◽

2021 ◽

Author(s):

Yung-Han Ho ◽

Guo-Lun Jin ◽

Yun Liang ◽

Wen-Hsiao Peng ◽

Xiaobo Li

Keyword(s):

Reinforcement Learning ◽

Bit Allocation ◽

Learning Framework

Download Full-text