In this thesis, we propose an environment perception framework for autonomous driving that uses deep reinforcement learning (DRL) to learn through complex interactions with the environment, without being explicitly trained on driving datasets. Unlike existing techniques, our technique takes the learning loss into account under both deterministic and stochastic policy gradients. We apply DRL to object detection and safe navigation while enhancing a self-driving vehicle’s ability to discern meaningful information from surrounding data. For efficient environment perception and object
detection, various Q-learning-based methods have been proposed in the literature. Unlike these works, this thesis proposes a DRL-based collaborative policy gradient that is both deterministic and stochastic. Our technique combines a variational autoencoder (VAE), deep deterministic policy gradient (DDPG), and soft actor-critic (SAC) to adequately train a self-driving vehicle. In this work, we focus on uninterrupted and reasonably safe autonomous driving without colliding with obstacles or steering off the track. We propose a collaborative framework that utilizes the best features of VAE, DDPG, and SAC and models autonomous driving as a partly stochastic, partly deterministic policy gradient problem in a continuous action space and a continuous state space; a minimal sketch of this combination is given below.
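To make the collaboration concrete, the sketch below wires a VAE encoder to a deterministic (DDPG-style) actor and a stochastic (SAC-style) actor over a shared latent state. It is a minimal PyTorch illustration, not the thesis implementation: the 64x64 camera frame, the 32-dimensional latent space, the two-dimensional action (e.g. steering, throttle), and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

LATENT_DIM = 32   # illustrative latent size, not fixed by the thesis
ACTION_DIM = 2    # assumed: steering and throttle

class VAEEncoder(nn.Module):
    """Compresses a 64x64 RGB camera frame into a latent state."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),    # 64 -> 31
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),   # 31 -> 14
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),  # 14 -> 6
            nn.Flatten(),
        )
        self.mu = nn.Linear(128 * 6 * 6, LATENT_DIM)
        self.logvar = nn.Linear(128 * 6 * 6, LATENT_DIM)

    def forward(self, obs):
        h = self.conv(obs)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterisation trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar

class DDPGActor(nn.Module):
    """Deterministic policy: latent state -> action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

class SACActor(nn.Module):
    """Stochastic policy: latent state -> Gaussian over actions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU())
        self.mu = nn.Linear(256, ACTION_DIM)
        self.log_std = nn.Linear(256, ACTION_DIM)

    def forward(self, z):
        h = self.net(z)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mu, log_std.exp())
        a = dist.rsample()  # stochastic action, differentiable sample
        # Tanh squash; the Jacobian correction to log_prob is omitted
        # in this sketch.
        return torch.tanh(a), dist.log_prob(a).sum(-1)

# Usage: encode a frame once, then query either policy head.
encoder, ddpg_actor, sac_actor = VAEEncoder(), DDPGActor(), SACActor()
frame = torch.rand(1, 3, 64, 64)      # dummy camera frame
z, _, _ = encoder(frame)
det_action = ddpg_actor(z)            # deterministic branch
sto_action, log_prob = sac_actor(z)   # stochastic branch
```

Sharing one latent state between the two actors is what lets the framework treat driving as partly deterministic and partly stochastic without encoding the observation twice.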
To ensure that the vehicle traverses the road over a considerable period of time, we employ a reward-penalty-based system in which a high negative penalty is associated with unfavourable actions and a comparatively low positive reward is awarded for favourable actions, as sketched below.
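The following is a minimal sketch of such an asymmetric reward; the specific magnitudes, the speed bonus, and the `collided`/`off_track` signals are illustrative assumptions rather than the thesis's tuned values.

```python
def reward(collided: bool, off_track: bool, speed: float) -> float:
    """Asymmetric reward-penalty scheme: unfavourable actions incur a
    much larger penalty than the reward granted for favourable ones.
    All magnitudes are illustrative, not the thesis's tuned values."""
    if collided:
        return -100.0  # high penalty: collision with an obstacle
    if off_track:
        return -50.0   # high penalty: vehicle steered off the track
    # Low positive reward for safe progress, scaled by forward speed.
    return 1.0 + 0.1 * max(speed, 0.0)
```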
We also examine the variations in policy loss, value loss, reward function, and cumulative reward for ‘VAE+DDPG’ and ‘VAE+SAC’ over the learning process.