COVID-19 has changed the lifestyle of many people due to its rapid human-to-human transmission. The spread started at the end of January 2020, and different countries used different approaches in terms of testing, sanitization, lock down and quarantine centres to control the spread of the virus. People are getting back to working and routine life activities with new normal standards of testing, sanitization, social distancing and lock down. People are regularly tested to identify those who are infected with COVID-19 and isolate them from general public. However, testing all people unnecessarily is an expensive operation in terms of resources usage. There must be an optimal policy to test only those who have higher chances of being COVID-19 positive. Similarly, sanitization is used for individuals and streets to disinfect people and places. However, sanitization is also an expensive operation in terms of resources, and it is not possible to disinfect each and every individual and street. Social separating or lock down or quarantine centres focuses are different methodologies that are utilised to control the human-to-human transmission of the infection and separate the individuals who are contaminated with COVID-19. However, lock down and quarantine centres are expensive operations in terms of resources as it disturbs the affairs of state and the growth of economy. At the same time, it negatively affects the quality of life of a society. It is also not possible to provide resources to all citizens by locking them inside homes or quarantine centres for infinite time. All these parameters are expensive in terms of resources and have an effect on controlling the spread of the virus, quality of life of human, resources and economy. In this article, a novel intelligent method based on reinforcement learning (RL) is built up that quantifies the unique levels of testing, disinfection and lock down alongside its impact on the spread of the infection, personal satisfaction or quality of life, resource use and economy. Different RL algorithms are actualized and agents are prepared with these algorithms to interact with the environment to gain proficiency with the best strategy. The examinations exhibit that deep learning–based algorithms, for example, DQN and DDPG are performing better than customary RL algorithms, for example, Q-Learning and SARSA.