Buffer Compliance Control of Space Robots Capturing a Non-Cooperative Spacecraft Based on Reinforcement Learning

2021, Vol 11 (13), pp. 5783
Author(s): Haiping Ai, An Zhu, Jiajia Wang, Xiaoyan Yu, Li Chen

To address the problem that joints are easily damaged by impact torque when a space robot captures a non-cooperative spacecraft on orbit, a reinforcement learning control algorithm combined with a compliant mechanism is proposed to achieve buffer compliance control. The compliant mechanism not only absorbs the impact energy through the deformation of its internal spring but also, combined with the compliance control strategy, limits the impact torque to a safe range. First, the dynamic models of the space robot and the target spacecraft before capture are obtained using the Lagrange approach and the Newton-Euler method. Then, based on the law of conservation of momentum and the kinematic and velocity constraints, the integrated dynamic model of the post-capture hybrid system is derived. Because the post-capture hybrid system is unstable, a buffer compliance control based on reinforcement learning is proposed to stabilize it. An associative search network is employed to approximate unknown nonlinear functions, and an adaptive critic network is used to construct the reinforcement signal that tunes the associative search network. Numerical simulation shows that the proposed control scheme reduces the impact torque acting on the joints by at most 76.6% and at least 58.7% in the capturing operation phase. In the stable control phase, the impact torque acting on the joints is limited to within the safety threshold, which avoids overload and damage of the joint actuators.

2021, Vol 11 (17), pp. 8077
Author(s): Xiaodong Fu, Haiping Ai, Li Chen

When a flexible base–link–joint space robot captures a satellite, the impact torque easily causes the base, joints, and links to vibrate and to rotate in a disorderly manner. To address this problem, a repetitive learning sliding mode stabilization control is proposed to stabilize the system. First, the dynamic models of the fully flexible space robot and the captured satellite are established, and the impact effect is calculated according to the motion and force transfer relationships. Based on this, a dynamic model of the system after capture is established. Subsequently, the system is decomposed into slow and fast subsystems using singular perturbation theory. To ensure that the base attitude and the joints of the slow subsystem reach the desired trajectories while link vibrations are suppressed simultaneously, a repetitive learning sliding mode controller based on the concept of the virtual force is designed. Moreover, a multilinear optimal controller is proposed for the fast subsystem to suppress the vibration of the base and joints. The two sub-controllers together constitute the repetitive learning sliding mode stabilization control for the system. This ensures that the base attitude and joints of the system reach the desired trajectories in a limited time after capture, improves control quality, and suppresses the multiple flexible vibrations of the base, links, and joints. Finally, simulation results verify the effectiveness of the designed control strategy.


2020, pp. 41-50
Author(s): Ph. S. Kartaev, I. D. Medvedev

The paper examines the impact of oil price shocks on inflation and how the choice of monetary policy regime affects the strength of this impact. We use dynamic panel data models for countries of the world over the period from 2000 to 2017. It is shown that changes in oil prices affect inflation mainly through the exchange rate channel. The paper demonstrates that the transition to inflation targeting changes the nature of the relationship between oil price shocks and inflation. This effect is asymmetrical: during periods of rising oil prices, inflation targeting reduces the transfer of oil prices to inflation, limiting the negative effects of the shock. During periods of declining oil prices, this monetary policy regime, in contrast, contributes to a stronger transfer, helping to reduce inflation.


2019
Author(s): Jennifer R Sadler, Grace Elisabeth Shearrer, Nichollette Acosta, Kyle Stanley Burger

BACKGROUND: Dietary restraint represents an individual’s intent to limit their food intake and has been associated with impaired passive food reinforcement learning. However, the impact of dietary restraint on active, response-dependent learning is poorly understood. In this study, we tested the relationship between dietary restraint and food reinforcement learning using an active, instrumental conditioning task. METHODS: A sample of ninety adults completed a response-dependent instrumental conditioning task with reward and punishment using sweet and bitter tastes. Brain response was measured via functional MRI during the task. Participants also completed anthropometric measures, reward/motivation-related questionnaires, and a working memory task. Dietary restraint was assessed via the Dutch Restrained Eating Scale. RESULTS: Two groups were selected from the sample: high restraint (n=29, score >2.5) and low restraint (n=30, score <1.85). High restraint was associated with significantly higher BMI (p=0.003) and lower N-back accuracy (p=0.045). The high restraint group was also marginally better at the instrumental conditioning task (p=0.066, r=0.37). High restraint was also associated with significantly greater brain response in the intracalcarine cortex (MNI: 15, -69, 12; k=35, pFWE < 0.05) to bitter taste, compared to neutral taste. CONCLUSIONS: High restraint was associated with improved performance on an instrumental task testing how individuals learn from reward and punishment. This may be mediated by greater brain response in the primary visual cortex, which has been associated with mental representation. Results suggest that dietary restraint does not impair response-dependent reinforcement learning.


Biomimetics, 2021, Vol 6 (1), pp. 13
Author(s): Adam Bignold, Francisco Cruz, Richard Dazeley, Peter Vamplew, Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice could significantly improve learning agents’ performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, to require human interaction every time an experiment is restarted is undesirable, particularly when the expense in doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluative assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
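The idea of replacing a human trainer with a simulated user can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the class name `SimulatedUser` and the parameters `availability` (how often advice is offered) and `accuracy` (how often offered advice is correct) are assumptions chosen to show how user characteristics might be varied between experiments.

```python
import random


class SimulatedUser:
    """Hypothetical simulated advisor for an interactive RL agent.

    availability: probability that the user offers advice on a given step.
    accuracy: probability that offered advice equals the optimal action.
    Both parameters are illustrative assumptions, not taken from the paper.
    """

    def __init__(self, optimal_action, n_actions, availability, accuracy, seed=None):
        if n_actions < 2:
            raise ValueError("need at least two actions to model wrong advice")
        self.optimal_action = optimal_action  # callable: state -> best action index
        self.n_actions = n_actions
        self.availability = availability
        self.accuracy = accuracy
        self.rng = random.Random(seed)  # seeded so experiments can be repeated

    def advise(self, state):
        """Return an advised action index, or None when no advice is given."""
        if self.rng.random() >= self.availability:
            return None
        best = self.optimal_action(state)
        if self.rng.random() < self.accuracy:
            return best
        # Inaccurate advice: pick uniformly among the non-optimal actions.
        wrong = [a for a in range(self.n_actions) if a != best]
        return self.rng.choice(wrong)


# Usage: an advisor that is often silent and sometimes wrong.
user = SimulatedUser(lambda s: s % 4, n_actions=4,
                     availability=0.3, accuracy=0.8, seed=0)
advice = [user.advise(s) for s in range(10)]
```

Because the advisor is seeded, the same experiment can be rerun after a parameter change without involving a person again, which is the practical benefit the abstract describes.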


2021, Vol 16 (4), pp. 638-669
Author(s): Miriam Alzate, Marta Arce-Urriza, Javier Cebollada

When studying the impact of online reviews on product sales, previous scholars have usually assumed that every review for a product has the same probability of being viewed by consumers. However, decision-making and information processing theories underline that the accessibility of information plays a role in consumer decision-making. We incorporate the notion of review visibility to study the relationship between online reviews and product sales, which is proxied by sales rank information, studying three different cases: (1) when every online review is assumed to have the same probability of being viewed; (2) when we assume that consumers sort online reviews by the most helpful mechanism; and (3) when we assume that consumers sort online reviews by the most recent mechanism. Review non-textual and textual variables are analyzed. The empirical analysis is conducted using a panel of 119 cosmetic products over a period of nine weeks. Using the system generalized method of moments (system GMM) method for dynamic models of panel data, our findings reveal that review variables influence product sales, but the magnitude, and even the direction of the effect, vary amongst visibility cases. Overall, the characteristics of the most helpful reviews have a higher impact on sales.


Energies, 2019, Vol 12 (21), pp. 4054
Author(s): Youssef Benchaabane, Rosa Elvira Silva, Hussein Ibrahim, Adrian Ilinca, Ambrish Chandra, ...

Remote and isolated communities in Canada experience gaps in access to stable energy sources and must rely on diesel generators for heat and electricity. However, the cost and environmental impact of using fossil fuels, especially in local energy production, heating, industrial processes, and transportation, are compelling reasons to support the development and deployment of renewable energy hybrid systems. This paper presents a computer model for economic analysis and risk assessment of a wind–diesel hybrid system with compressed air energy storage. The proposed model is developed from the point of view of the project investor and includes technical, financial, risk, and environmental analysis. Robustness is evaluated through sensitivity analysis. The model has been validated by comparing the results of a wind–diesel case study against those obtained using HOMER (National Renewable Energy Laboratory, Golden, CO, United States) and RETScreen (Natural Resources Canada, Government of Canada, Canada) software. The impact on economic performance of adding an energy storage system to a wind–diesel hybrid system is discussed. The results demonstrate the feasibility of such a hybrid system as a suitable power generator in terms of high net present value and internal rate of return, low cost of energy, and low assessed risk. In addition, the environmental impact is positive since less fuel is used.
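The two headline financial indicators named above, net present value (NPV) and internal rate of return (IRR), can be illustrated with a minimal Python sketch. It assumes a simple yearly cash-flow series with the investment at year 0. The function names, the bisection search interval, and the cash-flow figures are hypothetical and are not taken from the paper, whose model also covers technical, risk, and environmental analysis.

```python
def npv(rate, cash_flows):
    """Net present value of yearly cash flows; cash_flows[0] occurs at year 0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tol=1e-9, max_iter=200):
    """Internal rate of return: the rate at which NPV is zero, found by bisection.

    The search interval [low, high] is an assumption of this sketch; a
    ValueError is raised if NPV does not change sign on it.
    """
    f_low = npv(low, cash_flows)
    f_high = npv(high, cash_flows)
    if f_low * f_high > 0:
        raise ValueError("NPV does not change sign on the search interval")
    for _ in range(max_iter):
        mid = (low + high) / 2.0
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (high - low) < tol:
            return mid
        if f_low * f_mid < 0:
            high = mid          # root lies in the lower half
        else:
            low, f_low = mid, f_mid  # root lies in the upper half
    return (low + high) / 2.0


# Hypothetical project: an up-front investment followed by yearly fuel savings.
flows = [-1000.0, 300.0, 300.0, 300.0, 300.0, 300.0]
project_npv = npv(0.08, flows)   # value at an assumed 8% discount rate
project_irr = irr(flows)         # discount rate at which NPV is zero
```

A project is usually considered attractive when its NPV is positive at the investor's discount rate, or equivalently when its IRR exceeds that rate.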


2021, Vol 11 (4), pp. 1514
Author(s): Quang-Duy Tran, Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of autonomous vehicles. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning approach that investigates how leading autonomous vehicles affect the urban network in a mixed-traffic environment. We also suggest a set of hyperparameters for achieving better performance. First, we feed a set of hyperparameters into our deep reinforcement learning agents. Second, we investigate the leading autonomous vehicle experiment in the urban network with different autonomous vehicle penetration rates. Third, the advantage of leading autonomous vehicles is evaluated using entire manual vehicle and leading manual vehicle experiments. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameters. We demonstrate that fully automated traffic increased the average speed by 1.27 times compared with the entire manual vehicle experiment. Our proposed method becomes significantly more effective at higher autonomous vehicle penetration rates. Furthermore, leading autonomous vehicles could help to mitigate traffic congestion.
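The two proximal policy optimization variants compared above differ only in how they keep the new policy close to the old one. The Python sketch below shows the standard textbook form of each surrogate objective on a small batch. It is not the authors' code: the function names are hypothetical, and the clip range `eps=0.2` and the 1.5 and 2 factors in the penalty update are commonly cited defaults assumed here.

```python
def clipped_surrogate(ratios, advantages, eps=0.2):
    """PPO clipped objective: mean of min(r*A, clip(r, 1-eps, 1+eps)*A).

    ratios: probability ratios pi_new(a|s) / pi_old(a|s) per sample.
    advantages: advantage estimates per sample.
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - eps, min(1.0 + eps, r))
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def kl_penalized_surrogate(ratios, advantages, kl, beta):
    """PPO penalty objective: mean(r*A) minus beta times the mean KL divergence."""
    mean_term = sum(r * a for r, a in zip(ratios, advantages)) / len(ratios)
    return mean_term - beta * kl


def adapt_beta(beta, kl, kl_target):
    """Adaptive penalty coefficient: relax when KL is small, tighten when large."""
    if kl < kl_target / 1.5:
        return beta / 2.0
    if kl > kl_target * 1.5:
        return beta * 2.0
    return beta


# Usage on a two-sample batch with made-up numbers.
ratios = [1.5, 0.5]
advantages = [1.0, -1.0]
objective_clip = clipped_surrogate(ratios, advantages)
objective_kl = kl_penalized_surrogate(ratios, advantages, kl=0.02, beta=1.0)
next_beta = adapt_beta(1.0, kl=0.02, kl_target=0.01)
```

In the clipped form, a ratio outside `[1 - eps, 1 + eps]` stops contributing extra gain, so no separate penalty coefficient has to be tuned. The penalty form instead adjusts `beta` after each update depending on how far the measured KL divergence is from its target.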


2020, Vol 10 (1)
Author(s): Abu Quwsar Ohi, M. F. Mridha, Muhammad Mostafa Monowar, Md. Abdul Hamid

A pandemic is the global outbreak of a disease with a high transmission rate. The impact of a pandemic can be lessened by restricting the movement of the population. However, one of the concomitant consequences is an economic crisis. In this article, we demonstrate what actions an agent (trained using reinforcement learning) may take in different possible scenarios of a pandemic, depending on the spread of disease and economic factors. To train the agent, we design a virtual pandemic scenario closely related to the present COVID-19 crisis. Then, we apply reinforcement learning, a branch of artificial intelligence that deals with how an individual (human/machine) should interact with an environment (real/virtual) to achieve a desired goal. Finally, we demonstrate what optimal actions the agent performs to reduce the spread of disease while considering the economic factors. In our experiment, we let the agent find an optimal solution without providing any prior knowledge. After training, we observed that the agent imposes a long lockdown to reduce the first surge of a disease. Furthermore, the agent imposes a combination of cyclic lockdowns and short lockdowns to halt the resurgence of the disease. Analyzing the agent’s actions, we find that the agent decides on movement restrictions based not only on the size of the infectious population but also on the reproduction rate of the disease. The agent’s estimates and policy may improve human strategies for imposing lockdowns, so that an economic crisis may be avoided while an infectious disease is mitigated.

