Attention-Based Temporal Encoding Network with Background-Independent Motion Mask for Action Recognition

2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Zhengkui Weng ◽  
Zhipeng Jin ◽  
Shuangxi Chen ◽  
Quanquan Shen ◽  
Xiangyang Ren ◽  
...  

Convolutional neural networks (CNNs) have advanced rapidly in recent years. However, the high dimensionality of video data, the rich dynamics of human motion, and various kinds of background interference make it difficult for traditional CNNs to capture complicated motion information in videos. A novel framework named the attention-based temporal encoding network (ATEN) with a background-independent motion mask (BIMM) is proposed here for video action recognition. First, we introduce a motion segmentation approach based on a boundary prior, using the minimal geodesic distance in an undirected weighted graph. Then, we propose a dynamic contrast segmentation strategy for segmenting moving objects in complicated environments. Subsequently, we build the BIMM to enhance the moving object by suppressing the irrelevant background in each frame. Furthermore, we design a long-range attention mechanism inside ATEN that effectively models the long-term dependencies of complex, non-periodic actions by automatically focusing on the semantically vital frames rather than treating all sampled frames equally. In this way, the attention mechanism suppresses temporal redundancy and highlights the discriminative frames. Lastly, the framework is evaluated on the HMDB51 and UCF101 datasets. The experimental results show that our ATEN with BIMM achieves 94.5% and 70.6% accuracy, respectively, outperforming a number of existing methods on both datasets.
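As a rough illustration of the attention idea described in this abstract, the PyTorch sketch below scores each sampled frame feature, normalizes the scores with a softmax, and pools the frames into a single video descriptor so that discriminative frames outweigh redundant ones. The module name `TemporalAttention` and the feature dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Illustrative attention pooling over per-frame features.

    Scores each sampled frame, normalizes the scores with softmax,
    and returns a weighted sum, so discriminative frames contribute
    more than redundant ones.
    """
    def __init__(self, feat_dim: int = 2048, hidden: int = 256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (batch, num_frames, feat_dim), e.g. CNN features of
        # frames already masked by a motion mask such as BIMM (assumption).
        weights = torch.softmax(self.score(frame_feats), dim=1)  # (B, T, 1)
        return (weights * frame_feats).sum(dim=1)                # (B, feat_dim)

# Example: 4 videos, 8 sampled frames each, 2048-dim features per frame.
feats = torch.randn(4, 8, 2048)
video_descriptor = TemporalAttention()(feats)  # (4, 2048)
```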

Electronics ◽  
2021 ◽  
Vol 10 (9) ◽  
pp. 986
Author(s):  
Hongru Li ◽  
Guiling Sun ◽  
Yue Li ◽  
Runzhuo Yang

The purpose of wearable technology is to use multimedia, sensors, and wireless communication to integrate specific technology into users' clothes or accessories. With the help of various sensors, a physiological monitoring system can collect, process, and transmit physiological signals without causing harm to the wearer. Wearable technology has been widely used in patient monitoring and personal health management because it is lightweight, mobile, and easy to use, supports long-term continuous operation, and can transmit data wirelessly. In this paper, we establish a Wi-Fi-based physiological monitoring system that accurately measures heart rate, body surface temperature, and motion data and can quickly detect abnormal heart rates and alert the user.
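A minimal sketch of the kind of alerting logic such a monitoring node might run is shown below; the field names, thresholds, and motion-based relaxation are illustrative assumptions, not the authors' actual system.

```python
# Illustrative abnormal-heart-rate check for a wearable monitoring node.
# Thresholds and field names are assumptions, not the authors' firmware.
from dataclasses import dataclass

@dataclass
class VitalsSample:
    heart_rate_bpm: float       # from a pulse sensor
    skin_temp_c: float          # body surface temperature
    accel_magnitude_g: float    # motion intensity from an accelerometer

def is_heart_rate_abnormal(sample: VitalsSample,
                           resting_low: float = 50.0,
                           resting_high: float = 100.0) -> bool:
    """Flag heart rates outside a plausible range.

    Motion is used to reduce false alarms: during exercise a higher
    rate is expected, so the upper bound is relaxed.
    """
    upper = resting_high + 60.0 if sample.accel_magnitude_g > 1.2 else resting_high
    return not (resting_low <= sample.heart_rate_bpm <= upper)

if is_heart_rate_abnormal(VitalsSample(heart_rate_bpm=128.0, skin_temp_c=36.4,
                                       accel_magnitude_g=0.1)):
    print("ALERT: abnormal resting heart rate")  # e.g. push over Wi-Fi to a server
```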


2008 ◽  
Vol 20 (12) ◽  
pp. 2185-2197 ◽  
Author(s):  
Jennifer T. Coull ◽  
Bruno Nazarian ◽  
Franck Vidal

The temporal discrimination paradigm requires subjects to compare the duration of a probe stimulus to that of a sample previously stored in working or long-term memory, thus providing an index of timing that is independent of a motor response. However, the estimation process itself comprises several component cognitive processes, including timing, storage, retrieval, and comparison of durations. Previous imaging studies have attempted to disentangle these components by simply measuring brain activity during early versus late scanning epochs. We aim to improve the temporal resolution and precision of this approach by using rapid event-related functional magnetic resonance imaging to time-lock the hemodynamic response to presentation of the sample and probe stimuli themselves. Compared to a control (color-estimation) task, which was matched in terms of difficulty, sustained attention, and motor preparation requirements, we found selective activation of the left putamen for the storage (“encoding”) of stimulus duration into working memory (WM). Moreover, increased putamen activity was linked to enhanced timing performance, suggesting that the level of putamen activity may modulate the depth of temporal encoding. Retrieval and comparison of stimulus duration in WM selectively activated the right superior temporal gyrus. Finally, the supplementary motor area was equally active during both sample and probe stages of the task, suggesting a fundamental role in timing the duration of a stimulus that is currently unfolding in time.


2020 ◽  
pp. 1006-1022
Author(s):  
Yongji Yang ◽  
Zhiguo Xiao ◽  
Furen Jiang

Within the context of health informatics, this article discusses how real-time information on human skeleton movements can be conveniently captured with a Kinect-based depth sensor. It presents an effective action recognition method that uses Unity3D to build virtual models of characters and scenes. The Kinect somatosensory camera captures the user's motion data and feeds it back to the virtual role model to drive the avatar. The goal is real-time simulation modeling, with operation instructions driven and verified against the designed actions, to complement limb rehabilitation. The hardware of such a health-informatics system is simple and inexpensive, and it meaningfully augments the user experience. In conclusion, the suggested approach can contribute significantly towards aiding physical rehabilitation therapy.
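As a hedged illustration of how streamed skeleton joints could be turned into rehabilitation feedback, the sketch below computes a joint angle from three 3D joint positions; the joint names, coordinates, and angle threshold are hypothetical and not taken from the article.

```python
# Minimal sketch: derive an elbow flexion angle from Kinect-style 3D joints
# and give simple exercise feedback. Joint values and threshold are assumptions.
import numpy as np

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle at joint b (degrees) formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# One frame of 3D joint positions (metres), e.g. from the sensor SDK.
shoulder = np.array([0.00, 1.40, 2.00])
elbow    = np.array([0.25, 1.15, 2.00])
wrist    = np.array([0.30, 0.90, 2.05])

elbow_flexion = joint_angle(shoulder, elbow, wrist)
if elbow_flexion < 150.0:  # hypothetical target for an arm-extension exercise
    print(f"Extend the arm further (current angle {elbow_flexion:.1f} deg)")
```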


2019 ◽  
Vol 277 ◽  
pp. 02034
Author(s):  
Sophie Aubry ◽  
Sohaib Laraba ◽  
Joëlle Tilmanne ◽  
Thierry Dutoit

In this paper, a methodology to recognize actions from RGB videos is proposed, which takes advantage of recent breakthroughs in deep learning. Following the development of Convolutional Neural Networks (CNNs), research has been conducted on transforming skeletal motion data into 2D images. In this work, a solution is proposed that requires only RGB videos instead of RGB-D videos. The work builds on several studies of converting RGB-D data into 2D images. From a video stream (RGB images), a two-dimensional skeleton of 18 joints is extracted for each detected body with a DNN-based human pose estimator called OpenPose. The skeleton data are encoded into the Red, Green, and Blue channels of images. Different ways of encoding motion data into images were studied. We successfully use state-of-the-art deep neural networks designed for image classification to recognize actions. Based on a study of the related works, we chose the image classification models SqueezeNet, AlexNet, DenseNet, ResNet, Inception, and VGG, and retrained them to perform action recognition. For all tests, the NTU RGB+D database is used. The highest accuracy is obtained with ResNet: 83.317% cross-subject and 88.780% cross-view, which outperforms most state-of-the-art results.
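The following sketch illustrates the general skeleton-to-image encoding idea described above: per-frame OpenPose joints (x, y, confidence) are mapped to the R, G, and B channels of an image, which a standard image classifier such as ResNet can then be retrained on. The normalization, image size, and class count are assumptions; the authors' exact encoding may differ.

```python
# Sketch of encoding a 2D skeleton sequence as an RGB image and classifying it
# with a standard CNN. Normalization and sizes are illustrative assumptions.
import numpy as np
import torch
from torchvision.models import resnet18

def skeleton_to_image(seq: np.ndarray) -> np.ndarray:
    """seq: (num_frames, 18, 3) OpenPose output (x, y, confidence) per joint.

    Returns a (num_frames, 18, 3) uint8 image: x -> R, y -> G, confidence -> B,
    each channel scaled independently to 0..255.
    """
    img = np.zeros_like(seq, dtype=np.uint8)
    for c in range(3):
        ch = seq[..., c]
        lo, hi = ch.min(), ch.max()
        img[..., c] = ((ch - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)
    return img

# Example: 64 frames of 18 joints -> a 64x18 RGB "motion image".
seq = np.random.rand(64, 18, 3).astype(np.float32)
image = skeleton_to_image(seq)

# Classify with an ImageNet-style CNN retrained for the action classes.
x = torch.from_numpy(image).permute(2, 0, 1).unsqueeze(0).float() / 255.0
x = torch.nn.functional.interpolate(x, size=(224, 224), mode="bilinear")
model = resnet18(num_classes=60)   # e.g. 60 NTU RGB+D action classes (assumption)
logits = model(x)                  # shape (1, 60)
```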

