Adaptive Attention Memory Graph Convolutional Networks for Skeleton-Based Action Recognition
Graph Convolutional Networks (GCNs) have attracted considerable attention and shown remarkable performance for action recognition in recent years. To improve recognition accuracy, the key problems for this kind of method are how to build the graph structure adaptively, select key frames, and extract discriminative features. In this work, we propose a novel Adaptive Attention Memory Graph Convolutional Network (AAM-GCN) for human action recognition using skeleton data. We adopt GCN to adaptively model the spatial configuration of skeletons and employ a Gated Recurrent Unit (GRU) to construct an attention-enhanced memory for capturing temporal features. With the memory module, our model can not only remember what happened in the past but also exploit information from the future through multiple bidirectional GRU layers. Furthermore, to extract discriminative temporal features, an attention mechanism is employed to select key frames from the skeleton sequence. Extensive experiments on the Kinetics, NTU RGB+D, and HDM05 datasets show that the proposed network outperforms several state-of-the-art methods.
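The two core ideas above — an adaptive graph structure for the spatial dimension and attention-weighted selection of key frames in the temporal dimension — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the additive learned adjacency `B`, and the single attention vector `w` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_graph_conv(X, A, B, W):
    # X: (T, V, C) skeleton features over T frames and V joints.
    # A: (V, V) fixed skeleton adjacency; B: (V, V) learned,
    # data-adaptive adjacency (illustrative additive form).
    # W: (C, D) feature transform.
    A_hat = softmax(A + B, axis=-1)          # adaptively built graph structure
    return np.einsum('uv,tvc,cd->tud', A_hat, X, W)

def temporal_attention_pool(H, w):
    # H: (T, D) per-frame features; w: (D,) attention vector.
    alpha = softmax(H @ w)                   # weight over frames (key-frame selection)
    return alpha @ H, alpha                  # attended summary and frame weights
```

In a full model, the per-frame features `H` would come from stacked graph-convolution and bidirectional GRU layers, and `B` and `w` would be trained end-to-end; here they are plain arrays so the data flow is easy to follow.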