Enhanced Spatial and Extended Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Fanjia Li; Juanjuan Li; Aichun Zhu; Yonggang Xu; Hongsheng Yin; Gang Hua

doi:10.3390/s20185260

Sensors ◽

10.3390/s20185260 ◽

2020 ◽

Vol 20 (18) ◽

pp. 5260 ◽

Cited By ~ 1

Author(s):

Fanjia Li ◽

Juanjuan Li ◽

Aichun Zhu ◽

Yonggang Xu ◽

Hongsheng Yin ◽

...

Keyword(s):

Action Recognition ◽

Large Scale ◽

Optimal Solution ◽

Human Action Recognition ◽

Human Action ◽

Convolutional Network ◽

Spatial Graph ◽

Serial Connection ◽

In Series ◽

Temporal Graph

In skeleton-based human action recognition, spatial-temporal graph convolutional networks (ST-GCNs) have made great progress recently. However, they use only one fixed temporal convolution kernel, which is not enough to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN). Three convolution kernels of different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient one-shot aggregation (OSA) + effective squeeze-excitation (eSE) structure. The OSA module aggregates the features from each layer once to the output, and the eSE module explores the interdependency between the channels of the output. In addition, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection to a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that its performance exceeds that of previous state-of-the-art methods.
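
As a rough illustration of the ideas in this abstract (not the authors' implementation), the sketch below chains temporal filters of increasing width over one joint coordinate, keeps every layer's output for a single aggregation step, and rescales each channel with a sigmoid gate. The kernel sizes, the averaging filters standing in for learned weights, and the names `temporal_conv` and `osa_ese` are all assumptions made for the example.

```python
import math

def temporal_conv(seq, size):
    """Zero-padded 1D averaging filter of odd width `size` along the time axis."""
    half = size // 2
    padded = [0.0] * half + list(seq) + [0.0] * half
    return [sum(padded[t:t + size]) / size for t in range(len(seq))]

def osa_ese(seq, kernel_sizes=(3, 5, 7)):
    """Chain temporal layers, aggregate every layer output once, then gate channels."""
    outputs, x = [], list(seq)
    for size in kernel_sizes:          # serial chain of temporal layers
        x = temporal_conv(x, size)
        outputs.append(x)              # kept for the single aggregation step
    # eSE-style gate: global average per channel -> sigmoid -> rescale.
    # A real eSE block applies a learned fully connected layer before the
    # sigmoid; it is omitted here (assumption made to keep the sketch small).
    gates = [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in outputs]
    return [[g * v for v in ch] for g, ch in zip(gates, outputs)]

# One joint coordinate over five frames (made-up values).
features = osa_ese([0.0, 0.0, 3.0, 0.0, 0.0], kernel_sizes=(1, 3))
```

Each entry of `features` is one channel; wider kernels spread the spike at frame 2 over more neighbouring frames, which is the short-term versus longer-term contrast the abstract refers to.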

Learning Graph Convolutional Network for Skeleton-Based Human Action Recognition by Neural Searching

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i03.5652 ◽

2020 ◽

Vol 34 (03) ◽

pp. 2669-2676 ◽

Cited By ~ 11

Author(s):

Wei Peng ◽

Xiaopeng Hong ◽

Haoyu Chen ◽

Guoying Zhao

Keyword(s):

Action Recognition ◽

Large Scale ◽

Order Approximation ◽

Human Action Recognition ◽

Search Space ◽

Human Action ◽

Higher Order ◽

Dynamic Graph ◽

Convolutional Network ◽

Representational Capacity

Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability for modeling non-Euclidean data, has attracted a lot of attention. However, many existing GCNs provide a pre-defined graph structure and share it through the entire network, which can lose implicit joint correlations, especially for higher-level features. Besides, the mainstream spectral GCN is approximated by a one-order hop, so higher-order connections are not well captured. All of these issues require huge efforts to design a better GCN architecture. To address these problems, we turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for this task. Specifically, we explore the spatial-temporal correlations between nodes and build a search space with multiple dynamic graph modules. Besides, we introduce multiple-hop modules, aiming to break the limitation on representational capacity caused by the one-order approximation. Moreover, a corresponding sampling- and memory-efficient evolution strategy is proposed to search in this space. The resulting architecture demonstrates the effectiveness of the higher-order approximation and the layer-wise dynamic graph modules. To evaluate the performance of the searched model, we conduct extensive experiments on two very large-scale skeleton-based action recognition datasets. The results show that our model achieves state-of-the-art results in terms of the given metrics.
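
To make the one-order versus higher-order distinction in this abstract concrete, here is a minimal sketch that takes powers of a skeleton adjacency matrix (with self-loops) to find which joints are reachable within k hops. The 4-joint chain and the function names are hypothetical, and this is plain matrix-power reachability, not the searched multiple-hop modules of the paper.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def k_hop_reachability(adj, hops):
    """Return a 0/1 matrix marking joints reachable within `hops` edges (self included)."""
    n = len(adj)
    # Add self-loops so that powers also accumulate all shorter paths.
    step = [[1 if i == j or adj[i][j] else 0 for j in range(n)] for i in range(n)]
    power = step
    for _ in range(hops - 1):
        power = matmul(power, step)
    return [[1 if v else 0 for v in row] for row in power]

# Hypothetical 4-joint chain: shoulder - elbow - wrist - hand.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
one_hop = k_hop_reachability(chain, 1)
two_hop = k_hop_reachability(chain, 2)
```

With one hop the shoulder only sees the elbow; with two hops it also sees the wrist, which is the kind of higher-order connection a one-order approximation leaves out.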

Skeleton-Based Action Recognition Based on Distance Vector and Multihigh View Adaptive Networks

Computational Intelligence and Neuroscience ◽

10.1155/2021/1507770 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Min Zhang ◽

Haijie Yang ◽

Pengfei Li ◽

Ming Jiang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Adaptive Networks ◽

Convolutional Network ◽

Current Frame ◽

Convolutional Networks ◽

Distance Vector ◽

Temporal Graph ◽

Ablation Study

Skeleton-based human action recognition has attracted much attention in the field of computer vision. Most previous studies are based on fixed skeleton graphs, so only the local physical dependencies among joints can be captured and implicit joint correlations are omitted. In addition, the same action looks very different under different views, and in some views keypoints are occluded, which causes recognition errors. In this paper, an action recognition method based on distance vector and multihigh view adaptive network (DV-MHNet) is proposed to address this challenging task. The multihigh (MH) view adaptive networks are constructed to automatically determine the best observation view at different heights, obtain complete keypoint information for the current frame image, and enhance the robustness and generalization of the model when recognizing actions at different heights. The distance vector (DV) mechanism is then introduced on this basis to establish the relative distance and relative orientation between different keypoints in the same frame and between the same keypoints in different frames, in order to obtain the global potential relationship of each keypoint. Finally, a spatial temporal graph convolutional network is constructed to take both spatial and temporal information into account and learn the characteristics of the action. Ablation studies against traditional spatial temporal graph convolutional networks, and with and without the multihigh view adaptive networks, support the effectiveness of the model. The model is evaluated on two widely used action recognition benchmarks (NTU-RGB + D and PKU-MMD), and our method achieves better performance on both datasets.
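
The relative distance and relative orientation described in this abstract can be sketched as follows. This is only an illustration under assumed conventions: orientation is taken to be the unit direction vector between two 3D points, and the keypoint coordinates and the name `distance_vector` are made up for the example.

```python
import math

def distance_vector(p, q):
    """Relative distance and unit orientation from keypoint p to keypoint q."""
    diff = [b - a for a, b in zip(p, q)]
    dist = math.sqrt(sum(d * d for d in diff))
    # Coincident points have no direction; return a zero vector in that case.
    direction = [d / dist for d in diff] if dist > 0 else [0.0] * len(diff)
    return dist, direction

# frames[t][j] is the (x, y, z) position of joint j at frame t (made-up values).
frames = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 2.0), (3.0, 4.0, 0.0)],
]
spatial = distance_vector(frames[0][0], frames[0][1])   # two joints, same frame
temporal = distance_vector(frames[0][0], frames[1][0])  # same joint, next frame
```

The `spatial` pair relates different keypoints within one frame, while the `temporal` pair relates the same keypoint across frames, matching the two cases named in the abstract.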

Progressive Spatio-Temporal Graph Convolutional Network for Skeleton-Based Human Action Recognition

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9413860 ◽

2021 ◽

Author(s):

Negar Heidari ◽

Alexandros Iosifidis

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Convolutional Network ◽

Temporal Graph ◽

Spatio Temporal

Human Action Recognition Combining Sequential Dynamic Images and Two-Stream Convolutional Network

Laser & Optoelectronics Progress ◽

10.3788/lop202158.0210007 ◽

2021 ◽

Vol 58 (2) ◽

pp. 0210007

Author(s):

张文强 Zhang Wenqiang ◽

王增强 Wang Zengqiang ◽

张良 Zhang Liang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Convolutional Network

Human action recognition with a large-scale brain-inspired photonic computer

Nature Machine Intelligence ◽

10.1038/s42256-019-0110-8 ◽

2019 ◽

Vol 1 (11) ◽

pp. 530-537 ◽

Cited By ~ 10

Author(s):

Piotr Antonik ◽

Nicolas Marsal ◽

Daniel Brunner ◽

Damien Rontani

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

A Bayesian Dynamical Approach for Human Action Recognition

Sensors ◽

10.3390/s21165613 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5613

Author(s):

Amirreza Farnoosh ◽

Zhouping Wang ◽

Shaotong Zhu ◽

Sarah Ostadabbas

Keyword(s):

Action Recognition ◽

Large Scale ◽

Temporal Dynamics ◽

Human Action Recognition ◽

Human Action ◽

Superior Performance ◽

Action Classification ◽

Motion Data ◽

Highly Correlated ◽

Low Dimensional

We introduce a generative Bayesian switching dynamical model for action recognition in 3D skeletal data. Our model encodes highly correlated skeletal data into a few sets of low-dimensional switching temporal processes and from there decodes to the motion data and their associated action labels. We parameterize these temporal processes with regard to a switching deep autoregressive prior to accommodate both multimodal and higher-order nonlinear inter-dependencies. This results in a dynamical deep generative latent model that parses meaningful intrinsic states in skeletal dynamics and enables action recognition. These sequences of states provide visual and quantitative interpretations of the motion primitives that give rise to each action class, which have not been explored previously. In contrast to previous works, which often overlook temporal dynamics, our method explicitly models temporal transitions and is generative. Our experiments on two large-scale 3D skeletal datasets substantiate the superior performance of our model in comparison with state-of-the-art methods. Specifically, our method achieved 6.3% higher action classification accuracy (by incorporating a dynamical generative framework) and 3.5% better predictive error (by employing a nonlinear second-order dynamical transition model) when compared with the best-performing competitors.

Spatial Joint features for 3D human skeletal action recognition system using spatial graph kernels

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i1.1.10152 ◽

2017 ◽

Vol 7 (1.1) ◽

pp. 489 ◽

Cited By ~ 1

Author(s):

P. V. V. Kishore ◽

P Siva Kameswari ◽

K Niharika ◽

M Tanuja ◽

M Bindu ◽

...

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Recognition System ◽

Human Action ◽

Graph Kernels ◽

Joint Distributions ◽

Spatial Graph ◽

3D Space ◽

Machine Interface ◽

Action Frame

Human action recognition is a vibrant area of research with multiple applications in human-machine interfaces. In this work, we propose a human action recognition method based on spatial graph kernels on 3D skeletal data. Spatial joint features are extracted using joint distances between human joint distributions in 3D space. A spatial graph is constructed using 3D points as vertices and the computed joint distances as edges for each action frame in the video sequence. Spatial graph kernels between the training set and the testing set are constructed to extract the similarity between the two action sets. Two spatial graph kernels are constructed, with vertex and edge data represented by joint positions and joint distances. To test the proposed method, we use four publicly available 3D skeletal datasets: G3D, MSR Action 3D, UT Kinect and NTU RGB+D. The proposed spatial graph kernels result in better classification accuracies compared to state-of-the-art models.
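
The graph construction described in this abstract (joints as vertices, joint distances as edges) can be sketched as below. The abstract does not state which kernel is used, so the Gaussian similarity averaged over corresponding edges is an illustrative stand-in, and the poses, `sigma`, and function names are assumptions.

```python
import math
from itertools import combinations

def build_spatial_graph(joints):
    """Vertices are 3D joints; each edge (i, j) stores their Euclidean distance."""
    return {(i, j): math.dist(joints[i], joints[j])
            for i, j in combinations(range(len(joints)), 2)}

def edge_kernel(graph_a, graph_b, sigma=1.0):
    """Mean Gaussian similarity over corresponding edges (illustrative choice)."""
    sims = [math.exp(-(graph_a[e] - graph_b[e]) ** 2 / (2 * sigma ** 2))
            for e in graph_a]
    return sum(sims) / len(sims)

# Two made-up 3-joint poses for a single frame each.
pose_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
pose_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 2.0, 0.0)]
graph_a = build_spatial_graph(pose_a)
graph_b = build_spatial_graph(pose_b)
```

Comparing a graph with itself gives a similarity of 1, and the value drops as the joint distances of the two frames diverge; both graphs must come from skeletons with the same number of joints.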

A Large-scale RGB-D Database for Arbitrary-view Human Action Recognition

2018 ACM Multimedia Conference - MM '18 ◽

10.1145/3240508.3240675 ◽

2018 ◽

Cited By ~ 6

Author(s):

Yanli Ji ◽

Feixiang Xu ◽

Yang Yang ◽

Fumin Shen ◽

Heng Tao Shen ◽

...

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action

Deep Learning-Based Action Recognition Using 3D Skeleton Joints Information

Inventions ◽

10.3390/inventions5030049 ◽

2020 ◽

Vol 5 (3) ◽

pp. 49

Author(s):

Nusrat Tasnim ◽

Md. Mahbubul Islam ◽

Joong-Hwan Baek

Keyword(s):

Action Recognition ◽

Large Scale ◽

Dimensional Space ◽

Human Action Recognition ◽

Human Action ◽

Human Machine Interaction ◽

Human Actions ◽

3 Dimensional ◽

3D Skeleton ◽

Color Depth

Human action recognition has become one of the most attractive and demanding fields of research in computer vision and pattern recognition, as it enables easy, smart, and comfortable ways of human-machine interaction. With the massive research progress of recent years, several methods have been suggested for discriminating between different types of human actions using color, depth, inertial, and skeleton information. Despite the availability of several action identification methods using different modalities, classifying human actions using skeleton joint information in 3-dimensional space is still a challenging problem. In this paper, we present an effective method for action recognition using 3D skeleton data. First, we analyzed large-scale 3D skeleton joint information and performed some meaningful pre-processing. Then, a simple, straightforward deep convolutional neural network (DCNN) was designed for the classification of the desired actions in order to evaluate the effectiveness and robustness of the proposed system. We also evaluated existing DCNN models such as ResNet18 and MobileNetV2, which outperform existing systems that use human skeleton joint information.

An efficient and sparse approach for large scale human action recognition in videos

Machine Vision and Applications ◽

10.1007/s00138-016-0760-z ◽

2016 ◽

Vol 27 (4) ◽

pp. 529-543 ◽

Cited By ~ 9

Author(s):

Cyrille Beaudry ◽

Renaud Péteri ◽

Laurent Mascarilla

Keyword(s):

Action Recognition ◽

Large Scale ◽

Human Action Recognition ◽

Human Action
