Enhanced Spatial and Extended Temporal Graph Convolutional Network for Skeleton-Based Action Recognition

Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5260 ◽  
Author(s):  
Fanjia Li ◽  
Juanjuan Li ◽  
Aichun Zhu ◽  
Yonggang Xu ◽  
Hongsheng Yin ◽  
...  

In skeleton-based human action recognition, spatial-temporal graph convolutional networks (ST-GCNs) have recently made great progress. However, they use only one fixed temporal convolution kernel, which is not enough to extract temporal cues comprehensively. Moreover, simply connecting the spatial graph convolution layer (GCL) and the temporal GCL in series is not the optimal solution. To this end, we propose a novel enhanced spatial and extended temporal graph convolutional network (EE-GCN). Three convolution kernels of different sizes are chosen to extract discriminative temporal features from shorter to longer terms. The corresponding GCLs are then concatenated by a powerful yet efficient structure that combines one-shot aggregation (OSA) with effective squeeze-excitation (eSE). The OSA module aggregates the features from each layer once to the output, and the eSE module explores the interdependency between the channels of the output. In addition, we propose a new connection paradigm to enhance the spatial features, which expands the serial connection into a combination of serial and parallel connections by adding a spatial GCL in parallel with the temporal GCLs. The proposed method is evaluated on three large-scale datasets, and the experimental results show that it outperforms previous state-of-the-art methods.
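The multi-kernel temporal extraction and one-shot aggregation described in this abstract can be illustrated with a minimal sketch. The abstract gives neither the kernel sizes nor the exact wiring, so the sizes (3, 5, 7), the serial chaining of layers, and the averaging kernels that stand in for learned weights are all assumptions, and the function names are hypothetical. The eSE channel reweighting is not modeled here.

```python
def temporal_conv(seq, kernel):
    """Zero-padded ('same') 1D convolution along the time axis.

    Assumes an odd kernel length so the output keeps the input length.
    """
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(padded[t + i] * kernel[i] for i in range(k))
            for t in range(len(seq))]


def osa_temporal_features(seq, kernel_sizes=(3, 5, 7)):
    """Chain temporal convolutions, then concatenate all outputs once.

    kernel_sizes is a hypothetical choice; averaging kernels replace
    the learned weights a real network would use.
    """
    outputs = []
    current = list(seq)
    for k in kernel_sizes:
        current = temporal_conv(current, [1.0 / k] * k)
        outputs.append(current)  # keep every layer's output
    # One-shot aggregation: each layer contributes one channel per time step.
    return [[out[t] for out in outputs] for t in range(len(seq))]


features = osa_temporal_features([1.0] * 9)
print(len(features), len(features[0]))  # prints: 9 3
```

Each time step ends up with one feature per kernel size, so short-term and longer-term temporal context sit side by side in the aggregated output.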

2020 ◽  
Vol 34 (03) ◽  
pp. 2669-2676 ◽  
Author(s):  
Wei Peng ◽  
Xiaopeng Hong ◽  
Haoyu Chen ◽  
Guoying Zhao

Human action recognition from skeleton data, fuelled by the Graph Convolutional Network (GCN) with its powerful capability for modeling non-Euclidean data, has attracted a lot of attention. However, many existing GCNs use a pre-defined graph structure and share it throughout the entire network, which can lose implicit joint correlations, especially for higher-level features. In addition, the mainstream spectral GCN is approximated by a first-order hop, so higher-order connections are not well captured. Addressing these issues by hand requires considerable effort to design a better GCN architecture. We therefore turn to Neural Architecture Search (NAS) and propose the first automatically designed GCN for this task. Specifically, we explore the spatial-temporal correlations between nodes and build a search space with multiple dynamic graph modules. We also introduce multiple-hop modules, which are intended to break the limitation on representational capacity caused by the first-order approximation. Moreover, a corresponding sampling- and memory-efficient evolution strategy is proposed to search this space. The resulting architecture demonstrates the effectiveness of the higher-order approximation and the layer-wise dynamic graph modules. To evaluate the searched model, we conduct extensive experiments on two very large-scale skeleton-based action recognition datasets. The results show that our model achieves state-of-the-art results in terms of the given metrics.
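To make the idea of higher-order (multi-hop) connections concrete, the sketch below derives k-hop adjacency matrices from a skeleton graph using breadth-first search. It only illustrates what a k-hop neighbor is. It is not the multiple-hop module or the search procedure from the paper, and the four-joint chain and function names are hypothetical.

```python
from collections import deque


def hop_distances(adj):
    """All-pairs shortest-path hop counts on an unweighted graph (BFS)."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def k_hop_adjacency(adj, k):
    """Mark pairs of joints whose shortest path is exactly k hops."""
    return [[1 if d == k else 0 for d in row] for row in hop_distances(adj)]


# Toy skeleton: four joints connected in a chain 0-1-2-3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(k_hop_adjacency(chain, 2))
# prints: [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0]]
```

With k = 1 the result is the original adjacency matrix, which corresponds to the first-order case the abstract describes as limiting.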


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Min Zhang ◽  
Haijie Yang ◽  
Pengfei Li ◽  
Ming Jiang

Skeleton-based human action recognition has attracted much attention in the field of computer vision. Most previous studies are based on fixed skeleton graphs, so only the local physical dependencies among joints can be captured and implicit joint correlations are omitted. In addition, the appearance of the same action differs greatly across views, and in some views keypoints are occluded, which causes recognition errors. In this paper, an action recognition method based on a distance vector and a multihigh view adaptive network (DV-MHNet) is proposed to address this challenging task. The multihigh (MH) view adaptive networks are constructed to automatically determine the best observation view at different heights, obtain complete keypoint information for the current frame, and improve the robustness and generalization of the model when recognizing actions at different heights. The distance vector (DV) mechanism is then introduced to establish the relative distance and relative orientation between different keypoints in the same frame and between the same keypoints in different frames, yielding the global potential relationship of each keypoint. Finally, a spatial-temporal graph convolutional network is constructed to learn the characteristics of the action from both spatial and temporal information. Ablation studies against a traditional spatial-temporal graph convolutional network, with and without the multihigh view adaptive networks, support the effectiveness of the model. The model is evaluated on two widely used action recognition benchmarks (NTU RGB+D and PKU-MMD) and achieves better performance on both datasets.
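The relative distance and relative orientation mentioned in this abstract can be sketched as follows. The abstract does not define the exact encoding, so representing orientation as a unit direction vector is an assumption, and the function name and toy coordinates are hypothetical.

```python
import math


def distance_vector(p, q):
    """Return (distance, unit direction) from 3D point p to 3D point q."""
    diff = [qi - pi for pi, qi in zip(p, q)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0.0:
        # Coincident points have no defined direction.
        return 0.0, [0.0, 0.0, 0.0]
    return dist, [d / dist for d in diff]


# frames[t][j] is the (x, y, z) position of keypoint j in frame t (toy values).
frames = [
    [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 2.0), (3.0, 4.0, 0.0)],
]

# Different keypoints in the same frame.
intra = distance_vector(frames[0][0], frames[0][1])
# The same keypoint in different frames.
inter = distance_vector(frames[0][0], frames[1][0])
print(intra[0], inter[0])  # prints: 5.0 2.0
```

The same function covers both cases from the abstract: the intra-frame pair describes body configuration, and the inter-frame pair describes how one keypoint moves over time.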


2021 ◽  
Vol 58 (2) ◽  
pp. 0210007
Author(s):  
张文强 Zhang Wenqiang ◽  
王增强 Wang Zengqiang ◽  
张良 Zhang Liang

2019 ◽  
Vol 1 (11) ◽  
pp. 530-537 ◽  
Author(s):  
Piotr Antonik ◽  
Nicolas Marsal ◽  
Daniel Brunner ◽  
Damien Rontani

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5613
Author(s):  
Amirreza Farnoosh ◽  
Zhouping Wang ◽  
Shaotong Zhu ◽  
Sarah Ostadabbas

We introduce a generative Bayesian switching dynamical model for action recognition in 3D skeletal data. Our model encodes highly correlated skeletal data into a few sets of low-dimensional switching temporal processes and from there decodes to the motion data and their associated action labels. We parameterize these temporal processes with a switching deep autoregressive prior to accommodate both multimodal and higher-order nonlinear inter-dependencies. This results in a dynamical deep generative latent model that parses meaningful intrinsic states in skeletal dynamics and enables action recognition. These sequences of states provide visual and quantitative interpretations of the motion primitives that give rise to each action class, which have not been explored previously. In contrast to previous works, which often overlook temporal dynamics, our method explicitly models temporal transitions and is generative. Our experiments on two large-scale 3D skeletal datasets substantiate the superior performance of our model in comparison with state-of-the-art methods. Specifically, our method achieved 6.3% higher action classification accuracy (by incorporating a dynamical generative framework) and 3.5% better predictive error (by employing a nonlinear second-order dynamical transition model) when compared with the best-performing competitors.
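As a rough intuition for a switching temporal process, the sketch below rolls out a linear, noise-free, first-order autoregressive process whose coefficient depends on a discrete state at each step. This is a deliberate simplification: the model in the abstract is deep, nonlinear, second-order, and Bayesian, none of which is reproduced here. The function name, coefficients, and state path are hypothetical.

```python
def switching_ar1(x0, states, coeffs):
    """Roll out x_t = coeffs[s_t] * x_{t-1} along a given state path."""
    xs = []
    x = x0
    for s in states:
        x = coeffs[s] * x  # the active state selects the dynamics
        xs.append(x)
    return xs


# State 0 contracts the signal, state 1 expands it (toy coefficients).
trajectory = switching_ar1(8.0, [0, 0, 1, 1], {0: 0.5, 1: 2.0})
print(trajectory)  # prints: [4.0, 2.0, 4.0, 8.0]
```

The discrete state path plays the role of the intrinsic states the abstract refers to: each state marks a segment of the sequence governed by its own dynamics.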


2017 ◽  
Vol 7 (1.1) ◽  
pp. 489 ◽  
Author(s):  
P V.V. Kishore ◽  
P Siva Kameswari ◽  
K Niharika ◽  
M Tanuja ◽  
M Bindu ◽  
...  

Human action recognition is a vibrant area of research with multiple applications in human-machine interfaces. In this work, we propose a human action recognition method based on spatial graph kernels on 3D skeletal data. Spatial joint features are extracted using the distances between human joints in 3D space. For each action frame in the video sequence, a spatial graph is constructed using the 3D points as vertices and the computed joint distances as edges. Spatial graph kernels between the training set and the testing set are constructed to measure the similarity between the two action sets. Two spatial graph kernels are constructed, with vertex and edge data represented by joint positions and joint distances, respectively. To test the proposed method, we use four publicly available 3D skeletal datasets: G3D, MSR Action 3D, UT Kinect and NTU RGB+D. The proposed spatial graph kernels achieve better classification accuracies than state-of-the-art models.
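The per-frame spatial graph described in this abstract can be sketched as below. The abstract does not say which joint pairs are connected, so linking every pair (a complete graph) is an assumption, and the function name and toy joints are hypothetical. The graph kernels themselves are not shown because the abstract does not define them.

```python
import math
from itertools import combinations


def build_spatial_graph(joints):
    """Build one frame's graph: joints as vertices, distances as edge weights.

    Assumes a complete graph over all joint pairs.
    """
    vertices = list(joints)
    edges = {
        (i, j): math.dist(joints[i], joints[j])
        for i, j in combinations(range(len(joints)), 2)
    }
    return vertices, edges


# Three toy joints for a single frame.
frame = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
vertices, edges = build_spatial_graph(frame)
print(len(vertices), len(edges))  # prints: 3 3
```

A video sequence would then yield one such graph per frame, which is the input a graph kernel compares between training and testing samples.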


Author(s):  
Yanli Ji ◽  
Feixiang Xu ◽  
Yang Yang ◽  
Fumin Shen ◽  
Heng Tao Shen ◽  
...  

Inventions ◽  
2020 ◽  
Vol 5 (3) ◽  
pp. 49
Author(s):  
Nusrat Tasnim ◽  
Md. Mahbubul Islam ◽  
Joong-Hwan Baek

Human action recognition has become one of the most attractive and demanding fields of research in computer vision and pattern recognition, as it facilitates easy, smart, and comfortable human-machine interaction. With the massive research progress of recent years, several methods have been suggested for discriminating between different types of human actions using color, depth, inertial, and skeleton information. Despite the availability of several action identification methods using different modalities, classifying human actions using skeleton joint information in 3-dimensional space remains a challenging problem. In this paper, we propose an effective method for action recognition using 3D skeleton data. First, large-scale 3D skeleton joint information was analyzed and pre-processed. Then, a simple, straightforward deep convolutional neural network (DCNN) was designed to classify the desired actions in order to evaluate the effectiveness and robustness of the proposed system. We also evaluated prior DCNN models such as ResNet18 and MobileNetV2, which outperform existing systems that use human skeleton joint information.


2016 ◽  
Vol 27 (4) ◽  
pp. 529-543 ◽  
Author(s):  
Cyrille Beaudry ◽  
Renaud Péteri ◽  
Laurent Mascarilla
