Distinct Two-Stream Convolutional Networks for Human Action Recognition in Videos Using Segment-Based Temporal Modeling

Ashok Sarabu; Ajit Kumar Santra

doi:10.3390/data5040104

Data ◽

10.3390/data5040104 ◽

2020 ◽

Vol 5 (4) ◽

pp. 104

Author(s):

Ashok Sarabu ◽

Ajit Kumar Santra

Keyword(s):

Action Recognition ◽

Data Augmentation ◽

Main Idea ◽

Human Action Recognition ◽

Human Action ◽

Great Success ◽

Temporal Modeling ◽

Convolutional Networks ◽

Temporal Features ◽

Augmentation Techniques

The two-stream convolutional neural network (CNN) has proven highly successful for action recognition in videos. The main idea is to train two CNNs to learn spatial and temporal features separately and then combine their scores to obtain the final prediction. In the literature, we observed that most methods use similar CNNs for the two streams. In this paper, we design a two-stream CNN architecture that uses different CNNs for the two streams to learn spatial and temporal features. Temporal Segment Networks (TSN) are applied to capture long-range temporal features and to differentiate similar sub-actions in videos. Data augmentation techniques are employed to prevent over-fitting. Advanced cross-modal pre-training is discussed and introduced into the proposed architecture to improve action recognition accuracy. The proposed two-stream model is evaluated on two challenging action recognition datasets: HMDB-51 and UCF-101. The results show a significant performance increase, and the proposed architecture outperforms existing methods.
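The segment-based sampling and score fusion described in this abstract can be illustrated with a minimal sketch. It assumes the common TSN recipe (one randomly chosen frame per equal-length segment, averaged snippet scores) and a weighted sum for fusing the two streams. The function names and the fusion weights are hypothetical and are not taken from the paper.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Split a video into equal segments and pick one frame index per segment."""
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng or random.Random()
    indices = []
    for k in range(num_segments):
        # Integer arithmetic keeps segment boundaries exact.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        indices.append(rng.randrange(start, end))
    return indices


def average_consensus(snippet_scores):
    """Average per-class scores over all sampled snippets of one video."""
    n = len(snippet_scores)
    return [sum(column) / n for column in zip(*snippet_scores)]


def fuse_streams(spatial, temporal, w_spatial=1.0, w_temporal=1.5):
    """Late fusion: weighted sum of per-class scores (weights are assumptions)."""
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]
```

Sampling one snippet per segment lets a handful of frames cover the whole video, which is how a segment-based model can see long-range structure without processing every frame.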

Dual Attention-Guided Multiscale Dynamic Aggregate Graph Convolutional Networks for Skeleton-Based Human Action Recognition

Symmetry ◽

10.3390/sym12101589 ◽

2020 ◽

Vol 12 (10) ◽

pp. 1589

Author(s):

Zeyuan Hu ◽

Eung-Joo Lee

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Great Success ◽

Semantic Features ◽

Convolutional Networks ◽

Temporal Correlations ◽

Semantic Relevance ◽

High Level ◽

Relationship Of

Traditional convolutional neural networks have achieved great success in human action recognition. However, it is challenging to establish effective associations between different human bone nodes to capture detailed information. In this paper, we propose a dual attention-guided multiscale dynamic aggregate graph convolutional neural network (DAG-GCN) for skeleton-based human action recognition. Our goal is to explore the best correlation and determine high-level semantic features. First, a multiscale dynamic aggregate GCN module is used to capture important semantic information and to establish dependence relationships between different bone nodes. Second, the higher-level semantic features are further refined, and their semantic relevance is emphasized through a dual attention guidance module. In addition, we exploit the hierarchical relationships between joints and the spatial-temporal correlations through two modules. Experiments with the DAG-GCN method show good performance on the NTU-60-RGB+D and NTU-120-RGB+D datasets. On the NTU-60 dataset, the accuracy is 95.76% and 90.01% for the cross-view (X-View) and cross-subject (X-Sub) benchmarks, respectively.
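For readers unfamiliar with graph convolution on skeletons, the basic aggregation step can be sketched as follows. This is a generic single layer with a symmetrically normalized adjacency matrix, not the DAG-GCN module itself. The joint indices, bone list, and helper names are hypothetical.

```python
def normalized_adjacency(num_joints, bones):
    """Build D^-1/2 (A + I) D^-1/2 for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own feature
    for i, j in bones:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(features, adj, weight):
    """One layer: ReLU(adj @ features @ weight)."""
    projected = matmul(matmul(adj, features), weight)
    return [[max(0.0, v) for v in row] for row in projected]
```

Each joint's output mixes its own feature with those of the joints it is connected to by a bone, which is the dependence between bone nodes that skeleton-based GCN methods build on.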

Adaptive Attention Memory Graph Convolutional Networks for Skeleton-Based Action Recognition

Sensors ◽

10.3390/s21206761 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6761

Author(s):

Di Liu ◽

Hui Xu ◽

Jianzhong Wang ◽

Yinghua Lu ◽

Jun Kong ◽

...

Keyword(s):

Action Recognition ◽

Spatial Configuration ◽

Human Action Recognition ◽

Human Action ◽

Memory Module ◽

Convolutional Networks ◽

Temporal Features ◽

Attention Memory ◽

Key Frames ◽

Gated Recurrent Unit

Graph Convolutional Networks (GCNs) have attracted a lot of attention and shown remarkable performance for action recognition in recent years. To improve recognition accuracy, the key problems for this kind of method are how to build the graph structure adaptively, select key frames, and extract discriminative features. In this work, we propose a novel Adaptive Attention Memory Graph Convolutional Network (AAM-GCN) for human action recognition using skeleton data. We adopt a GCN to adaptively model the spatial configuration of skeletons and employ a Gated Recurrent Unit (GRU) to construct an attention-enhanced memory for capturing temporal features. With the memory module, our model can not only remember what happened in the past but also use future information through multi-bidirectional GRU layers. Furthermore, to extract discriminative temporal features, an attention mechanism is employed to select key frames from the skeleton sequence. Extensive experiments on the Kinetics, NTU RGB+D and HDM05 datasets show that the proposed network achieves better performance than some state-of-the-art methods.
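The idea of using attention to emphasize key frames can be sketched with a softmax over per-frame scores. This is a generic illustration, not the AAM-GCN memory module: it assumes the per-frame relevance scores are already available (in a real model they would be learned), and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(frame_features, frame_scores):
    """Return attention weights and the attention-weighted sum of frame features."""
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frame_features)) for d in range(dim)
    ]
    return weights, pooled


def top_k_frames(weights, k):
    """Indices of the k highest-weighted frames, in temporal order."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])
```

Frames with larger scores contribute more to the pooled sequence feature, and `top_k_frames` gives a hard selection when only a few key frames should be kept.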

I3D-Shufflenet Based Human Action Recognition

Algorithms ◽

10.3390/a13110301 ◽

2020 ◽

Vol 13 (11) ◽

pp. 301

Author(s):

Guocheng Liu ◽

Caixia Zhang ◽

Qingyang Xu ◽

Ruoshi Cheng ◽

Yong Song ◽

...

Keyword(s):

Neural Network ◽

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Recognition Algorithm ◽

Convolution Kernel ◽

Histogram Of Oriented Gradients ◽

Temporal Features ◽

Convolution Kernels

Because optical-flow-based human action recognition is difficult to apply in practice owing to its heavy computation, an I3D-shufflenet model is proposed that combines the advantages of the I3D neural network and the lightweight shufflenet model. The 5 × 5 convolution kernel of I3D is replaced by two 3 × 3 convolution kernels, which reduces the amount of computation. A shuffle layer is adopted to exchange features between channels. Human actions are recognized and classified using the trained I3D-shufflenet model. The experimental results show that the shuffle layer improves the composition of features in each channel, which promotes the use of useful information. Histogram of Oriented Gradients (HOG) spatial-temporal features of the object are extracted for training, which significantly improves the representation of human actions and reduces the computation required for feature extraction. The I3D-shufflenet is tested on the UCF101 dataset and compared with other models. The final result shows that the I3D-shufflenet achieves higher accuracy than the original I3D, with an accuracy of 96.4%.
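The two mechanisms named in this abstract can be made concrete with a small sketch. It assumes the two 3 × 3 kernels are stacked one after the other with equal input and output channel counts, and it uses the channel shuffle operation as defined in ShuffleNet. The channel count of 64 in the example and the function names are hypothetical.

```python
def conv_weights(kernel, c_in, c_out, dims=2):
    """Number of weights in one convolution layer (bias ignored)."""
    return kernel ** dims * c_in * c_out


def stacked_receptive_field(kernels):
    """Receptive field of stride-1 convolutions applied one after another."""
    return 1 + sum(k - 1 for k in kernels)


def channel_shuffle(channels, groups):
    """Interleave channels across groups: reshape to (groups, n), transpose, flatten."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [
        channels[g * per_group + i] for i in range(per_group) for g in range(groups)
    ]


# Compare one 5 x 5 layer with two stacked 3 x 3 layers at 64 channels.
single_5x5 = conv_weights(5, 64, 64)
double_3x3 = 2 * conv_weights(3, 64, 64)
```

Under these assumptions the stacked pair covers the same 5 × 5 receptive field with 18 weights per channel pair instead of 25. Passing `dims=3` gives the corresponding count for 3D kernels.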

Study of Human Action Recognition Based on Improved Spatio-temporal Features

International Journal of Automation and Computing ◽

10.1007/s11633-014-0831-4 ◽

2014 ◽

Vol 11 (5) ◽

pp. 500-509 ◽

Cited By ~ 12

Author(s):

Xiao-Fei Ji ◽

Qian-Qian Wu ◽

Zhao-Jie Ju ◽

Yang-Yang Wang

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Human Action Recognition from Motion Trajectory using Fourier Temporal Features of Skeleton Joints

2018 International Conference on Advances in Computing and Communication Engineering (ICACCE) ◽

10.1109/icacce.2018.8441712 ◽

2018 ◽

Author(s):

Naresh Kumar ◽

Nagarajan Sukavanam

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Motion Trajectory ◽

Temporal Features

Human Action Recognition Based on Spatio-temporal Features

Lecture Notes in Computer Science - Pattern Recognition and Machine Intelligence ◽

10.1007/978-3-642-11164-8_58 ◽

2009 ◽

pp. 357-362

Author(s):

Nikhil Sawant ◽

K. K. Biswas

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Human Action Recognition by SOM Considering the Probability of Spatio-temporal Features

Lecture Notes in Computer Science - Neural Information Processing. Models and Applications ◽

10.1007/978-3-642-17534-3_48 ◽

2010 ◽

pp. 391-398 ◽

Cited By ~ 1

Author(s):

Yanli Ji ◽

Atsushi Shimada ◽

Rin-ichiro Taniguchi

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal

Pixel Convolutional Networks for Skeleton-Based Human Action Recognition

Communications in Computer and Information Science - Methods and Applications for Modeling and Simulation of Complex Systems ◽

10.1007/978-981-13-2853-4_40 ◽

2018 ◽

pp. 513-523

Author(s):

Zhichao Chang ◽

Jiangyun Wang ◽

Liang Han

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Convolutional Networks

Human Action Recognition Using Spatio-Temporal Multiplier Network and Attentive Correlated Temporal Feature

International Journal of Image and Graphics ◽

10.1142/s0219467822500516 ◽

2021 ◽

Author(s):

C. Indhumathi ◽

V. Murugan ◽

G. Muthulakshmii

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Human Action Recognition ◽

Human Action ◽

Regional Correlation ◽

Temporal Features ◽

Adaptive Motion ◽

Spatio Temporal ◽

Inter Frame ◽

Temporal Feature

Action recognition has gained increasing attention from the computer vision community. Recognizing human actions typically involves extracting spatial and temporal features. Two-stream convolutional neural networks are commonly used for human action recognition in videos. In this paper, an adaptive motion Attentive Correlated Temporal Feature (ACTF) is used as the temporal feature extractor. Temporal average pooling across frames is used to extract the inter-frame regional correlation feature and the mean feature. The proposed method achieves accuracies of 96.9% on UCF101 and 74.6% on HMDB51, which are higher than those of other state-of-the-art methods.
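Temporal average pooling itself is simple to illustrate. In the sketch below, the mean feature is the element-wise average of per-frame feature vectors. The cosine similarity between consecutive frames is only a stand-in for an inter-frame correlation measure, since the abstract does not define the ACTF correlation feature. All names are hypothetical.

```python
import math


def temporal_average_pool(frame_features):
    """Element-wise mean of per-frame feature vectors."""
    n = len(frame_features)
    return [sum(column) / n for column in zip(*frame_features)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def consecutive_frame_similarity(frame_features):
    """Stand-in correlation: cosine similarity of each frame with the next one."""
    return [
        cosine_similarity(a, b)
        for a, b in zip(frame_features, frame_features[1:])
    ]
```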

Human Action Recognition by Learning Spatio-Temporal Features With Deep Neural Networks

IEEE Access ◽

10.1109/access.2018.2817253 ◽

2018 ◽

Vol 6 ◽

pp. 17913-17922 ◽

Cited By ~ 24

Author(s):

Lei Wang ◽

Yangyang Xu ◽

Jun Cheng ◽

Haiying Xia ◽

Jianqin Yin ◽

...

Keyword(s):

Neural Networks ◽

Action Recognition ◽

Deep Neural Networks ◽

Human Action Recognition ◽

Human Action ◽

Temporal Features ◽

Spatio Temporal
