Moment Shape Descriptors Applied for Action Recognition in Video Sequences

Temporal information plays a significant role in video-based human action recognition. Effectively extracting the spatial–temporal characteristics of actions in videos has always been a challenging problem. Most existing methods acquire spatial and temporal cues in videos separately. In this article, we propose a new effective representation for depth video sequences, called hierarchical dynamic depth projected difference images, that aggregates the spatial and temporal information of actions simultaneously at different temporal scales. We first project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are then constructed with rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of the representation, we construct a hierarchical dynamic depth projected difference images–based action recognition framework in which the images from the three views are fed independently into three identical pretrained convolutional neural networks for fine-tuning. We design three classification schemes in the framework; each scheme uses different convolutional neural network layers so that their effects on action recognition can be compared. In each scheme, the three views are combined to describe the actions more comprehensively. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method achieves better performance and provides discriminative spatial–temporal information for human action recognition in depth videos.
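The projection and temporal pooling steps described in the abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the `depth_bins` and `max_depth` parameters, the use of absolute frame differences, and the linear weights `2t - T - 1` (a common closed-form approximation of rank pooling, used here in place of an actual rank pooling solver) are all assumptions made for illustration.

```python
import numpy as np


def project_three_views(depth, max_depth, depth_bins=8):
    """Project one depth frame (H x W, 0 = background) onto three views.

    Returns (front, side, top). The front view is the depth map itself;
    the side and top views are binary occupancy maps over quantized depth.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    front = depth.copy()
    side = np.zeros((h, depth_bins))
    top = np.zeros((depth_bins, w))

    ys, xs = np.nonzero(depth)
    # Quantize depth into bins; values at or above max_depth go to the last bin.
    zs = (depth[ys, xs] / max_depth * depth_bins).astype(int)
    zs = np.minimum(zs, depth_bins - 1)
    side[ys, zs] = 1.0
    top[zs, xs] = 1.0
    return front, side, top


def pooled_difference_image(frames):
    """Collapse a (T+1, H, W) stack of one projected view into a single image.

    Absolute differences of consecutive frames are combined with the
    weights 2t - T - 1 (t = 1..T), so later motion counts positively and
    earlier motion negatively. This is an assumed stand-in for rank pooling.
    """
    frames = np.asarray(frames, dtype=np.float64)
    diffs = np.abs(np.diff(frames, axis=0))        # shape (T, H, W)
    t_count = diffs.shape[0]
    weights = 2.0 * np.arange(1, t_count + 1) - t_count - 1
    return np.tensordot(weights, diffs, axes=1)    # shape (H, W)
```

To approximate the hierarchical aspect, `pooled_difference_image` could be applied to sub-windows of the sequence at several temporal scales, once per view, before the resulting images are passed to the networks.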


Global Flow and Temporal-Shape Descriptors for Human Action Recognition from 3D Reconstruction Data

Machine Learning and Data Mining in Pattern Recognition - Lecture Notes in Computer Science ◽ 10.1007/978-3-319-62416-7_4 ◽ 2017 ◽ pp. 47-62

Author(s): Georgios Th. Papadopoulos ◽ Petros Daras

Keyword(s): 3D Reconstruction ◽ Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Shape Descriptors ◽ Global Flow ◽ Temporal Shape


Multi-stream Convolutional Neural Networks for Action Recognition in Video Sequences Based on Adaptive Visual Rhythms

2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) ◽ 10.1109/icmla.2018.00077 ◽ 2018 ◽ Cited By ~ 5

Author(s): Darwin Ttito Concha ◽ Helena De Almeida Maia ◽ Helio Pedrini ◽ Hemerson Tacon ◽ Andre De Souza Brito ◽ ...

Keyword(s): Neural Networks ◽ Convolutional Neural Networks ◽ Action Recognition ◽ Video Sequences


Timed-image based deep learning for action recognition in video sequences

Pattern Recognition ◽ 10.1016/j.patcog.2020.107353 ◽ 2020 ◽ Vol 104 ◽ pp. 107353 ◽ Cited By ~ 2

Author(s): Abdourrahmane Mahamane Atto ◽ Alexandre Benoit ◽ Patrick Lambert

Keyword(s): Deep Learning ◽ Action Recognition ◽ Video Sequences


Human action recognition in crowded surveillance video sequences by using features taken from key-point trajectories

CVPR 2011 WORKSHOPS ◽ 10.1109/cvprw.2011.5981713 ◽ 2011 ◽ Cited By ~ 10

Author(s): Masaki Takahashi ◽ Masahide Naemura ◽ Mahito Fujii ◽ Shin'ichi Satoh

Keyword(s): Action Recognition ◽ Human Action Recognition ◽ Human Action ◽ Video Sequences ◽ Surveillance Video ◽ Point Trajectories


Patient Monitoring by Abnormal Human Activity Recognition Based on CNN Architecture

Electronics ◽ 10.3390/electronics9121993 ◽ 2020 ◽ Vol 9 (12) ◽ pp. 1993

Author(s): Malik Ali Gul ◽ Muhammad Haroon Yousaf ◽ Shah Nawaz ◽ Zaka Ur Rehman ◽ HyungWon Kim

Keyword(s): Activity Recognition ◽ Action Recognition ◽ Human Activity ◽ Patient Monitoring ◽ Human Action Recognition ◽ Confidence Score ◽ Human Action ◽ Human Activity Recognition ◽ Video Sequences ◽ Human Actions

Human action recognition has emerged as a challenging research domain for video understanding and analysis. Consequently, extensive research has been conducted to improve the recognition of human actions. Human activity recognition has various real-time applications, such as patient monitoring, in which patients are monitored among a group of normal people and identified based on their abnormal activities. Our goal is to perform multi-class abnormal action detection for individuals as well as groups in video sequences, so that multiple abnormal human actions can be differentiated. In this paper, the You Only Look Once (YOLO) network is used as the backbone CNN model. To train the CNN model, we constructed a large dataset of patient videos by labeling each frame with a set of patient actions and the patient's positions. We retrained the backbone CNN model with 23,040 labeled images of patient actions for 32 epochs. For each frame, the proposed model assigned a unique confidence score and action label; video sequences were then labeled by finding the recurrent action label. The present study shows that the accuracy of abnormal action recognition is 96.8%. Our proposed approach differentiated abnormal actions with an improved F1-score of 89.2%, which is higher than that of state-of-the-art techniques. The results indicate that the proposed framework can be beneficial to hospitals and elder care homes for patient monitoring.
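The step of labeling a video sequence from per-frame predictions can be sketched as a simple vote. This is one plausible reading of "finding the recurrent action label", not the authors' code: the function name `video_label`, the `(label, confidence)` input format, and the choice to report the mean confidence of the winning label are assumptions.

```python
from collections import Counter


def video_label(frame_predictions):
    """Pick the most frequent per-frame action label for a video sequence.

    frame_predictions: list of (label, confidence) pairs, one per frame.
    Returns (label, mean confidence of the frames carrying that label).
    Ties are broken in favor of the label that appears first.
    """
    if not frame_predictions:
        raise ValueError("frame_predictions must not be empty")

    counts = Counter(label for label, _ in frame_predictions)
    best_label, _ = counts.most_common(1)[0]

    # Average the confidence only over frames that voted for the winner.
    confidences = [c for label, c in frame_predictions if label == best_label]
    return best_label, sum(confidences) / len(confidences)
```

For example, three frames predicted as `("fall", 0.9)`, `("walk", 0.6)` and `("fall", 0.7)` would give the sequence the label `"fall"`.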


Moment Shape Descriptors Applied for Action Recognition in Video Sequences

Action Recognition Using Silhouette Sequences and Shape Descriptors

Action Recognition Using HOG Feature in Different Resolution Video Sequences

A Robust Deep Model for Human Action Recognition in Restricted Video Sequences

Human Body Articulation for Action Recognition in Video Sequences

Hierarchical dynamic depth projected difference images–based action recognition in videos with convolutional neural networks

Global Flow and Temporal-Shape Descriptors for Human Action Recognition from 3D Reconstruction Data

Multi-stream Convolutional Neural Networks for Action Recognition in Video Sequences Based on Adaptive Visual Rhythms

Timed-image based deep learning for action recognition in video sequences

Human action recognition in crowded surveillance video sequences by using features taken from key-point trajectories

Patient Monitoring by Abnormal Human Activity Recognition Based on CNN Architecture
