Human Body Articulation for Action Recognition in Video Sequences

Author(s):  
Tuan Hue Thi ◽  
Sijun Lu ◽  
Jian Zhang ◽  
Li Cheng ◽  
Li Wang
Author(s):  
Gopika Rajendran ◽  
Ojus Thomas Lee ◽  
Arya Gopi ◽  
Jais Jose ◽  
Neha Gautham

With the evolution of computing technology in many applications such as human-robot interaction, human-computer interaction, and healthcare systems, 3D human body models and their dynamic motions have gained popularity. Human performance is characterized by body shape and the relative motions of body parts. Research on human activity recognition is structured around how the complex movement of a human body is identified and analyzed. Vision-based action recognition from video is one such task, where actions are inferred by observing the complete action sequence performed by a human. Many techniques have been proposed over recent decades to develop a robust and effective framework for action recognition. In this survey, we summarize recent advances in human action recognition, namely machine learning approaches, deep learning approaches, and the evaluation of these approaches.
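As a point of reference for the deep learning approaches the survey covers, the following is a minimal sketch (not taken from any surveyed paper) of the simplest video action recognition baseline: a per-frame convolutional feature extractor, temporal average pooling over the clip, and a linear classifier. The tiny backbone and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleVideoActionClassifier(nn.Module):
    """Illustrative baseline: per-frame CNN features + temporal average pooling."""

    def __init__(self, feat_dim=512, num_classes=50):
        super().__init__()
        # Per-frame backbone (a tiny stand-in for a deeper network such as a ResNet).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clip):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w))  # per-frame features
        feats = feats.reshape(b, t, -1).mean(dim=1)          # temporal average pooling
        return self.classifier(feats)                        # (batch, num_classes)

# Usage: scores = SimpleVideoActionClassifier()(torch.randn(2, 16, 3, 112, 112))
```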


2020 ◽  
Vol 57 (24) ◽  
pp. 241003
Author(s):  
高德勇 Gao Deyong ◽  
康自兵 Kang Zibing ◽  
王松 Wang Song ◽  
王阳萍 Wang Yangping

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Qiubo Zhong ◽  
Caiming Zheng ◽  
Haoxiang Zhang

A novel posture motion-based spatiotemporal fused graph convolutional network (PM-STGCN) is presented for skeleton-based action recognition. Existing methods for skeleton-based action recognition independently compute joint information within a single frame and motion information of joints between adjacent frames from the human body skeleton structure, and then combine the classification results. However, this does not take into consideration the complicated temporal and spatial relationships of a human action sequence, so such methods are not very effective at distinguishing similar actions. In this work, we enhance the ability to distinguish similar actions by focusing on spatiotemporal fusion and adaptive extraction of highly discriminative features. First, the local posture motion-based temporal attention module (LPM-TAM) is proposed to suppress skeleton sequence data with a low amount of motion in the temporal domain, concentrating the representation on motion-relevant posture features. In addition, the local posture motion-based channel attention module (LPM-CAM) is introduced to exploit strongly discriminative representations between similar action classes. Finally, the posture motion-based spatiotemporal fusion (PM-STF) module is constructed, which fuses the spatiotemporal skeleton data by filtering out low-information sequences and adaptively enhances the highly discriminative posture motion features. Extensive experiments have been conducted, and the results demonstrate that the proposed model is superior to commonly used action recognition methods. The designed human-robot interaction system based on action recognition achieves competitive performance compared with a speech interaction system.
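A minimal sketch of the core idea behind the temporal attention step, assuming a skeleton sequence of 3D joint coordinates: frames whose joints barely move relative to the next frame are down-weighted, concentrating the representation on high-motion postures. This is an illustrative simplification, not the authors' PM-STGCN implementation.

```python
import torch

def posture_motion_temporal_attention(skeleton, eps=1e-6):
    """skeleton: (batch, time, joints, 3) tensor of 3D joint coordinates."""
    # Per-frame motion magnitude: mean joint displacement to the next frame.
    disp = skeleton[:, 1:] - skeleton[:, :-1]                       # (b, t-1, j, 3)
    motion = disp.pow(2).sum(dim=-1).sqrt().mean(dim=-1)            # (b, t-1)
    motion = torch.cat([motion, motion[:, -1:]], dim=1)             # pad last frame
    # Normalize to attention weights over time; low-motion frames get small weights.
    attn = motion / (motion.sum(dim=1, keepdim=True) + eps)         # (b, t)
    return skeleton * attn[:, :, None, None]                        # re-weighted sequence

# Usage: weighted = posture_motion_temporal_attention(torch.randn(4, 64, 25, 3))
```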


2020 ◽  
Vol 34 (03) ◽  
pp. 2677-2684
Author(s):  
Marjaneh Safaei ◽  
Pooyan Balouchian ◽  
Hassan Foroosh

Action recognition in still images poses a great challenge due to (i) fewer available training data and (ii) the absence of temporal information. To address the first challenge, we introduce UCF-STAR, a dataset for STill image Action Recognition containing over 1M images across 50 different human body-motion action categories. UCF-STAR is the largest dataset in the literature for action recognition in still images. Its key characteristics include (1) focusing on human body motion rather than relatively static human-object interaction categories, (2) collecting images from the wild to benefit from a varied set of action representations, (3) appending multiple human-annotated labels per image rather than just the action label, and (4) including a rich, structured, and multi-modal set of metadata for each image. This departs from existing datasets, which typically provide a single annotation over fewer images and categories, with no metadata. UCF-STAR exposes the intrinsic difficulty of action recognition through its realistic scene and action complexity. To benchmark and demonstrate the benefits of UCF-STAR as a large-scale dataset, and to show the role of “latent” motion information in recognizing human actions in still images, we present a novel approach relying on predicting temporal information, yielding higher accuracy on 5 widely used datasets.
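A hedged sketch of the general idea of exploiting "latent" motion in still images: a shared image encoder feeds both an action classifier and an auxiliary head that predicts a motion embedding as an extra supervision signal. The architecture, names, and dimensions are assumptions for illustration, not the UCF-STAR authors' method.

```python
import torch
import torch.nn as nn

class StillImageActionNet(nn.Module):
    """Still-image action classifier with an auxiliary latent-motion head."""

    def __init__(self, feat_dim=256, num_classes=50, motion_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.action_head = nn.Linear(feat_dim, num_classes)  # action label logits
        self.motion_head = nn.Linear(feat_dim, motion_dim)   # predicted motion embedding

    def forward(self, image):
        feat = self.encoder(image)
        return self.action_head(feat), self.motion_head(feat)

# Usage: logits, motion = StillImageActionNet()(torch.randn(8, 3, 224, 224))
```

Training such a model would typically combine a classification loss on the action logits with a regression loss pulling the motion head toward motion features extracted from videos of the same action class.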


2019 ◽  
Vol 16 (1) ◽  
pp. 172988141882509 ◽  
Author(s):  
Hanbo Wu ◽  
Xin Ma ◽  
Yibin Li

Temporal information plays a significant role in video-based human action recognition. How to effectively extract the spatial–temporal characteristics of actions in videos has always been a challenging problem. Most existing methods acquire spatial and temporal cues in videos individually. In this article, we propose a new effective representation for depth video sequences, called hierarchical dynamic depth projected difference images, that can aggregate the spatial and temporal information of actions simultaneously at different temporal scales. We first project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are constructed with rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of the hierarchical dynamic depth projected difference images representation, we construct an action recognition framework in which the hierarchical dynamic depth projected difference images of the three views are fed into three identical pretrained convolutional neural networks independently for fine-tuning. We design three classification schemes in the framework, with different schemes utilizing different convolutional neural network layers to compare their effects on action recognition. The three views are combined to describe the actions more comprehensively in each classification scheme. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method has better performance and can provide discriminative spatial–temporal information for human action recognition in depth videos.
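A simplified sketch of the projection-and-difference idea behind such representations: each depth frame is projected onto the three orthogonal Cartesian planes, and absolute frame-to-frame differences of the projections are accumulated per view. This omits the hierarchical rank-pooling construction proposed in the article; bin counts and the occupancy-based projection are illustrative assumptions.

```python
import numpy as np

def project_depth_frame(depth, depth_bins=64):
    """Project one depth frame (H, W) onto front, side, and top occupancy views."""
    h, w = depth.shape
    valid = depth > 0
    # Quantize depth values into discrete bins along the z-axis.
    z = np.clip((depth * (depth_bins - 1) / max(depth.max(), 1e-6)).astype(int),
                0, depth_bins - 1)
    front = valid.astype(np.float32)                  # (H, W) body occupancy
    side = np.zeros((h, depth_bins), np.float32)      # (H, depth) occupancy
    top = np.zeros((depth_bins, w), np.float32)       # (depth, W) occupancy
    ys, xs = np.nonzero(valid)
    side[ys, z[ys, xs]] = 1.0
    top[z[ys, xs], xs] = 1.0
    return front, side, top

def depth_motion_maps(depth_video):
    """Accumulate absolute frame-to-frame differences of each projected view."""
    projections = [project_depth_frame(frame) for frame in depth_video]
    return [np.abs(np.diff(np.stack(view), axis=0)).sum(axis=0)
            for view in zip(*projections)]

# Usage: front_map, side_map, top_map = depth_motion_maps(np.random.rand(30, 240, 320))
```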


2010 ◽  
Vol 1 (4) ◽  
pp. 47-55 ◽  
Author(s):  
Milene Arantes ◽  
Adilson Gonzaga

The aim of this paper is to recognize people based on their gait. The authors propose a computer vision approach applied to video sequences that extracts global features of human motion. From the skeleton, the authors extract information about human joints, and from the silhouette they obtain the boundary features of the human body. The binary and gray-level images capture different aspects of human motion. This work proposes to recover the global information of the human body from four segmented image models and applies a fusion model to improve classification. The authors consider frames as elements of distinct classes of video sequences and the sequences themselves as classes in a database. The classification rates obtained separately from the four image sequences are then merged by a fusion technique, and the results are compared with other techniques for gait recognition.
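A small sketch of score-level fusion across several image representations (e.g. skeleton, silhouette, binary, and gray-level sequences), in the spirit of the fusion step described above. The weighted-sum rule and the variable names are illustrative assumptions, not the authors' fusion model.

```python
import numpy as np

def fuse_scores(score_matrices, weights=None):
    """score_matrices: list of (num_samples, num_classes) score arrays, one per representation."""
    weights = weights or [1.0 / len(score_matrices)] * len(score_matrices)
    fused = sum(w * s for w, s in zip(weights, score_matrices))  # weighted score sum
    return fused.argmax(axis=1)                                  # predicted class per sample

# Usage (hypothetical score arrays from the four per-representation classifiers):
# preds = fuse_scores([skeleton_scores, silhouette_scores, binary_scores, gray_scores])
```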


Author(s):  
Dong Yin ◽  
Yu-Qing Miao ◽  
Kang Qiu ◽  
An Wang
