Human Body Articulation for Action Recognition in Video Sequences

Author(s):  
Tuan Hue Thi ◽  
Sijun Lu ◽  
Jian Zhang ◽  
Li Cheng ◽  
Li Wang
Author(s):  
Gopika Rajendran ◽  
Ojus Thomas Lee ◽  
Arya Gopi ◽  
Jais Jose ◽  
Neha Gautham

With the evolution of computing technology in many applications such as human-robot interaction, human-computer interaction, and healthcare systems, 3D human body models and their dynamic motions have gained popularity. Human performance is characterized by body shape and the relative motions of body parts. Research on human activity recognition is structured around how the complex movement of a human body is identified and analyzed. Vision-based action recognition from video is one such task, where actions are inferred by observing the complete action sequence performed by a human. Many techniques have been proposed over recent decades to develop a robust and effective framework for action recognition. In this survey, we summarize recent advances in human action recognition, namely machine learning approaches, deep learning approaches, and the evaluation of these approaches.
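As a point of reference for the deep learning approaches the survey covers, the following is a minimal sketch (not taken from any surveyed paper) of the simplest video action recognition baseline: a per-frame convolutional feature extractor, temporal average pooling over the clip, and a linear classifier. The tiny backbone and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleVideoActionClassifier(nn.Module):
    """Illustrative baseline: per-frame CNN features + temporal average pooling."""

    def __init__(self, feat_dim=512, num_classes=50):
        super().__init__()
        # Per-frame backbone (a tiny stand-in for a deeper network such as a ResNet).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, clip):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        feats = self.backbone(clip.reshape(b * t, c, h, w))  # per-frame features
        feats = feats.reshape(b, t, -1).mean(dim=1)          # temporal average pooling
        return self.classifier(feats)                        # (batch, num_classes)

# Usage: scores = SimpleVideoActionClassifier()(torch.randn(2, 16, 3, 112, 112))
```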


2020 ◽  
Vol 57 (24) ◽  
pp. 241003
Author(s):  
高德勇 Gao Deyong ◽  
康自兵 Kang Zibing ◽  
王松 Wang Song ◽  
王阳萍 Wang Yangping

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Qiubo Zhong ◽  
Caiming Zheng ◽  
Haoxiang Zhang

A novel posture motion-based spatiotemporal fused graph convolutional network (PM-STGCN) is presented for skeleton-based action recognition. Existing methods for skeleton-based action recognition independently compute joint information within a single frame and motion information of joints between adjacent frames from the human body skeleton structure, and then combine the classification results. However, this does not take into consideration the complicated temporal and spatial relationships of a human action sequence, so such methods are not very effective at distinguishing similar actions. In this work, we enhance the ability to distinguish similar actions by focusing on spatiotemporal fusion and adaptive extraction of highly discriminative features. First, the local posture motion-based temporal attention module (LPM-TAM) is proposed to suppress skeleton sequence data with a low amount of motion in the temporal domain, concentrating the representation on motion-relevant posture features. In addition, the local posture motion-based channel attention module (LPM-CAM) is introduced to exploit strongly discriminative representations between similar action classes. Finally, the posture motion-based spatiotemporal fusion (PM-STF) module is constructed, which fuses the spatiotemporal skeleton data by filtering out low-information sequences and adaptively enhances the highly discriminative posture motion features. Extensive experiments have been conducted, and the results demonstrate that the proposed model is superior to commonly used action recognition methods. The designed human-robot interaction system based on action recognition achieves competitive performance compared with a speech interaction system.
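A minimal sketch of the core idea behind the temporal attention step, assuming a skeleton sequence of 3D joint coordinates: frames whose joints barely move relative to the next frame are down-weighted, concentrating the representation on high-motion postures. This is an illustrative simplification, not the authors' PM-STGCN implementation.

```python
import torch

def posture_motion_temporal_attention(skeleton, eps=1e-6):
    """skeleton: (batch, time, joints, 3) tensor of 3D joint coordinates."""
    # Per-frame motion magnitude: mean joint displacement to the next frame.
    disp = skeleton[:, 1:] - skeleton[:, :-1]                       # (b, t-1, j, 3)
    motion = disp.pow(2).sum(dim=-1).sqrt().mean(dim=-1)            # (b, t-1)
    motion = torch.cat([motion, motion[:, -1:]], dim=1)             # pad last frame
    # Normalize to attention weights over time; low-motion frames get small weights.
    attn = motion / (motion.sum(dim=1, keepdim=True) + eps)         # (b, t)
    return skeleton * attn[:, :, None, None]                        # re-weighted sequence

# Usage: weighted = posture_motion_temporal_attention(torch.randn(4, 64, 25, 3))
```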


2020 ◽  
Vol 34 (03) ◽  
pp. 2677-2684
Author(s):  
Marjaneh Safaei ◽  
Pooyan Balouchian ◽  
Hassan Foroosh

Action recognition in still images poses a great challenge due to (i) fewer available training data and (ii) the absence of temporal information. To address the first challenge, we introduce UCF-STAR, a dataset for STill image Action Recognition containing over 1M images across 50 different human body-motion action categories. UCF-STAR is the largest dataset in the literature for action recognition in still images. Its key characteristics include (1) focusing on human body motion rather than relatively static human-object interaction categories, (2) collecting images from the wild to benefit from a varied set of action representations, (3) appending multiple human-annotated labels per image rather than just the action label, and (4) including a rich, structured, and multi-modal set of metadata for each image. This departs from existing datasets, which typically provide a single annotation over fewer images and categories, with no metadata. UCF-STAR exposes the intrinsic difficulty of action recognition through its realistic scene and action complexity. To benchmark and demonstrate the benefits of UCF-STAR as a large-scale dataset, and to show the role of “latent” motion information in recognizing human actions in still images, we present a novel approach relying on predicting temporal information, yielding higher accuracy on 5 widely used datasets.
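A hedged sketch of the general idea of exploiting "latent" motion in still images: a shared image encoder feeds both an action classifier and an auxiliary head that predicts a motion embedding as an extra supervision signal. The architecture, names, and dimensions are assumptions for illustration, not the UCF-STAR authors' method.

```python
import torch
import torch.nn as nn

class StillImageActionNet(nn.Module):
    """Still-image action classifier with an auxiliary latent-motion head."""

    def __init__(self, feat_dim=256, num_classes=50, motion_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.action_head = nn.Linear(feat_dim, num_classes)  # action label logits
        self.motion_head = nn.Linear(feat_dim, motion_dim)   # predicted motion embedding

    def forward(self, image):
        feat = self.encoder(image)
        return self.action_head(feat), self.motion_head(feat)

# Usage: logits, motion = StillImageActionNet()(torch.randn(8, 3, 224, 224))
```

Training such a model would typically combine a classification loss on the action logits with a regression loss pulling the motion head toward motion features extracted from videos of the same action class.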


2019 ◽  
Vol 16 (1) ◽  
pp. 172988141882509 ◽  
Author(s):  
Hanbo Wu ◽  
Xin Ma ◽  
Yibin Li

Temporal information plays a significant role in video-based human action recognition. How to effectively extract the spatial–temporal characteristics of actions in videos has always been a challenging problem. Most existing methods acquire spatial and temporal cues in videos individually. In this article, we propose a new effective representation for depth video sequences, called hierarchical dynamic depth projected difference images, that can aggregate the spatial and temporal information of actions simultaneously at different temporal scales. We first project depth video sequences onto three orthogonal Cartesian views to capture the 3D shape and motion information of human actions. Hierarchical dynamic depth projected difference images are constructed with rank pooling in each projected view to hierarchically encode the spatial–temporal motion dynamics in depth videos. Convolutional neural networks can automatically learn discriminative features from images and have been extended to video classification because of their superior performance. To verify the effectiveness of the hierarchical dynamic depth projected difference images representation, we construct an action recognition framework in which the hierarchical dynamic depth projected difference images of the three views are fed into three identical pretrained convolutional neural networks independently for fine-tuning. We design three classification schemes in the framework, with different schemes utilizing different convolutional neural network layers to compare their effects on action recognition. The three views are combined to describe the actions more comprehensively in each classification scheme. The proposed framework is evaluated on three challenging public human action data sets. Experiments indicate that our method has better performance and can provide discriminative spatial–temporal information for human action recognition in depth videos.
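A simplified sketch of the projection-and-difference idea behind such representations: each depth frame is projected onto the three orthogonal Cartesian planes, and absolute frame-to-frame differences of the projections are accumulated per view. This omits the hierarchical rank-pooling construction proposed in the article; bin counts and the occupancy-based projection are illustrative assumptions.

```python
import numpy as np

def project_depth_frame(depth, depth_bins=64):
    """Project one depth frame (H, W) onto front, side, and top occupancy views."""
    h, w = depth.shape
    valid = depth > 0
    # Quantize depth values into discrete bins along the z-axis.
    z = np.clip((depth * (depth_bins - 1) / max(depth.max(), 1e-6)).astype(int),
                0, depth_bins - 1)
    front = valid.astype(np.float32)                  # (H, W) body occupancy
    side = np.zeros((h, depth_bins), np.float32)      # (H, depth) occupancy
    top = np.zeros((depth_bins, w), np.float32)       # (depth, W) occupancy
    ys, xs = np.nonzero(valid)
    side[ys, z[ys, xs]] = 1.0
    top[z[ys, xs], xs] = 1.0
    return front, side, top

def depth_motion_maps(depth_video):
    """Accumulate absolute frame-to-frame differences of each projected view."""
    projections = [project_depth_frame(frame) for frame in depth_video]
    return [np.abs(np.diff(np.stack(view), axis=0)).sum(axis=0)
            for view in zip(*projections)]

# Usage: front_map, side_map, top_map = depth_motion_maps(np.random.rand(30, 240, 320))
```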


2010 ◽  
Vol 1 (4) ◽  
pp. 47-55 ◽  
Author(s):  
Milene Arantes ◽  
Adilson Gonzaga

The aim of this paper is to recognize people based on their gait. The authors propose a computer vision approach applied to video sequences that extracts global features of human motion. From the skeleton, the authors extract information about human joints, and from the silhouette they obtain the boundary features of the human body. The binary and gray-level images capture different aspects of human motion. This work proposes to recover the global information of the human body from four segmented image models and applies a fusion model to improve classification. The authors consider frames as elements of distinct classes of video sequences and the sequences themselves as classes in a database. The classification rates obtained separately from the four image sequences are then merged by a fusion technique, and the results are compared with other techniques for gait recognition.
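A small sketch of score-level fusion across several image representations (e.g. skeleton, silhouette, binary, and gray-level sequences), in the spirit of the fusion step described above. The weighted-sum rule and the variable names are illustrative assumptions, not the authors' fusion model.

```python
import numpy as np

def fuse_scores(score_matrices, weights=None):
    """score_matrices: list of (num_samples, num_classes) score arrays, one per representation."""
    weights = weights or [1.0 / len(score_matrices)] * len(score_matrices)
    fused = sum(w * s for w, s in zip(weights, score_matrices))  # weighted score sum
    return fused.argmax(axis=1)                                  # predicted class per sample

# Usage (hypothetical score arrays from the four per-representation classifiers):
# preds = fuse_scores([skeleton_scores, silhouette_scores, binary_scores, gray_scores])
```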


Author(s):  
Dong Yin ◽  
Yu-Qing Miao ◽  
Kang Qiu ◽  
An Wang
