Action Recognition Using Close-Up of Maximum Activation and ETRI-Activity3D LivingLab Dataset

Doyoung Kim; Inwoong Lee; Dohyung Kim; Sanghoon Lee

doi:10.3390/s21206774

Sensors ◽

10.3390/s21206774 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6774

Author(s):

Doyoung Kim ◽

Inwoong Lee ◽

Dohyung Kim ◽

Sanghoon Lee

Keyword(s):

Action Recognition ◽

High Availability ◽

Video Data ◽

Experimental Results ◽

Great Performance ◽

Action Sequences ◽

Maximum Activation ◽

Rich Data ◽

Video Input ◽

Dataset Shift

Action recognition models have shown great performance on various video datasets. Nevertheless, because existing datasets lack rich data on target actions, they are insufficient for the action recognition applications required by industry. To satisfy this requirement, datasets composed of target actions with high availability have been created, but it is difficult to capture the various characteristics of actual environments because the video data are generated in a specific environment. In this paper, we introduce a new ETRI-Activity3D-LivingLab dataset, which provides action sequences in actual environments and helps to handle the network generalization issue caused by dataset shift. When an action recognition model is trained on the ETRI-Activity3D and KIST SynADL datasets and evaluated on the ETRI-Activity3D-LivingLab dataset, its performance can be severely degraded because the datasets were captured in different environment domains. To reduce this dataset shift between the training and testing datasets, we propose a close-up of maximum activation, which magnifies the most activated part of a video input in detail. In addition, we present various experimental results and analyses that show the dataset shift and demonstrate the effectiveness of the proposed method.
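
The abstract describes the close-up only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a single grayscale frame stored as a list of rows, an activation map of possibly lower resolution (for example, taken from some network layer), and a hypothetical square crop size that is no larger than the frame; the function names are illustrative.

```python
def argmax_2d(grid):
    """Return (row, col) of the largest value in a 2D list."""
    best = (0, 0)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value > grid[best[0]][best[1]]:
                best = (r, c)
    return best


def close_up(frame, activation, crop):
    """Magnify the frame region around the most activated cell.

    Assumption: crop <= frame height and width. The crop window is
    upscaled back to the full frame size with nearest-neighbor sampling.
    """
    height, width = len(frame), len(frame[0])
    act_r, act_c = argmax_2d(activation)
    # Map the activation cell to the center of the matching frame region.
    center_r = int((act_r + 0.5) * height / len(activation))
    center_c = int((act_c + 0.5) * width / len(activation[0]))
    # Clamp the window so that it stays inside the frame.
    top = min(max(center_r - crop // 2, 0), height - crop)
    left = min(max(center_c - crop // 2, 0), width - crop)
    return [
        [frame[top + (r * crop) // height][left + (c * crop) // width]
         for c in range(width)]
        for r in range(height)
    ]
```

For a video input, one plausible choice is to apply the same crop window to every frame so that the magnified clip stays temporally aligned; the abstract does not say how the window is chosen across frames.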

Contributive Representation-Based Reconstruction for Online 3D Action Recognition

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421500051 ◽

2020 ◽

pp. 2150005

Author(s):

Mohsen Tabejamaat ◽

Hoda Mohammadzade

Keyword(s):

Action Recognition ◽

Experimental Results ◽

Initial Part ◽

Optimal Classifier ◽

Training Sequences ◽

Action Sequences ◽

Increasing Trend ◽

Sequence Representation ◽

Existing Frames ◽

Query Sample

Recent years have seen an increasing trend in developing 3D action recognition methods. However, despite the advances, existing models still suffer from some major drawbacks, including the lack of any provision for recognizing action sequences with missing frames. This significantly hampers the applicability of these methods to online scenarios, where only an initial part of each sequence is available. In this paper, we introduce a novel sequence-to-sequence representation-based algorithm in which a query sample is characterized using a collaborative frame representation of all the training sequences. This way, an optimal classifier is tailored to the existing frames of each query sample, making the model robust to the effect of missing frames in sequences (e.g., in online scenarios). Moreover, due to the collaborative nature of the representation, it implicitly handles the problem of varying styles during the course of activities. Experimental results on three publicly available databases, UTKinect, TST fall, and UTD-MHAD, show accuracies of 95.48%, 90.91%, and 91.67%, respectively, when using the first 75% of each query sequence, and 84.42%, 60.98%, and 87.27% when using the first 50%.

Detecting “DeepFakes” in H.264 Video Data Using Compression Ghost Artifacts

Electronic Imaging ◽

10.2352/issn.2470-1173.2020.4.mwsf-116 ◽

2020 ◽

Vol 2020 (4) ◽

pp. 116-1-116-7

Author(s):

Raphael Antonius Frick ◽

Sascha Zmudzinski ◽

Martin Steinebach

Keyword(s):

Image Forensics ◽

Video Data ◽

Experimental Results ◽

Video Sequences ◽

The Internet ◽

Video Content ◽

High Quality ◽

The Public

In recent years, the number of forged videos circulating on the Internet has increased immensely. Software and services for creating such forgeries have become more and more accessible to the public, and the risk of malicious use of forged videos has risen accordingly. This work proposes an approach, based on the ghost effect known from image forensics, for detecting forgeries that replace faces in video sequences or change the facial expression of a face. The experimental results show that the proposed approach is able to identify forgery in high-quality encoded video content.

Low-Cost Embedded System Using Convolutional Neural Networks-Based Spatiotemporal Feature Map for Real-Time Human Action Recognition

Applied Sciences ◽

10.3390/app11114940 ◽

2021 ◽

Vol 11 (11) ◽

pp. 4940

Author(s):

Jinsoo Kim ◽

Jeongho Cho

Keyword(s):

Embedded System ◽

Real Time ◽

Action Recognition ◽

Processing Speed ◽

Recognition Accuracy ◽

Low Cost ◽

Human Action Recognition ◽

Human Action ◽

Video Data ◽

Feature Maps

Research on video data faces the difficulty of extracting not only spatial but also temporal features, and human action recognition (HAR) is a representative field that applies convolutional neural networks (CNNs) to video data. Action recognition performance has improved, but owing to model complexity, some limitations to real-time operation persist. Therefore, a lightweight CNN-based single-stream HAR model that can operate in real time is proposed. The proposed model extracts spatial feature maps by applying a CNN to the images that make up the video and uses the frame change rate of sequential images as temporal information. The spatial feature maps are weighted-averaged by frame change, transformed into spatiotemporal features, and input into multilayer perceptrons, which have relatively lower complexity than other HAR models; thus, our method has high utility in a single embedded system connected to CCTV. Evaluation of action recognition accuracy and data processing speed on the challenging action recognition benchmark UCF-101 showed higher accuracy than an HAR model using long short-term memory with a small number of video frames, and the fast data processing speed confirmed that real-time operation is possible. In addition, the proposed weighted mean-based HAR model was tested on a Jetson Nano to confirm that it can be used in low-cost GPU-based embedded systems.
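
The weighted averaging step can be sketched as follows. This is a hedged illustration only: the abstract does not define the frame change rate, so the sketch assumes it is the mean absolute pixel difference between consecutive frames, treats each frame as a flat list of pixel values, and treats each feature map as a flat feature vector. The function names are hypothetical.

```python
def frame_change(previous, current):
    """Assumed change measure: mean absolute pixel difference."""
    return sum(abs(a - b) for a, b in zip(previous, current)) / len(current)


def spatiotemporal_feature(frames, features):
    """Average per-frame feature vectors, weighted by frame change.

    frames[i] and features[i] belong to the same time step. The first
    frame has no predecessor, so only features[1:] are averaged.
    """
    weights = [frame_change(frames[i - 1], frames[i])
               for i in range(1, len(frames))]
    total = sum(weights)
    if total == 0:
        # A completely static clip: fall back to a plain average.
        weights = [1.0] * len(weights)
        total = float(len(weights))
    dim = len(features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, features[1:])) / total
        for d in range(dim)
    ]
```

Frames that differ more from their predecessor contribute more to the pooled vector, which is one way a single fixed-size input for a multilayer perceptron could carry temporal information.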

Automatic Detection of Discrimination Actions from Social Images

Electronics ◽

10.3390/electronics10030325 ◽

2021 ◽

Vol 10 (3) ◽

pp. 325

Author(s):

Zhihao Wu ◽

Baopeng Zhang ◽

Tianchen Zhou ◽

Yan Li ◽

Jianping Fan

Keyword(s):

Action Recognition ◽

State Of The Art ◽

Automatic Detection ◽

Experimental Results ◽

Practical Approach ◽

Detection And Identification ◽

Art Methods ◽

Image Set ◽

Social Images ◽

Relationship Identification

In this paper, we developed a practical approach for automatic detection of discrimination actions from social images. Firstly, an image set is established, in which various discrimination actions and relations are manually labeled. To the best of our knowledge, this is the first work to create a dataset for discrimination action recognition and relationship identification. Secondly, a practical approach is developed to achieve automatic detection and identification of discrimination actions and relationships from social images. Thirdly, the task of relationship identification is seamlessly integrated with the task of discrimination action recognition into one single network called the Co-operative Visual Translation Embedding++ network (CVTransE++). We also compared our proposed method with numerous state-of-the-art methods, and our experimental results demonstrated that our proposed methods can significantly outperform state-of-the-art approaches.

Constructing and Utilizing Video Ontology for Accurate and Fast Retrieval

International Journal of Multimedia Data Engineering and Management ◽

10.4018/jmdem.2011100104 ◽

2011 ◽

Vol 2 (4) ◽

pp. 59-75 ◽

Cited By ~ 1

Author(s):

Kimiaki Shirahama ◽

Kuniaki Uehara

Keyword(s):

Knowledge Base ◽

Large Scale ◽

Video Retrieval ◽

Computational Cost ◽

Semantic Content ◽

Video Data ◽

Experimental Results ◽

Huge Number ◽

Dempster Shafer Theory ◽

Shafer Theory

This paper examines video retrieval based on the Query-By-Example (QBE) approach, where shots relevant to a query are retrieved from large-scale video data based on their similarity to example shots. This involves two crucial problems. The first is that similarity in features does not necessarily imply similarity in semantic content. The second is the high computational cost of computing the similarity of a huge number of shots to the example shots. The authors have developed a method that can filter out a large number of shots irrelevant to a query, based on a video ontology, which is a knowledge base about the concepts displayed in a shot. The method utilizes various concept relationships (e.g., generalization/specialization, sibling, part-of, and co-occurrence) defined in the video ontology. In addition, although the video ontology assumes that shots are accurately annotated with concepts, accurate annotation is difficult due to the diversity of forms and appearances of the concepts. Dempster-Shafer theory is used to account for the uncertainty in determining the relevance of a shot based on its inaccurate annotation. Experimental results on TRECVID 2009 video data validate the effectiveness of the method.
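
As background for the Dempster-Shafer step, the sketch below implements the standard Dempster rule of combination for two mass functions. How the authors derive masses from concept annotations is not stated in the abstract, so the two-hypothesis frame ("relevant", "irrelevant") and the example masses are purely illustrative assumptions.

```python
def combine(m1, m2):
    """Dempster's rule: combine two mass functions.

    Each mass function maps a frozenset of hypotheses to its mass.
    Mass falling on an empty intersection is conflict and is
    normalized away.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m1.items():
        for set_b, mass_b in m2.items():
            intersection = set_a & set_b
            if intersection:
                combined[intersection] = (
                    combined.get(intersection, 0.0) + mass_a * mass_b
                )
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Illustrative evidence from two hypothetical concept detectors.
RELEVANT = frozenset({"relevant"})
IRRELEVANT = frozenset({"irrelevant"})
UNKNOWN = RELEVANT | IRRELEVANT  # mass left uncommitted

evidence_1 = {RELEVANT: 0.6, UNKNOWN: 0.4}
evidence_2 = {RELEVANT: 0.5, IRRELEVANT: 0.3, UNKNOWN: 0.2}
fused = combine(evidence_1, evidence_2)
```

Assigning mass to the whole frame (`UNKNOWN` here) is what lets an unreliable annotation express ignorance instead of forcing a relevant or irrelevant decision.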

Human action recognition in video data using invariant characteristic vectors

2012 19th IEEE International Conference on Image Processing ◽

10.1109/icip.2012.6467127 ◽

2012 ◽

Cited By ~ 1

Author(s):

Nazim Ashraf ◽

Hassan Foroosh

Keyword(s):

Action Recognition ◽

Human Action Recognition ◽

Human Action ◽

Video Data ◽

Invariant Characteristic

Online Strategy Clustering Based on Action Sequences in RoboCupSoccer Small Size League

Robotics ◽

10.3390/robotics8030058 ◽

2019 ◽

Vol 8 (3) ◽

pp. 58

Author(s):

Yusuke Adachi ◽

Masahide Ito ◽

Tadashi Naruse

Keyword(s):

Experimental Results ◽

Clustering Method ◽

Learning Problem ◽

Data Set ◽

Geometric Data ◽

Action Sequences ◽

Novel Method ◽

Online Strategy

This paper addresses a strategy learning problem in the RoboCupSoccer Small Size League (SSL). We propose a novel method based on action sequences to cluster an opponent’s strategies online. Our proposed method is composed of the following three steps: (1) extracting typical actions from geometric data to make action sequences, (2) calculating the dissimilarity of the sequences, and (3) clustering the sequences by using the dissimilarity. This method can reduce the amount of data used in the clustering process; handling action sequences instead of geometric data as the data set also makes it easier to search for actions. As a result, the proposed clustering method is feasible online and is also applicable to countering an opponent’s strategy. The effectiveness of the proposed method was validated by experimental results.
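
Steps (2) and (3) can be sketched as below. The abstract names neither the dissimilarity measure nor the clustering algorithm, so this sketch substitutes two common stand-ins and labels them as assumptions: edit (Levenshtein) distance over action labels, and a simple threshold rule that compares a new sequence with the first member of each existing cluster. The action names in the usage note are hypothetical.

```python
def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences of action labels."""
    previous = list(range(len(seq_b) + 1))
    for i, a in enumerate(seq_a, start=1):
        current = [i]
        for j, b in enumerate(seq_b, start=1):
            cost = 0 if a == b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def assign_online(clusters, sequence, threshold):
    """Put a new sequence into the nearest cluster or open a new one.

    clusters is a list of lists of sequences and is modified in place.
    Returns the index of the cluster that received the sequence.
    """
    best_index, best_distance = None, None
    for index, members in enumerate(clusters):
        distance = edit_distance(members[0], sequence)
        if best_distance is None or distance < best_distance:
            best_index, best_distance = index, distance
    if best_index is not None and best_distance <= threshold:
        clusters[best_index].append(sequence)
        return best_index
    clusters.append([sequence])
    return len(clusters) - 1
```

Calling `assign_online(clusters, ["pass", "shoot"], 1)` on an empty `clusters` list opens cluster 0, and a later `["pass", "kick", "shoot"]` joins it because the two sequences differ by a single insertion.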

Consistent Video Style Transfer via Compound Regularization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6905 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12233-12240

Author(s):

Wenjing Wang ◽

Jizheng Xu ◽

Li Zhang ◽

Yue Wang ◽

Jiaying Liu

Keyword(s):

State Of The Art ◽

Video Data ◽

Experimental Results ◽

Temporal Consistency ◽

Challenging Problem ◽

Style Transfer ◽

Single Frame ◽

Optical Flows ◽

Training Strategies ◽

Art Style

Recently, neural style transfer has drawn much attention and significant progress has been made, especially for image style transfer. However, flexible and consistent style transfer for videos remains a challenging problem. Existing training strategies, which either use a significant amount of video data with optical flows or introduce single-frame regularizers, have limited performance on real videos. In this paper, we propose a novel interpretation of temporal consistency, based on which we analyze the drawbacks of existing training strategies and then derive a new compound regularization. Experimental results show that the proposed regularization can better balance spatial and temporal performance, which supports our modeling. Combined with the new cost formula, we design a zero-shot video style transfer framework. Moreover, for better feature migration, we introduce a new module to dynamically adjust inter-channel distributions. Quantitative and qualitative results demonstrate the superiority of our method over other state-of-the-art style transfer methods. Our project is publicly available at: https://daooshee.github.io/CompoundVST/.

Human Action Recognition Algorithm Based on Key Posture

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.631-632.1303 ◽

2013 ◽

Vol 631-632 ◽

pp. 1303-1308

Author(s):

He Jin Yuan

Keyword(s):

Action Recognition ◽

Clustering Algorithm ◽

Recognition Accuracy ◽

Human Action Recognition ◽

Human Action ◽

Recognition Algorithm ◽

Training Samples ◽

Action Sequences

A novel human action recognition algorithm based on key postures is proposed in this paper. In this method, the mesh features of each image in the human action sequences are first calculated; then the key postures are generated from the human mesh features through the k-medoids clustering algorithm, and the motion sequences are thus represented as vectors of key postures. Each component of the vector is the number of occurrences of the corresponding posture in the action. For human action recognition, the observed action is first converted into a key posture vector; then its correlation coefficients with the training samples are calculated, and the action that best matches the observed sequence is chosen as the final category. Experiments on the Weizmann dataset demonstrate that our method is effective for human action recognition. The average recognition accuracy can exceed 90%.
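
The recognition stage described above can be sketched as follows. The sketch assumes that k-medoids clustering has already assigned each frame to a key posture index, and it reads the coefficient in the abstract as the Pearson correlation coefficient, which is an assumption. The labels and counts in the usage note are made up for illustration.

```python
import math


def posture_histogram(posture_ids, num_postures):
    """Count how often each key posture occurs in one action sequence."""
    counts = [0] * num_postures
    for posture_id in posture_ids:
        counts[posture_id] += 1
    return counts


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mean_x) ** 2 for a in x)
                     * sum((b - mean_y) ** 2 for b in y))
    return covariance / norm if norm else 0.0


def classify(observed_ids, training, num_postures):
    """Return the label of the best-correlated training histogram.

    training is a list of (label, histogram) pairs.
    """
    query = posture_histogram(observed_ids, num_postures)
    return max(training, key=lambda item: correlation(query, item[1]))[0]
```

With `training = [("walk", [5, 1, 0]), ("jump", [0, 2, 6])]`, an observed sequence with posture indices `[0, 0, 0, 1]` yields the histogram `[3, 1, 0]` and is matched to "walk".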

DWT-Based Video Data Hiding Robust to MPEG Compression and Frame Loss

International Journal of Image and Graphics ◽

10.1142/s0219467805001689 ◽

2005 ◽

Vol 05 (01) ◽

pp. 111-133 ◽

Cited By ~ 6

Author(s):

Hongmei Liu ◽

Jiwu Huang ◽

Yun Q. Shi

Keyword(s):

Data Hiding ◽

Random Noise ◽

Data Retrieval ◽

Video Data ◽

Experimental Results ◽

Data Detection ◽

Discrete Wavelet ◽

Magnitude Distribution ◽

Embedding Strategy ◽

Error Probabilities

In this paper, we propose a blind video data-hiding algorithm in the DWT (discrete wavelet transform) domain. It embeds multiple information bits into uncompressed video sequences. The major features of this algorithm are as follows. (1) We develop a novel embedding strategy in the DWT domain. Unlike existing DWT-based schemes, which explicitly exclude the LL subband coefficients from data embedding, we embed data in the LL subband for better invisibility and robustness. The underlying idea comes from our qualitative and quantitative analysis of the magnitude distribution of DWT coefficients over commonly used images. The experimental results confirm the superiority of the proposed embedding strategy. (2) To combat temporal attacks, which destroy the synchronization of hidden data that is necessary for data retrieval, we develop an effective temporal synchronization technique. Compared with the sliding correlation proposed in existing algorithms, our synchronization technique is more advanced. (3) We adopt a new 3D interleaving technique to combat bursts of errors, while reducing random error probabilities in data detection by exploiting ECC (error-correcting coding). The detection error rate with 3D interleaving is much lower than that without 3D interleaving when the frame loss rate is below 50%. (4) We take a carefully designed measure in bit embedding to guarantee the invisibility of the information. In experiments, we can embed a string of 402 bytes (excluding the redundant bits associated with ECC) in 96 frames of a CIF-format sequence. The experimental results demonstrate that the embedded information bits are perceptually transparent whether the frames in the sequence are viewed as still images or played continuously. The hidden information is robust to manipulations such as MPEG-2 compression, scaling, additive random noise, and frame loss.
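
The abstract does not specify how the 3D interleaving in item (3) is constructed, so the following is only a generic sketch of the idea behind interleaving: bits are written into a depth × rows × cols block in natural order and read out with the axis order reversed, so that neighbors in the output stream come from distant positions in the original data. The block shape and function names are assumptions.

```python
def interleave_3d(bits, depth, rows, cols):
    """Read a depth x rows x cols block out in reversed axis order."""
    if len(bits) != depth * rows * cols:
        raise ValueError("bit count must equal depth * rows * cols")
    return [
        bits[(d * rows + r) * cols + c]
        for c in range(cols)
        for r in range(rows)
        for d in range(depth)
    ]


def deinterleave_3d(bits, depth, rows, cols):
    """Invert interleave_3d and restore the original bit order."""
    if len(bits) != depth * rows * cols:
        raise ValueError("bit count must equal depth * rows * cols")
    restored = [None] * len(bits)
    position = 0
    for c in range(cols):
        for r in range(rows):
            for d in range(depth):
                restored[(d * rows + r) * cols + c] = bits[position]
                position += 1
    return restored
```

After deinterleaving, a burst of consecutive losses in the interleaved stream (for example, from a dropped frame) lands on widely separated positions, which is the kind of scattered error pattern that an error-correcting code handles best.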
