spatiotemporal feature
Recently Published Documents


TOTAL DOCUMENTS: 79 (FIVE YEARS: 35)

H-INDEX: 11 (FIVE YEARS: 1)

2021, Vol 30 (06)
Author(s): Yinhao Liu, Xiaofei Zhou, Haibing Yin, Hongkui Wang, Chenggang Yan

Author(s): Zhenyu Zhang, Yong Li, Jing Duan, Yilong Duan, Yixiu Guo, ...

2021, Vol 2021, pp. 1-12
Author(s): Yufeng Du, Quan Zhao, Xiaochun Lu

Team sports game videos feature complex backgrounds, fast target movement, and mutual occlusion between targets, which pose great challenges to multiperson collaborative video analysis. This paper proposes a video semantic extraction method that integrates domain knowledge and deep features and can be applied to the analysis of multiperson collaborative basketball game videos, where a semantic event is modeled as an adversarial relationship between the players of two teams. We first design a scheme that combines a dual-stream network with learnable spatiotemporal feature aggregation, enabling end-to-end training of video semantic extraction to bridge the gap between low-level features and high-level semantic events. Then, an algorithm based on knowledge from different video sources is proposed to extract action semantics. The algorithm aggregates local convolutional features over the entire space-time range, which can be used to track the ball, shooter, and hoop to realize automatic semantic extraction from basketball game videos. Experiments show that the proposed scheme can effectively identify four categories of shots (short, medium, long, and free throw), scoring events, and the semantics of athletes' actions from basketball game video footage.
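The abstract does not spell out its aggregation layer. One widely used learnable form that pools local convolutional descriptors over the entire space-time range is NetVLAD-style soft-assignment pooling (as in ActionVLAD); the Python sketch below assumes that form and is not the authors' implementation. The function name `vlad_aggregate`, the sharpness `alpha`, and the fixed cluster centers are illustrative assumptions; in a trained network the centers would be learned end to end.

```python
import math


def vlad_aggregate(features, centers, alpha=10.0):
    # Hypothetical NetVLAD/ActionVLAD-style pooling; not taken from the paper.
    # features: D-dim local descriptors from every frame and spatial position.
    # centers:  K anchor vectors (learned in a real model, fixed here).
    # alpha:    sharpness of the soft assignment (illustrative default).
    K, D = len(centers), len(centers[0])
    agg = [[0.0] * D for _ in range(K)]
    for x in features:
        # Soft assignment: softmax over -alpha * squared distance to each center.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        top = max(logits)
        weights = [math.exp(s - top) for s in logits]
        total = sum(weights)
        for k in range(K):
            a = weights[k] / total
            for d in range(D):
                # Accumulate the assignment-weighted residual to center k.
                agg[k][d] += a * (x[d] - centers[k][d])
    # Intra-normalize each center's residual sum, then L2-normalize the result.
    out = []
    for row in agg:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.extend(v / norm for v in row)
    norm = math.sqrt(sum(v * v for v in out)) or 1.0
    return [v / norm for v in out]


# Four descriptors (e.g. 2 frames x 2 positions) pooled into one 2 x 2 = 4-dim clip vector.
clip_vector = vlad_aggregate([[0.1, 0.0], [0.0, 0.1], [0.9, 1.0], [1.0, 0.9]],
                             [[0.0, 0.0], [1.0, 1.0]])
```

For a dual-stream setup, one option (again an assumption) is to pool the appearance and motion streams separately and concatenate the two vectors before classification.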


2021, Vol 2021, pp. 1-13
Author(s): Yujiang Lu, Yaju Liu, Jianwei Fei, Zhihua Xia

Recent progress in deep learning, in particular in generative models, makes it easier to synthesize sophisticated forged faces in videos, posing severe threats to personal privacy and reputation on social media. It is therefore highly necessary to develop forensic approaches that distinguish forged videos from authentic ones. Existing works concentrate on frame-level cues but make insufficient use of the rich temporal information. Although some approaches identify forgeries from the perspective of motion inconsistency, a promising spatiotemporal feature fusion strategy is still lacking. To this end, we propose the Channel-Wise Spatiotemporal Aggregation (CWSA) module, which fuses deep features of consecutive video frames without any recurrent units. Our approach first crops the face region while retaining some background, which shifts the learning objective from the manipulations themselves to the difference between pristine and manipulated pixels. A deep convolutional neural network (CNN) with skip connections, which help preserve low-level features useful for detection, then extracts frame-level features. Finally, the CWSA module makes the real-or-fake decision by aggregating the deep features of the frame sequence. Evaluation on several large facial video manipulation benchmarks demonstrates its effectiveness: on all three datasets, FaceForensics++, Celeb-DF, and the DeepFake Detection Challenge Preview, the proposed approach outperforms state-of-the-art methods by significant margins.
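The abstract gives no internals for the CWSA module beyond fusing frame features channel by channel without recurrent units. The sketch below is a hypothetical illustration of that idea, not the authors' module: each channel receives its own softmax attention over the T frames, and a logistic classifier turns the fused vector into a probability of "fake". The helpers `channelwise_temporal_aggregate` and `real_or_fake`, and the attention logits and classifier parameters they take, are all assumptions standing in for learned layers.

```python
import math


def channelwise_temporal_aggregate(frame_feats, channel_scores):
    # Hypothetical illustration of channel-wise temporal fusion (not the CWSA code).
    # frame_feats:    T lists of C values, e.g. pooled CNN features per frame.
    # channel_scores: T lists of C attention logits (learned in a real model).
    # Each channel gets its own softmax over time; no recurrent state is used.
    T, C = len(frame_feats), len(frame_feats[0])
    fused = []
    for c in range(C):
        logits = [channel_scores[t][c] for t in range(T)]
        top = max(logits)
        weights = [math.exp(s - top) for s in logits]
        total = sum(weights)
        fused.append(sum(weights[t] / total * frame_feats[t][c] for t in range(T)))
    return fused


def real_or_fake(fused, weights, bias):
    # Hypothetical logistic classifier on the fused vector; returns P(fake).
    score = bias + sum(w * f for w, f in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-score))


# Two frames, two channels; equal logits reduce to a plain temporal average.
clip = channelwise_temporal_aggregate([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]])
```

Because each frame is weighted independently per channel, the whole sequence can be processed in parallel, which is the practical benefit of avoiding recurrent units.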


2021
Author(s): Wei Song, Qi-chao Li, Qi He, Xu Zhou, Yuan-yuan Chen

2021, Vol 94, pp. 116195
Author(s): Weijie Wei, Zhi Liu, Lijin Huang, Ziqiang Wang, Weiyu Chen, ...

Symmetry, 2021, Vol 13 (4), pp. 662
Author(s): Zeyuan Hu, Eung-Joo Lee

Most existing video action recognition methods rely mainly on high-level semantic information from convolutional neural networks (CNNs) but ignore the discrepancies between different information streams. In addition, they do not normally consider both long-range aggregation and short-range motion. To solve these problems, we propose hierarchical excitation aggregation and disentanglement networks (Hi-EADNs), which include a multiple frame excitation aggregation (MFEA) module and a feature squeeze-and-excitation hierarchical disentanglement (SEHD) module. Specifically, MFEA performs long- and short-range motion modelling and calculates feature-level temporal differences. The SEHD module uses these differences to optimize the weights of each spatiotemporal feature and excite motion-sensitive channels. Moreover, without introducing additional parameters, this feature information is processed with a series of squeezes and excitations, and multiple temporal aggregations with neighbourhoods enhance the interaction between different motion frames. Extensive experimental results confirm the effectiveness of the proposed Hi-EADN method on the UCF101 and HMDB51 benchmark datasets, where it achieves top-5 accuracies of 93.5% and 76.96%, respectively.
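To make the feature-level temporal difference and parameter-free squeeze-and-excitation steps concrete, the sketch below computes per-channel differences between neighbouring frames, squeezes them by spatial averaging, and excites channels with a sigmoid. It is an assumed simplification for illustration, not the exact MFEA/SEHD design; the function name and the residual-style reweighting `x * (1 + w)` are hypothetical choices.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def temporal_difference_excitation(feats):
    # Hypothetical, parameter-free motion excitation (not the Hi-EADN code).
    # feats: T frames, each with C channels, each a list of N spatial values.
    T = len(feats)
    if T < 2:
        # No neighbouring frame to difference against: return an unchanged copy.
        return [[list(ch) for ch in frame] for frame in feats]
    C = len(feats[0])
    weights = []
    for t in range(T - 1):
        w_t = []
        for c in range(C):
            cur, nxt = feats[t][c], feats[t + 1][c]
            # Squeeze: mean absolute temporal difference over spatial positions.
            s = sum(abs(b - a) for a, b in zip(cur, nxt)) / len(cur)
            # Excite: map s >= 0 into [0, 1); static channels get weight 0.
            w_t.append(2.0 * sigmoid(s) - 1.0)
        weights.append(w_t)
    weights.append(weights[-1])  # the last frame reuses the previous frame's weights
    # Residual-style reweighting: motion-sensitive channels are amplified up to 2x.
    return [[[v * (1.0 + weights[t][c]) for v in feats[t][c]] for c in range(C)]
            for t in range(T)]


# Channel 0 is static, channel 1 changes between the two frames.
excited = temporal_difference_excitation([[[1.0, 1.0], [0.0, 0.0]],
                                          [[1.0, 1.0], [2.0, 2.0]]])
```

Static channels pass through unchanged, so the excitation only emphasizes channels whose responses vary over time.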


2021, Vol 13 (6), pp. 1117
Author(s): Jing Li, Yuguang Xie, Congcong Li, Yanran Dai, Jiaxin Ma, ...

In this paper, we investigate the problem of aligning multiple deployed cameras into one unified coordinate system for cross-camera information sharing and intercommunication. The difficulty increases greatly for large-scale scenes with chaotic camera deployment. To address this problem, we propose a UAV-assisted wide-area multi-camera space alignment approach based on spatiotemporal feature maps. It exploits the broad global perception of Unmanned Aerial Vehicles (UAVs) to meet the challenge of wide-range environments. Concretely, we first present a novel spatiotemporal feature map construction approach to represent the input aerial and ground monitoring data. In this way, motion consistency across views is mined to overcome the large perspective gap between the UAV and ground cameras. To obtain the correspondence between their pixels, we propose a cross-view spatiotemporal matching strategy. By solving for the relative relationship from these air-to-ground point correspondences, all ground cameras can be aligned into one surveillance space. The proposed approach was evaluated qualitatively and quantitatively in both simulated and real environments. Extensive experimental results demonstrate that our system can successfully align all ground cameras with very small pixel error. Additionally, comparisons with other works in different test situations also verify its superior performance.
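The abstract does not state which geometric model relates the ground and UAV views. If the monitored area is roughly planar, one standard choice (an assumption here) is a homography estimated from point correspondences; with exactly four correspondences in general position, its eight unknowns follow from an 8x8 linear system (a direct linear transform with h33 fixed to 1). The helper names `solve_linear`, `homography_from_4`, and `apply_h` are illustrative, not from the paper.

```python
def solve_linear(A, b):
    # Gaussian elimination with partial pivoting for a square system A x = b.
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("near-singular system: degenerate correspondences")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def homography_from_4(src, dst):
    # Hypothetical planar-ground assumption: fit H (h33 = 1) mapping four
    # ground-camera pixels src onto four matched UAV-image pixels dst.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_h(H, p):
    # Map a pixel through H using homogeneous coordinates.
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)
```

With more than four noisy matches, a least-squares fit after coordinate normalization, typically wrapped in RANSAC to reject outliers, would be the usual next step.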

