Automatic textual description of interactions between two objects in surveillance videos

2021, Vol 3 (7)
Author(s): Wael F. Youssef, Siba Haidar, Philippe Joly

Abstract: The purpose of our work is to automatically generate textual video description schemas from surveillance video scenes, in a form compatible with police incident reports. Our proposed approach is based on a generic and flexible context-free ontology. The general schema is of the form [actuator] [action] [over/with] [actuated object] [+ descriptors: distance, speed, etc.]. We focus on scenes containing exactly two objects. Through a sequence of processing steps, we generate a formatted textual description. We try to identify whether an interaction exists between the two objects, including remote interaction that does not involve physical contact, and we point out when aggression took place in these cases. We use supervised deep learning to classify scenes into interaction or no-interaction classes and then into subclasses. The descriptors chosen to represent subclasses are key in surveillance systems: they help generate live alerts and facilitate offline investigation.
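As an illustration only, the general schema quoted in the abstract could be rendered by a small data structure like the one below. The class name, field names, and example values are hypothetical and are not taken from the paper; this is a sketch of the output format, not the authors' pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InteractionDescription:
    """Hypothetical container for one [actuator] [action] ... description."""
    actuator: str
    action: str
    actuated_object: str
    preposition: Optional[str] = None  # "over" or "with", per the schema
    descriptors: Dict[str, str] = field(default_factory=dict)

    def to_text(self) -> str:
        # Assemble the bracketed slots in schema order.
        parts = [self.actuator, self.action]
        if self.preposition:
            parts.append(self.preposition)
        parts.append(self.actuated_object)
        sentence = " ".join(f"[{p}]" for p in parts)
        # Append optional descriptors such as distance or speed.
        if self.descriptors:
            extras = ", ".join(f"{k}: {v}" for k, v in self.descriptors.items())
            sentence += f" [+ {extras}]"
        return sentence


# Made-up scene values, purely for demonstration.
example = InteractionDescription(
    actuator="person",
    action="collides",
    preposition="with",
    actuated_object="car",
    descriptors={"distance": "0.5 m", "speed": "1.2 m/s"},
)
print(example.to_text())
# [person] [collides] [with] [car] [+ distance: 0.5 m, speed: 1.2 m/s]
```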

Author(s): Chengcui Zhang

The focus of this survey is on spatio-temporal data mining and database retrieval for visual traffic surveillance systems. In many traffic surveillance applications, such as incident detection, abnormal event detection, vehicle speed estimation, and traffic volume estimation, the data used for reasoning are spatio-temporal (e.g., vehicle trajectories). How to analyze these spatio-temporal data effectively, so as to automatically find their inherent characteristics for different visual traffic surveillance applications, has been of great interest. Examples of spatio-temporal patterns extracted from traffic surveillance videos include, but are not limited to, sudden stops, harsh turns, speeding, and collisions. To meet the different needs of various traffic surveillance applications, several application- or event-specific models have been proposed in the literature. This paper surveys different models and data mining algorithms to cover the state of the art in spatio-temporal modelling, spatio-temporal data mining, and spatio-temporal retrieval for traffic surveillance video databases. In addition, the database model issues and challenges for traffic surveillance videos are discussed.
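To make one of the patterns named above concrete, a sudden stop could be flagged from a vehicle trajectory roughly as follows. This is a minimal sketch and not a method from the survey: the trajectory format and both speed thresholds are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

# Assumed trajectory sample: (time in seconds, x in metres, y in metres).
Point = Tuple[float, float, float]


def speeds(track: List[Point]) -> List[float]:
    """Speed between each pair of consecutive samples, in metres per second."""
    out = []
    for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        out.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return out


def sudden_stops(track: List[Point],
                 min_speed: float = 5.0,   # hypothetical "was moving" threshold
                 stop_speed: float = 0.5   # hypothetical "has stopped" threshold
                 ) -> List[int]:
    """Indices of samples where a moving object stops within one time step."""
    v = speeds(track)
    # v[i] is the speed between samples i and i + 1, so the stop is at i + 1.
    return [i + 1 for i in range(len(v) - 1)
            if v[i] >= min_speed and v[i + 1] <= stop_speed]


# Made-up trajectory: steady motion at 10 m/s, then an abrupt halt.
track = [(0.0, 0.0, 0.0), (1.0, 10.0, 0.0), (2.0, 20.0, 0.0),
         (3.0, 20.1, 0.0), (4.0, 20.1, 0.0)]
print(sudden_stops(track))
```

A real system would smooth the trajectory first and tune the thresholds per camera; the point here is only that such events reduce to simple tests on consecutive trajectory samples.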


Information, 2018, Vol 9 (12), pp. 301
Author(s): Qian Li, Rangding Wang, Dawen Xu

Surveillance systems are ubiquitous in our lives, and surveillance videos are often used as significant evidence in judicial forensics. However, the authenticity of surveillance videos is difficult to guarantee, and ascertaining it is an urgent problem. Inter-frame forgery is one of the most common ways of tampering with video. The forgery reduces the correlation between adjacent frames at the tampering position, so this correlation can be used to detect the tampering operation. The algorithm is composed of feature extraction and abnormal point localization. During feature extraction, we extract the 2-D phase congruency of each frame, since it is a good image characteristic, and then calculate the correlation between adjacent frames. In the second phase, the abnormal points are detected using the k-means clustering algorithm, which clusters the normal and abnormal points into two categories. Experimental results demonstrate that the scheme has high detection and localization accuracy.
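The two phases described above (adjacent-frame correlation, then two-cluster k-means) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: raw pixel values stand in for the 2-D phase congruency features, frames are flattened lists of numbers, and the `min_gap` guard is an added assumption because k-means always returns two clusters even when nothing is abnormal.

```python
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally long feature vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant frame: correlation undefined, treat as 0
    return cov / (va * vb) ** 0.5


def two_means_1d(values: Sequence[float], iters: int = 50) -> List[int]:
    """1-D k-means with k = 2; label 0 is the low-valued cluster."""
    lo, hi = min(values), max(values)  # initialise centres at the extremes
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = sum(g0) / len(g0) if g0 else lo
        new_hi = sum(g1) / len(g1) if g1 else hi
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    return labels


def suspect_transitions(frames: Sequence[Sequence[float]],
                        min_gap: float = 0.5) -> List[int]:
    """Indices i where the transition frame i -> i + 1 looks abnormal."""
    corr = [pearson(f0, f1) for f0, f1 in zip(frames, frames[1:])]
    if not corr or max(corr) - min(corr) < min_gap:
        return []  # correlations are all similar: nothing stands out
    return [i for i, lab in enumerate(two_means_1d(corr)) if lab == 0]


# Tiny made-up "video": frames 0-2 change slowly, frame 3 is a sharp break.
frames = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 4, 5], [5, 4, 2, 1], [5, 4, 2, 2]]
print(suspect_transitions(frames))
```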


Author(s): Mrunal Malekar

Videos generated by surveillance cameras inside an ATM were very long. If a robbery had taken place inside the ATM, watching the entire video was time-consuming. Hence, there was a need to process these surveillance videos by extracting the priority frames in which suspicious activities such as robbery, murder, or kidnapping had taken place. The objective of this paper was to propose an algorithm that would detect the suspicious frames in a long surveillance video and give the authorities the priority information they contain. A novel approach based on deep learning with Convolutional Neural Networks (CNNs) was used to sample the priority information from the surveillance videos, where the priority information was the suspicious activities (robbery, murder, etc.) that took place inside the ATM. The CNN model was able to extract the suspicious-activity frames from a long video effectively and assemble them into a shorter video.


2019, Vol 9 (22), pp. 4871
Author(s): Quan Liu, Chen Feng, Zida Song, Joseph Louis, Jian Zhou

Earthmoving is an integral and significant civil engineering operation, and tracking its productivity requires statistics on the loads moved by dump trucks. Since current methods for collecting truck-load statistics are laborious, costly, and limited in application, this paper presents the framework of a novel, automated, non-contact field earthmoving quantity statistics (FEQS) method for projects with large earthmoving demands that use uniform and uncovered trucks. The proposed FEQS framework utilizes field surveillance systems and adopts vision-based deep learning for full/empty-load truck classification as its core work. Since convolutional neural networks (CNNs) and their transfer learning (TL) forms are popular vision-based deep learning models and come in many types, a comparison study is conducted to test the feasibility of the framework's core work and to evaluate the performance of different deep learning models in implementation. The comparison study involved 12 CNN or CNN-TL models in full/empty-load truck classification, and the results revealed that, while several provided satisfactory performance, VGG16-FineTune performed best. This demonstrated the feasibility of the core work of the proposed FEQS framework. Further discussion offers guidance on model choice: CNN-TL models are more feasible than CNN prototypes, and models that adopt different TL methods have advantages in either accuracy or speed for different tasks.


2021, Vol 7 (2), pp. 12
Author(s): Yousef I. Mohamad, Samah S. Baraheem, Tam V. Nguyen

Automatic event recognition in sports photos is both an interesting and valuable research topic in the field of computer vision and deep learning. With the rapid increase and explosive spread of data, which is being captured at every moment, fast and precise access to the right information has become a challenging task of considerable importance for multiple practical applications, e.g., sports image and video search, sports data analysis, healthcare monitoring applications, monitoring and surveillance systems for indoor and outdoor activities, and video captioning. In this paper, we evaluate different deep learning models in recognizing and interpreting the sport events in the Olympic Games. To this end, we collect a dataset dubbed the Olympic Games Event Image Dataset (OGED), which includes 10 different sport events scheduled for the Olympic Games Tokyo 2020. Transfer learning is then applied to three popular deep convolutional neural network architectures, namely AlexNet, VGG-16, and ResNet-50, along with various data augmentation methods. Extensive experiments show that ResNet-50 with the proposed photobombing-guided data augmentation achieves 90% accuracy.


2019, Vol 9 (10), pp. 2003
Author(s): Tung-Ming Pan, Kuo-Chin Fan, Yuan-Kai Wang

Intelligent analysis of surveillance videos over networks requires high recognition accuracy, which depends on good-quality video that in turn demands significant bandwidth. Video quality degraded by high object dynamics under wireless transmission poses a further critical problem for smart video surveillance. In this paper, an object-based source coding method is proposed to preserve constant quality of video streaming over wireless networks. The inverse relationship between video quality and object dynamics (i.e., decreasing video quality due to the occurrence of large and fast-moving objects) is characterized statistically as a linear model. A regression algorithm that uses robust M-estimator statistics is proposed to construct the linear model with respect to different bitrates. The linear model is applied to predict the bitrate increment required to enhance video quality. A simulated wireless environment is set up to verify the proposed method under different wireless conditions. Experiments with real surveillance videos covering a variety of object dynamics are conducted to evaluate the performance of the method. Experimental results demonstrate significant improvement of the streamed videos in both visual and quantitative terms.
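For readers unfamiliar with M-estimator regression, the sketch below fits a straight line by iteratively reweighted least squares with Huber weights. The abstract does not say which M-estimator the authors use, so the Huber loss, the tuning constant 1.345, the MAD-based scale, and the synthetic quality-versus-dynamics data are all assumptions made for illustration.

```python
from statistics import median
from typing import Sequence, Tuple


def weighted_line_fit(x: Sequence[float], y: Sequence[float],
                      w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares line; returns (slope, intercept)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def huber_line_fit(x: Sequence[float], y: Sequence[float],
                   delta: float = 1.345, iters: int = 50,
                   tol: float = 1e-9) -> Tuple[float, float]:
    """Huber M-estimate of a line via iteratively reweighted least squares."""
    slope, intercept = weighted_line_fit(x, y, [1.0] * len(x))  # OLS start
    for _ in range(iters):
        res = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        med = median(res)
        # Robust residual scale from the median absolute deviation (MAD).
        scale = median(abs(r - med) for r in res) / 0.6745
        if scale == 0.0:
            break  # most points already lie exactly on the line
        cut = delta * scale
        # Huber weights: full weight near the line, down-weighted beyond it.
        w = [1.0 if abs(r) <= cut else cut / abs(r) for r in res]
        new_slope, new_intercept = weighted_line_fit(x, y, w)
        done = (abs(new_slope - slope) < tol
                and abs(new_intercept - intercept) < tol)
        slope, intercept = new_slope, new_intercept
        if done:
            break
    return slope, intercept


# Synthetic data: quality falls linearly with object dynamics (slope -2,
# intercept 40), with one corrupted measurement at dynamics = 5.
dynamics = [float(i) for i in range(10)]
quality = [40.0 - 2.0 * d for d in dynamics]
quality[5] = 60.0

ols = weighted_line_fit(dynamics, quality, [1.0] * len(dynamics))
robust = huber_line_fit(dynamics, quality)
print("ordinary least squares:", ols)
print("Huber M-estimate:      ", robust)
```

On this toy data the ordinary least-squares line is pulled toward the corrupted point, while the robust fit stays close to the underlying trend, which is the property that motivates M-estimators when quality measurements contain outliers.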

