Optimization Research on Deep Learning and Temporal Segmentation Algorithm of Video Shot in Basketball Games

2021, Vol 2021, pp. 1-10
Author(s): Zhenggang Yan, Yue Yu, Mohammad Shabaz

The temporal segmentation of video shots in basketball games and the edge detection of those shots are among the most active and rapidly developing topics in multimedia research. Temporal segmentation of video shots is based on extracting video image frames and is a precondition for most video applications, so studying the temporal segmentation of basketball game video shots has great practical significance and broad application prospects. Because current algorithms take a long time to segment basketball game video shots, a deep learning model and a histogram-based temporal segmentation algorithm for basketball game video shots are proposed. For shot boundary detection, deep learning is used to process the image frames: the video data are converted from the RGB space to the HSV space, histogram statistics are used to reduce the dimensionality of the video images, and the three color components are combined into a one-dimensional feature vector that yields the quantization level of the video. This one-dimensional vector is then used as the variable for histogram statistics and analysis of the video shots, from which the continuous frame difference, the accumulated frame difference, the window frame difference, the adaptive window mean, and the superaverage ratio of the basketball game video are calculated. These results are combined with a dynamic threshold to optimize the temporal segmentation of the video shots. The comparison results verify the effectiveness of the proposed algorithm through a missed-detection-rate test of the video shots, and the segmentation-time test shows that the proposed optimization algorithm for temporal segmentation of basketball game video shots is efficient.
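The histogram pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 16×4×4 HSV quantization, the window size, and the threshold factor `k` are all assumptions.

```python
import numpy as np

def quantize_hsv(hsv_frame, h_bins=16, s_bins=4, v_bins=4):
    # Combine the three color components into a single one-dimensional
    # quantization level, reducing each pixel to one integer in
    # [0, h_bins * s_bins * v_bins).
    h, s, v = hsv_frame[..., 0], hsv_frame[..., 1], hsv_frame[..., 2]
    hq = (h * h_bins // 256).clip(0, h_bins - 1)
    sq = (s * s_bins // 256).clip(0, s_bins - 1)
    vq = (v * v_bins // 256).clip(0, v_bins - 1)
    return hq * s_bins * v_bins + sq * v_bins + vq

def hist_diff(f1, f2, levels=256):
    # Histogram frame difference on the one-dimensional feature,
    # normalized by the number of pixels.
    h1 = np.bincount(f1.ravel(), minlength=levels)
    h2 = np.bincount(f2.ravel(), minlength=levels)
    return np.abs(h1 - h2).sum() / f1.size

def detect_cuts(frames, window=5, k=3.0):
    # Adaptive threshold: flag a shot boundary when a frame difference
    # exceeds k times the mean difference in a sliding window.
    diffs = [hist_diff(quantize_hsv(a.astype(int)), quantize_hsv(b.astype(int)))
             for a, b in zip(frames, frames[1:])]
    cuts = []
    for i, d in enumerate(diffs):
        local = diffs[max(0, i - window):i] or [d]
        if d > k * (sum(local) / len(local)):
            cuts.append(i + 1)  # boundary between frame i and frame i + 1
    return cuts
```

With synthetic HSV frames, a hard cut between two constant-color segments is flagged at the transition frame.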

2020, pp. 1-12
Author(s): Hu Jingchao, Haiying Zhang

The difficulty in recognizing the state of students in class lies in making feature judgments based on students' facial expressions and movement states, and some current intelligent models are not accurate at this task. To improve recognition performance, this study builds a two-level state detection framework based on deep learning and an HMM feature recognition algorithm, and expands it into a multi-level detection model through a reasonable state classification method. The study selects a continuous HMM or deep learning to reflect the dynamic generation characteristics of fatigue, and designs randomized human fatigue recognition experiments to collect and preprocess EEG data, facial video data, and subjective evaluation data from classroom students. The feature indicators are then discretized to build a student state recognition model. Finally, the performance of the proposed algorithm is analyzed experimentally. The results show that the proposed algorithm has certain advantages over traditional algorithms in recognizing classroom student state features.
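As a toy illustration of the HMM side of such a framework, a Viterbi decoder over discretized fatigue indicators might look like the sketch below. The states, observation symbols, and probabilities are invented for illustration; they are not the study's fitted model.

```python
import numpy as np

# Hypothetical two-state model: 0 = attentive, 1 = fatigued.
# Transition and emission probabilities are illustrative, not learned values.
A = np.array([[0.9, 0.1],     # attentive -> attentive / fatigued
              [0.2, 0.8]])    # fatigued  -> attentive / fatigued
B = np.array([[0.7, 0.2, 0.1],   # P(obs | attentive): eyes open, blink, eyes closed
              [0.2, 0.3, 0.5]])  # P(obs | fatigued)
pi = np.array([0.8, 0.2])

def viterbi(obs):
    # Most likely hidden state sequence for a discretized observation sequence,
    # computed in log space for numerical stability.
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))          # best log-probability ending in each state
    psi = np.zeros((T, N), dtype=int) # back-pointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return path[::-1]
```

A run of eyes-open observations followed by eyes-closed observations decodes into an attentive segment followed by a fatigued one.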


Author(s): Daichi Kitaguchi, Nobuyoshi Takeshita, Hiroki Matsuzaki, Hiro Hasegawa, Takahiro Igaki, ...

Abstract
Background: Dividing a surgical procedure into a sequence of identifiable and meaningful steps facilitates intraoperative video data acquisition and storage. These efforts are especially valuable for technically challenging procedures that require intraoperative video analysis, such as transanal total mesorectal excision (TaTME); however, manual video indexing is time-consuming. Thus, in this study, we constructed an annotated video dataset for TaTME with surgical step information and evaluated the performance of a deep learning model in recognizing the surgical steps in TaTME.
Methods: This was a single-institutional retrospective feasibility study. All TaTME intraoperative videos were divided into frames, and each frame was manually annotated as one of the following major steps: (1) purse-string closure; (2) full-thickness transection of the rectal wall; (3) down-to-up dissection; (4) dissection after rendezvous; and (5) purse-string suture for stapled anastomosis. Steps 3 and 4 were each further divided into four sub-steps for dissection of the anterior, posterior, right, and left planes. A convolutional neural network-based deep learning model, Xception, was used for the surgical step classification task.
Results: Our dataset of 50 TaTME videos was randomly divided into training and testing subsets of 40 and 10 videos, respectively. The overall accuracy obtained for all classification steps was 93.2%. In contrast, when sub-step classification was included in the performance analysis, a mean accuracy (± standard deviation) of 78% (± 5%) was obtained, with a maximum accuracy of 85%.
Conclusions: To the best of our knowledge, this is the first study of automatic surgical step classification for TaTME. Our deep learning model learned to recognize the surgical steps in TaTME videos with high accuracy after training; thus, it could be applied to a system for intraoperative guidance or for postoperative video indexing and analysis in TaTME procedures.
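Frame-wise step classifiers of this kind are often evaluated and post-processed over time. The sketch below shows a generic sliding-window majority-vote smoother and a frame accuracy metric; it is our illustration of such post-processing, not a part of the study's reported pipeline.

```python
import numpy as np

# Steps are encoded as integer labels 0-4, one per annotated video frame
# (e.g. 0 = purse-string closure, ..., 4 = purse-string suture).

def smooth_predictions(pred, window=5):
    # Majority vote in a sliding window: isolated mislabelled frames are
    # replaced by the step that dominates their temporal neighbourhood.
    half = window // 2
    out = []
    for i in range(len(pred)):
        seg = pred[max(0, i - half): i + half + 1]
        out.append(np.bincount(seg).argmax())
    return np.array(out)

def frame_accuracy(pred, truth):
    # Fraction of frames whose predicted step matches the annotation.
    return float((np.asarray(pred) == np.asarray(truth)).mean())
```

A single flipped frame inside an otherwise consistent step segment is corrected by the smoother, raising frame accuracy back to 1.0.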


Sensors, 2021, Vol 21 (12), pp. 4045
Author(s): Alessandro Sassu, Jose Francisco Saenz-Cogollo, Maurizio Agelli

Edge computing is the best approach for meeting the exponential demand and the real-time requirements of many video analytics applications. Since most recent advances in extracting information from images and video rely on computation-heavy deep learning algorithms, there is a growing need for solutions that allow new models to be deployed and used on scalable and flexible edge architectures. In this work, we present Deep-Framework, a novel open-source framework for developing edge-oriented real-time video analytics applications based on deep learning. Deep-Framework has a scalable multi-stream architecture based on Docker and abstracts away from the user the complexity of cluster configuration, service orchestration, and GPU resource allocation. It provides Python interfaces for integrating deep learning models developed with the most popular frameworks, as well as high-level APIs based on standard HTTP and WebRTC interfaces for consuming the extracted video data on clients running in browsers or on any other web-based platform.
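The abstract does not specify the response schema of those HTTP APIs. Purely as an illustration of how a web client might consume per-frame analytics results, the sketch below parses a hypothetical JSON payload; the endpoint shape and field names are our invention, not Deep-Framework's actual API.

```python
import json

# Hypothetical per-frame payload as such an HTTP API might return it;
# the real Deep-Framework response schema may differ.
SAMPLE = '''{
  "stream_id": "cam-01",
  "frame": 1024,
  "detections": [
    {"label": "person", "confidence": 0.92, "bbox": [10, 20, 110, 220]},
    {"label": "bicycle", "confidence": 0.31, "bbox": [300, 40, 380, 120]}
  ]
}'''

def parse_frame_result(payload, min_confidence=0.5):
    # Decode one frame's result and keep only detections above the
    # confidence threshold.
    data = json.loads(payload)
    dets = [d for d in data["detections"] if d["confidence"] >= min_confidence]
    return data["stream_id"], data["frame"], dets
```

A browser or web-based client would apply the same filtering after fetching each result over HTTP.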


Author(s): Canyi Du, Rui Zhong, Yishen Zhuo, Xinyu Zhang, Feifei Yu, ...

Abstract
Traditional engine fault diagnosis methods usually need to extract features manually before classifying them with a pattern recognition method, which makes it difficult to solve the end-to-end fault diagnosis problem. In recent years, deep learning has been applied in many fields, bringing considerable convenience to technological change; in the automotive field it has been used for tasks such as image recognition, language processing, and assisted driving. In this paper, a one-dimensional convolutional neural network (1D-CNN) is used to process vibration signals for fault diagnosis and classification. Vibration signal data are collected under different engine working conditions and organized into sets of data covering one working cycle each, which are divided into a training sample set and a test sample set. A one-dimensional convolutional neural network model is then built in Python so that the feature filters (convolution kernels) learn from the training set, and these kernels then process the input data of the test set. Convolution and pooling extract features into a new space, so that features are learned directly from the raw vibration signals and fault diagnosis is completed end to end. The experimental results show that the pattern recognition method based on a one-dimensional convolutional neural network can be effectively applied to engine fault diagnosis and achieves higher diagnostic accuracy than traditional methods.
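The convolution-and-pooling feature extraction described above can be illustrated in plain NumPy. The kernel here is a fixed example value; in an actual 1D-CNN it would be learned from the training set.

```python
import numpy as np

def conv1d(signal, kernel, stride=1):
    # Valid 1-D convolution: slide the kernel over the vibration signal.
    n = (len(signal) - len(kernel)) // stride + 1
    return np.array([np.dot(signal[i * stride: i * stride + len(kernel)], kernel)
                     for i in range(n)])

def relu(x):
    # Non-linearity applied between convolution and pooling.
    return np.maximum(x, 0.0)

def max_pool1d(x, size=2):
    # Keep the strongest response in each window; halves the feature length.
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

# One conv + ReLU + pool stage, as in a single 1D-CNN layer.
signal = np.sin(np.linspace(0, 8 * np.pi, 64))  # stand-in vibration signal
kernel = np.array([1.0, 0.0, -1.0])             # illustrative edge-like filter
features = max_pool1d(relu(conv1d(signal, kernel)))
```

Stacking several such stages, then a dense classifier head, gives the end-to-end fault classification the paper describes.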


2021
Author(s): Yu Rang Park, Sang Ho Hwang, Yeonsoo Yu, Jichul Kim, Taeyeop Lee, ...

BACKGROUND: Early detection and intervention of developmental disabilities (DDs) are critical for improving the long-term outcomes of affected children. Mobile-based applications are easily accessible and may thus help with the early identification of DDs.
OBJECTIVE: We aimed to identify facial expressions and head poses from face landmark data extracted from face recording videos and to differentiate the characteristics of children with DDs from those without.
METHODS: Eighty-nine children (DD, n=33; typically developing, n=56) were included in the analysis. Using a mobile-based application, we extracted facial landmarks and head poses from the recorded videos and performed Long Short-Term Memory (LSTM)-based DD classification.
RESULTS: Stratified k-fold cross-validation showed that the average accuracy, precision, recall, and F1-score of the LSTM-based deep learning model for children with DDs were 88%, 91%, 72%, and 80%, respectively. By interpreting the predictions with SHapley Additive exPlanations (SHAP), we confirmed that the nodding head angle was the most important variable. All of the top 10 variables by importance showed significant differences in distribution between children with DDs and those without (p<0.05).
CONCLUSIONS: Our results provide preliminary evidence that a deep learning classification model using mobile-based video data of children could be used for the early detection of DDs.
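As a minimal sketch of the LSTM machinery used for such sequence classification, the code below steps a single cell over a sequence of landmark feature vectors. The weights are random and untrained; this shows the data flow, not the study's fitted model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    # One LSTM time step over a landmark feature vector x.
    # The four gates are stacked in W, U, b as [input, forget, cell, output].
    z = W @ x + U @ h + b
    n = len(h)
    i, f, g, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # gated cell update
    h_new = sigmoid(o) * np.tanh(c_new)               # gated hidden output
    return h_new, c_new

def run_sequence(seq, n_hidden, seed=0):
    # Random (untrained) weights; seq has one row per video frame.
    rng = np.random.default_rng(seed)
    n_in = seq.shape[1]
    W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in))
    U = rng.normal(0.0, 0.1, (4 * n_hidden, n_hidden))
    b = np.zeros(4 * n_hidden)
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x in seq:
        h, c = lstm_step(x, h, c, W, U, b)
    return h  # final hidden state, fed to a classifier head in practice
```

In a trained model, the final hidden state would pass through a dense layer with a sigmoid to produce the DD/typically-developing prediction.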

