key frames
Recently Published Documents

TOTAL DOCUMENTS: 237 (FIVE YEARS: 68)
H-INDEX: 16 (FIVE YEARS: 4)

2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yuanyao Lu ◽  
Qi Xiao ◽  
Haiyang Jiang

In recent years, deep learning has already been applied to English lip-reading. However, research on Chinese lip-reading started later, lacks relevant datasets, and its recognition accuracy is not yet ideal. Therefore, this paper proposes a new hybrid neural network model to establish a Chinese lip-reading system. In this paper, we integrate the attention mechanism into both the CNN and the RNN. Specifically, we add the convolutional block attention module (CBAM) to the ResNet50 neural network, which enhances its ability to capture the small differences among the mouth patterns of similarly pronounced Chinese words, improving feature extraction in the convolution process. We also add a temporal attention mechanism to the GRU neural network, which helps to extract features from consecutive lip motion images. Because the moments before and after a frame affect the current moment in the lip-reading process, we assign larger weights to the key frames, which makes the features more representative. We further validate our model through experiments on our self-built dataset. These experiments show that using the convolutional block attention module (CBAM) in the Chinese lip-reading model can accurately recognize the Chinese numbers 0–9 and some frequently used Chinese words. Compared with other lip-reading systems, our system achieves better performance and higher recognition accuracy.
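
As a hedged illustration of the attention component named above, the following PyTorch sketch implements a generic convolutional block attention module (CBAM) of the kind inserted into ResNet50; the reduction ratio and kernel size are common defaults assumed here, not values taken from the paper.

```python
# Minimal CBAM sketch (channel attention followed by spatial attention).
# Hyperparameters are illustrative assumptions, not the authors' settings.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        return torch.sigmoid(avg + mx)[:, :, None, None] * x

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return torch.sigmoid(self.conv(pooled)) * x

class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels), SpatialAttention()

    def forward(self, x):                    # x: (batch, channels, H, W)
        return self.sa(self.ca(x))
```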


2021 ◽  
Vol 7 ◽  
pp. 1-24
Author(s):  
Lydia Zvyagintseva

This paper begins and ends with a provocation: I argue that refusal in librarianship is both impossible and necessary. Reviewing examples of crisis narratives which permeate both American and Canadian universities, I take a materialist perspective on the idea of refusal within academic librarianship. To do so, I draw on the work of Audra Simpson, Kyle Whyte, Eve Tuck, Mario Tronti, and Rinaldo Walcott to examine the sites of impossibility of refusal in the practice of academic librarianship within contemporary neoliberal education institutions. Then, I analyze the totality of capitalism in setting the limit for the practice of refusal through case studies of direct action, including the Icelandic Women’s Strike of 1975 and the 2020 Scholar Strike Canada. Finally, I identify private property and history as key frames for understanding the contradiction at the heart of refusal of crisis. As such, any refusal that does not address the centrality of labour and private property relations can thus be understood as harm reduction rather than emancipation. Ultimately, I argue that for librarians to refuse would require an abandonment of liberalism as librarianship’s guiding philosophy, and a redefinition of librarianship as such. 


Algorithms ◽  
2021 ◽  
Vol 14 (11) ◽  
pp. 303
Author(s):  
Alan Koschel ◽  
Christoph Müller ◽  
Alexander Reiterer

Cameras play a prominent role in the context of 3D data, as they can be designed to be very cheap and small and can therefore be used in many 3D reconstruction systems. Typical cameras capture video at 20 to 60 frames per second, resulting in a large number of frames to select from for 3D reconstruction. Many frames are unsuited for reconstruction because they suffer from motion blur or show too little variation compared to other frames. The camera used within this work has built-in inertial sensors. What if one could use these built-in inertial sensors to select, in real time, a set of key frames well-suited for 3D reconstruction, free from motion blur and redundancy? A random forest classifier (RF) is trained on inertial data to identify frames without motion blur and to reduce redundancy. Frames are analyzed with the fast Fourier transform and the Lucas–Kanade method to detect motion blur and moving features, so that they can be labelled correctly for training the RF. The resulting classifier successfully omits redundant frames and preserves frames of the required quality, but its performance on ideal frames remains unsatisfactory. A 3D reconstruction in Meshroom shows better results with the key frames selected by the classifier. By extracting frames from video, one can comfortably scan objects and scenes without taking individual pictures. Our proposed method automatically extracts the best frames in real time without using complex image-processing algorithms.
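
The abstract combines an FFT-based blur measure for labelling with a random forest trained on inertial data. The sketch below illustrates that pipeline under stated assumptions: the spectral radius, feature layout, and hyperparameters are placeholders rather than the authors' choices, and the Lucas–Kanade redundancy check is omitted for brevity.

```python
# Hedged sketch: FFT blur score used to label frames, random forest trained
# on per-frame inertial (IMU) features to predict which frames to keep.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def blur_score(gray_frame, radius=30):
    """Share of spectral energy outside a low-frequency disc; low = blurry."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(gray_frame)))
    h, w = gray_frame.shape
    yy, xx = np.ogrid[:h, :w]
    low = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return spectrum[~low].sum() / spectrum.sum()

def train_keyframe_classifier(imu_features, labels):
    """imu_features: per-frame gyro/accel statistics over the exposure window.
    labels: 1 = sharp and non-redundant (derived from blur_score + tracking)."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0)
    rf.fit(imu_features, labels)
    return rf
```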


2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Yi Liu ◽  
Yue Zhang ◽  
Haidong Hu ◽  
Xiaodong Liu ◽  
Lun Zhang ◽  
...  

With the rise and rapid development of short video sharing websites, the number of short videos on the Internet has been growing explosively. Organizing and classifying short videos has become the basis for their effective use and is a problem faced by all major short video platforms. Given the complex content categories of short videos and their rich extended text information, this paper applies text classification methods to the short video classification problem. Compared with the traditional approach of classifying and understanding short video key frames, this method offers lower computational cost, more accurate classification results, and easier application. This paper proposes a text classification model that applies an attention mechanism over the embeddings of a short video's multiple extended texts. The model first uses the pre-trained language model ALBERT to extract sentence-level vectors and then uses the attention mechanism to learn a classification weight for each kind of extended text information. This research also applied Google's unsupervised data augmentation (UDA) method, creatively combining it with a Chinese knowledge graph to realize TF-IDF-based word replacement. During training, we introduced a large amount of unlabeled data, which significantly improved the classification accuracy of the model. A final series of experiments compares the proposed method with existing short video title classification methods, classification methods based on video key frames, and hybrid methods, and shows that it is more accurate and robust on the test set.
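
To make the attention-over-extended-texts idea concrete, here is a minimal PyTorch sketch that pools sentence vectors from several hypothetical text fields (title, hashtags, comments) with a learned weight per field; the embedding dimension and class count are illustrative assumptions, not the paper's configuration.

```python
# Attention pooling over the sentence-level vectors of several text fields.
import torch
import torch.nn as nn

class TextFieldAttention(nn.Module):
    def __init__(self, dim=768, num_classes=30):
        super().__init__()
        self.score = nn.Linear(dim, 1)          # learns a weight per text field
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, field_vectors):           # (batch, num_fields, dim)
        weights = torch.softmax(self.score(field_vectors), dim=1)
        fused = (weights * field_vectors).sum(dim=1)   # weighted fusion
        return self.classifier(fused), weights
```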


Sensors ◽  
2021 ◽  
Vol 21 (20) ◽  
pp. 6761
Author(s):  
Di Liu ◽  
Hui Xu ◽  
Jianzhong Wang ◽  
Yinghua Lu ◽  
Jun Kong ◽  
...  

Graph Convolutional Networks (GCNs) have attracted a lot of attention and shown remarkable performance for action recognition in recent years. To improve recognition accuracy, the key problems for this kind of method are how to build the graph structure adaptively, select key frames, and extract discriminative features. In this work, we propose novel Adaptive Attention Memory Graph Convolutional Networks (AAM-GCN) for human action recognition using skeleton data. We adopt a GCN to adaptively model the spatial configuration of skeletons and employ a Gated Recurrent Unit (GRU) to construct an attention-enhanced memory for capturing temporal features. With the memory module, our model can not only remember what happened in the past but also exploit future information through multiple bidirectional GRU layers. Furthermore, to extract discriminative temporal features, the attention mechanism is also employed to select key frames from the skeleton sequence. Extensive experiments on the Kinetics, NTU RGB+D and HDM05 datasets show that the proposed network achieves better performance than some state-of-the-art methods.
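
A minimal sketch of the temporal side of this design, assuming per-frame skeleton features have already been produced by the GCN: a bidirectional GRU builds the memory and a frame-level attention layer plays the role of key-frame selection. All dimensions are assumptions, not the AAM-GCN configuration.

```python
# Bidirectional GRU memory with frame-level attention over skeleton features.
import torch
import torch.nn as nn

class AttentionMemory(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, num_classes=60):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, frame_feats):              # (batch, frames, feat_dim)
        h, _ = self.gru(frame_feats)             # (batch, frames, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)   # attention = soft key-frame weights
        return self.classifier((w * h).sum(dim=1)), w
```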


Electronics ◽  
2021 ◽  
Vol 10 (19) ◽  
pp. 2330
Author(s):  
Renshan Zhang ◽  
Su Cao ◽  
Kuang Zhao ◽  
Huangchao Yu ◽  
Yongyang Hu

Performing autonomous maneuvering flight planning and optimization remains a challenge for unmanned aerial vehicles (UAVs), especially for fixed-wing UAVs due to their high maneuverability and model complexity. A novel hybrid-driven fixed-wing UAV maneuver optimization framework, inspired by apprenticeship learning and nonlinear programming approaches, is proposed in this paper. The work consists of two main aspects: (1) identifying the model parameters of a given fixed-wing UAV from flight data demonstrated by a human pilot; the features of the maneuvers can then be described by positional/attitude/compound key-frames, and each maneuver can be decomposed into several motion primitives; (2) formulating maneuver planning as a minimum-time optimization problem and developing a novel nonlinear programming algorithm that does not require specifying the exact times at which the UAV passes the key-frames. The simulation results illustrate the effectiveness of the proposed framework in several scenarios, as both the preservation of geometric features and the minimization of maneuver times were ensured.
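
As a hedged illustration of aspect (1), the sketch below extracts candidate attitude key-frames from demonstrated flight data by detecting roll-angle extrema; the framework's positional/attitude/compound key-frame definitions and its minimum-time optimizer are considerably richer than this toy example.

```python
# Toy key-frame extraction from demonstrated flight data (assumption-laden).
import numpy as np
from scipy.signal import argrelextrema

def attitude_keyframes(t, roll, order=25):
    """t, roll: numpy arrays of timestamps and roll angles.
    Returns times and roll values of local roll extrema as candidate key-frames."""
    idx = np.sort(np.concatenate([
        argrelextrema(roll, np.greater, order=order)[0],
        argrelextrema(roll, np.less, order=order)[0]]))
    return t[idx], roll[idx]
```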


Sensors ◽  
2021 ◽  
Vol 21 (18) ◽  
pp. 6202
Author(s):  
Pedro Albuquerque ◽  
Tanmay Tulsidas Verlekar ◽  
Paulo Lobato Correia ◽  
Luís Ducla Soares

Human motion analysis provides useful information for the diagnosis and recovery assessment of people suffering from pathologies, such as those affecting the way of walking, i.e., gait. With recent developments in deep learning, state-of-the-art performance can now be achieved using a single 2D-RGB-camera-based gait analysis system, offering an objective assessment of gait-related pathologies. Such systems provide a valuable complement/alternative to the current standard practice of subjective assessment. Most 2D-RGB-camera-based gait analysis approaches rely on compact gait representations, such as the gait energy image, which summarize the characteristics of a walking sequence into one single image. However, such compact representations do not fully capture the temporal information and dependencies between successive gait movements. This limitation is addressed by proposing a spatiotemporal deep learning approach that uses a selection of key frames to represent a gait cycle. Convolutional and recurrent deep neural networks were combined, processing each gait cycle as a collection of silhouette key frames, allowing the system to learn temporal patterns among the spatial features extracted at individual time instants. Trained with gait sequences from the GAIT-IT dataset, the proposed system is able to improve gait pathology classification accuracy, outperforming state-of-the-art solutions and achieving improved generalization on cross-dataset tests.
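
A minimal sketch of the general CNN-plus-RNN pattern described above, assuming each gait cycle arrives as a stack of silhouette key frames; the layer sizes and pathology class count are illustrative, not those of the GAIT-IT system.

```python
# Per-frame CNN encoder followed by an LSTM over the gait-cycle key frames.
import torch
import torch.nn as nn

class GaitCNNRNN(nn.Module):
    def __init__(self, num_classes=5, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.LSTM(32, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, frames):                   # (batch, time, 1, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.rnn(feats)              # temporal patterns across frames
        return self.fc(h[-1])
```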


2021 ◽  
Vol 2025 (1) ◽  
pp. 012018
Author(s):  
Junyu Chen ◽  
Ganlan Peng ◽  
Yuanfang Peng ◽  
Mu Fang ◽  
Zhibin Chen ◽  
...  

Author(s):  
Anqi Pang ◽  
Xin Chen ◽  
Haimin Luo ◽  
Minye Wu ◽  
Jingyi Yu ◽  
...  

Recent neural rendering approaches for human activities achieve remarkable view synthesis results, but they still rely on dense input views or dense training with all captured frames, leading to deployment difficulties and inefficient training overhead. Moreover, existing approaches become ill-posed if the input is both spatially and temporally sparse. To fill this gap, in this paper we propose a few-shot neural human rendering approach (FNHR) that works from only sparse RGBD inputs and exploits temporal and spatial redundancy to generate photo-realistic free-view output of human activities. Our FNHR is trained only on the key-frames that expand the motion manifold in the input sequences. We introduce a two-branch neural blending scheme to combine a neural point renderer with a classical graphics texturing pipeline, integrating reliable observations over the sparse key-frames. Furthermore, we adopt a patch-based adversarial training process that exploits local redundancy and avoids over-fitting to the key-frames, producing fine-detailed rendering results. Extensive experiments demonstrate the effectiveness of our approach in generating high-quality free-viewpoint results for challenging human performances under the sparse setting.
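
One simple way to realize key-frames that "expand the motion manifold", offered here only as an assumption-laden illustration rather than FNHR's actual selection rule, is greedy farthest-point sampling over per-frame pose descriptors:

```python
# Greedy farthest-point sampling of key-frames from per-frame motion descriptors.
import numpy as np

def select_keyframes(pose_desc, k):
    """pose_desc: (num_frames, d) array of per-frame pose/motion descriptors.
    Returns indices of k frames spread widely over the motion manifold."""
    chosen = [0]                                    # start from the first frame
    dist = np.linalg.norm(pose_desc - pose_desc[0], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())                    # frame farthest from the chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(pose_desc - pose_desc[nxt], axis=1))
    return sorted(chosen)
```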

