Development and evaluation of a deep learning algorithm for word recognition from lip movements for the German language

HNO ◽  
2022 ◽  
Author(s):  
Nam Dinh Pham ◽  
Torsten Rahne

Background: Many people benefit when lip-reading from the additional visual information provided by the speaker's lip movements, although this is highly error-prone. Lip-reading algorithms based on artificial neural networks significantly improve word recognition but are not available for the German language. Materials and methods: A total of 1806 videos, each showing a single German-speaking person, were selected, split into word segments, and assigned to word classes using speech recognition software. From 38,391 video segments with 32 speakers, 18 polysyllabic, visually distinguishable words were used to train and validate a neural network. The 3D Convolutional Neural Network and Gated Recurrent Units models and the combination of both (GRUConv) were compared, as were different image crops and color spaces of the videos. The correct classification rate was determined in each case within 5000 training epochs. Results: The comparison of color spaces showed no relevant differences in correct classification rates, which ranged from 69% to 72%. Cropping to the lips yielded a markedly higher correct classification rate (70%) than cropping to the speaker's entire face (34%). With the GRUConv model, the maximum correct classification rates were 87% for known speakers and 63% in validation with unknown speakers. Conclusion: This neural network for lip-reading, the first developed for the German language, shows very high accuracy, comparable to that of English-language algorithms. It also works with unknown speakers and can be generalized to more word classes.
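The gap between results for known and unknown speakers depends on keeping validation speakers out of the training data. The abstract does not describe how the split was made, so the following is only a minimal sketch under that assumption. The names `split_by_speaker` and `correct_classification_rate` and the toy segments are hypothetical, not taken from the study.

```python
import random

def split_by_speaker(segments, n_val_speakers, seed=0):
    """Split (speaker_id, word_label) pairs so validation speakers never appear in training."""
    speakers = sorted({speaker for speaker, _ in segments})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    val_speakers = set(speakers[:n_val_speakers])
    train = [s for s in segments if s[0] not in val_speakers]
    val = [s for s in segments if s[0] in val_speakers]
    return train, val, val_speakers

def correct_classification_rate(predicted, actual):
    """Fraction of segments whose predicted word class equals the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Toy data (hypothetical): 4 speakers with 3 word segments each.
segments = [(f"speaker{i % 4}", f"word{i % 3}") for i in range(12)]
train, val, held_out = split_by_speaker(segments, n_val_speakers=1)

# With 18 equally frequent word classes, random guessing would be right 1 time in 18.
chance_level = 1 / 18
```

For scale, `chance_level` is about 5.6% if the 18 word classes were equally frequent (which the abstract does not state).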

2019 ◽  
Author(s):  
Seoin Back ◽  
Junwoong Yoon ◽  
Nianhan Tian ◽  
Wen Zhong ◽  
Kevin Tran ◽  
...  

We present an application of a deep-learning convolutional neural network to atomic surface structures, using atomic and Voronoi polyhedra-based neighbor information to predict adsorbate binding energies for applications in catalysis.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Young-Gon Kim ◽  
Sungchul Kim ◽  
Cristina Eunbee Cho ◽  
In Hye Song ◽  
Hee Jin Lee ◽  
...  

Fast and accurate confirmation of metastasis on the frozen tissue section of intraoperative sentinel lymph node biopsy is an essential tool for critical surgical decisions. However, accurate diagnosis by pathologists is difficult within the time limitations. Training a robust and accurate deep learning model is also difficult owing to the limited number of frozen datasets with high-quality labels. To overcome these issues, we validated the effectiveness of transfer learning from CAMELYON16 to improve the performance of a convolutional neural network (CNN)-based classification model on our frozen dataset (N = 297) from Asan Medical Center (AMC). Among the 297 whole slide images (WSIs), 157 and 40 WSIs were used to train deep learning models with different dataset ratios at 2, 4, 8, 20, 40, and 100%. The remaining 100 WSIs were used to validate model performance in terms of patch- and slide-level classification. An additional 228 WSIs from Seoul National University Bundang Hospital (SNUBH) were used for external validation. Models with three sets of initial weights, i.e., scratch-based (random initialization), ImageNet-based, and CAMELYON16-based, were compared to validate their effectiveness in external validation. In the patch-level classification results on the AMC dataset, CAMELYON16-based models trained with a small dataset (up to 40%, i.e., 62 WSIs) showed a significantly higher area under the curve (AUC) of 0.929 than the scratch- and ImageNet-based models at 0.897 and 0.919, respectively, while CAMELYON16-based and ImageNet-based models trained with 100% of the training dataset showed comparable AUCs at 0.944 and 0.943, respectively. For the external validation, CAMELYON16-based models showed higher AUCs than the scratch- and ImageNet-based models. The feasibility of transfer learning to enhance model performance was validated for frozen section datasets with limited numbers.
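The subset sizes implied by the dataset ratios can be worked out with simple arithmetic. The sketch below assumes the ratios apply to the 157-slide training portion and that fractional slides are rounded down. Both are assumptions, chosen because they reproduce the "40%, i.e., 62 WSIs" figure in the abstract; `subset_size` is a hypothetical helper.

```python
TRAIN_WSIS = 157                        # training slides reported in the abstract
RATIOS_PERCENT = [2, 4, 8, 20, 40, 100]  # dataset ratios reported in the abstract

def subset_size(total, percent):
    """Slides in a subset; rounding down is an assumption, not stated in the abstract."""
    return total * percent // 100

subset_sizes = {p: subset_size(TRAIN_WSIS, p) for p in RATIOS_PERCENT}

for percent, n_slides in subset_sizes.items():
    print(f"{percent:>3}% -> {n_slides} WSIs")
```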


2021 ◽  
Vol 13 (2) ◽  
pp. 274
Author(s):  
Guobiao Yao ◽  
Alper Yilmaz ◽  
Li Zhang ◽  
Fei Meng ◽  
Haibin Ai ◽  
...  

The available stereo matching algorithms produce either a large number of false-positive matches or only a few true-positive matches across oblique stereo images with a large baseline. This undesired result is due to the complex perspective deformation and radiometric distortion across the images. To address this problem, we propose a novel affine invariant feature matching algorithm with subpixel accuracy based on an end-to-end convolutional neural network (CNN). In our method, we adopt and modify a Hessian affine network, which we refer to as IHesAffNet, to obtain affine invariant Hessian regions using a deep learning framework. To improve the correlation between corresponding features, we introduce an empirical weighted loss function (EWLF) based on the negative samples using K nearest neighbors, and then generate highly discriminative deep learning-based descriptors with our multiple hard network structure (MTHardNets). Following this step, the conjugate features are produced by using the Euclidean distance ratio as the matching metric, and the accuracy of the matches is optimized through deep learning transform based least square matching (DLT-LSM). Finally, experiments on large-baseline oblique stereo images acquired by ground close-range and unmanned aerial vehicle (UAV) platforms verify the effectiveness of the proposed approach, and comprehensive comparisons demonstrate that our matching algorithm outperforms the state-of-the-art methods in terms of accuracy, distribution, and correct ratio. The main contributions of this article are: (i) our proposed MTHardNets can generate high-quality descriptors; and (ii) the IHesAffNet can produce substantial affine invariant corresponding features with reliable transform parameters.
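One common reading of "Euclidean distance ratio as the matching metric" is the nearest to second-nearest neighbor ratio test. The sketch below illustrates that general technique only and is not the authors' implementation. The 0.8 threshold, the name `ratio_match`, and the 2-D toy descriptors are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_match(desc_left, desc_right, max_ratio=0.8):
    """Return (i, j) pairs whose nearest/second-nearest distance ratio is below max_ratio."""
    matches = []
    if len(desc_right) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, d in enumerate(desc_left):
        ranked = sorted((euclidean(d, c), j) for j, c in enumerate(desc_right))
        (best, j), (second, _) = ranked[0], ranked[1]
        if second > 0 and best / second < max_ratio:
            matches.append((i, j))
    return matches

# Toy descriptors (hypothetical). The third left descriptor is equally close to
# two candidates on the right, so the ratio test rejects it as ambiguous.
left = [(0.0, 0.0), (5.0, 5.0), (2.0, 2.0)]
right = [(0.1, 0.0), (5.0, 5.2), (1.0, 1.0), (3.0, 3.0)]
matches = ratio_match(left, right)
```

A lower `max_ratio` keeps fewer but more distinctive matches.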


Cancers ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 652 ◽  
Author(s):  
Carlo Augusto Mallio ◽  
Andrea Napolitano ◽  
Gennaro Castiello ◽  
Francesco Maria Giordano ◽  
Pasquale D'Alessio ◽  
...  

Background: Coronavirus disease 2019 (COVID-19) pneumonia and immune checkpoint inhibitor (ICI) therapy-related pneumonitis share common features. The aim of this study was to determine on chest computed tomography (CT) images whether a deep convolutional neural network algorithm is able to solve the challenge of differential diagnosis between COVID-19 pneumonia and ICI therapy-related pneumonitis. Methods: We enrolled three groups: a pneumonia-free group (n = 30), a COVID-19 group (n = 34), and a group of patients with ICI therapy-related pneumonitis (n = 21). Computed tomography images were analyzed with an artificial intelligence (AI) algorithm based on a deep convolutional neural network structure. Statistical analysis included the Mann–Whitney U test (significance threshold at p < 0.05) and the receiver operating characteristic curve (ROC curve). Results: The algorithm showed low specificity in distinguishing COVID-19 from ICI therapy-related pneumonitis (sensitivity 97.1%, specificity 14.3%, area under the curve (AUC) = 0.62). ICI therapy-related pneumonitis was identified by the AI when compared to pneumonia-free controls (sensitivity = 85.7%, specificity 100%, AUC = 0.97). Conclusions: The deep learning algorithm is not able to distinguish between COVID-19 pneumonia and ICI therapy-related pneumonitis. Awareness must be increased among clinicians about imaging similarities between COVID-19 and ICI therapy-related pneumonitis. ICI therapy-related pneumonitis can be applied as a challenge population for cross-validation to test the robustness of AI models used to analyze interstitial pneumonias of variable etiology.
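As a worked illustration of the reported sensitivity and specificity, the sketch below uses confusion-matrix counts that are inferred, not reported. They are simply counts consistent with the group sizes and percentages in the abstract, and the choice of which group counts as "positive" in each comparison is likewise an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts: COVID-19 as positive (n = 34), ICI pneumonitis as negative (n = 21).
covid_vs_ici = sensitivity_specificity(tp=33, fn=1, tn=3, fp=18)

# Hypothetical counts: ICI pneumonitis as positive (n = 21), pneumonia-free as negative (n = 30).
ici_vs_free = sensitivity_specificity(tp=18, fn=3, tn=30, fp=0)

for name, (sens, spec) in [("COVID-19 vs ICI", covid_vs_ici), ("ICI vs pneumonia-free", ici_vs_free)]:
    print(f"{name}: sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Under these assumed counts, a specificity of 14.3% would mean that only 3 of the 21 ICI pneumonitis cases were not labeled as COVID-19.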


Electronics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 81
Author(s):  
Jianbin Xiong ◽  
Dezheng Yu ◽  
Shuangyin Liu ◽  
Lei Shu ◽  
Xiaochan Wang ◽  
...  

Plant phenotypic image recognition (PPIR) is an important branch of smart agriculture. In recent years, deep learning has achieved significant breakthroughs in image recognition. Consequently, PPIR technology that is based on deep learning is becoming increasingly popular. First, this paper introduces the development and application of PPIR technology, followed by its classification and analysis. Second, it presents the theory of four types of deep learning methods and their applications in PPIR. These methods include the convolutional neural network, deep belief network, recurrent neural network, and stacked autoencoder, and they are applied to identify plant species, diagnose plant diseases, etc. Finally, the difficulties and challenges of deep learning in PPIR are discussed.


2021 ◽  
Vol 13 (10) ◽  
pp. 1953
Author(s):  
Seyed Majid Azimi ◽  
Maximilian Kraus ◽  
Reza Bahmanyar ◽  
Peter Reinartz

In this paper, we address various challenges in multi-pedestrian and vehicle tracking in high-resolution aerial imagery through an intensive evaluation of a number of traditional and Deep Learning based Single- and Multi-Object Tracking methods. We also describe our proposed Deep Learning based Multi-Object Tracking method, AerialMPTNet, which fuses appearance, temporal, and graphical information using a Siamese Neural Network, a Long Short-Term Memory, and a Graph Convolutional Neural Network module for more accurate and stable tracking. Moreover, we investigate the influence of Squeeze-and-Excitation layers and Online Hard Example Mining on the performance of AerialMPTNet. To the best of our knowledge, we are the first to use these two for regression-based Multi-Object Tracking. Additionally, we study and compare the L1 and Huber loss functions. In our experiments, we extensively evaluate AerialMPTNet on three aerial Multi-Object Tracking datasets, namely the AerialMPT and the KIT AIS pedestrian and vehicle datasets. Qualitative and quantitative results show that AerialMPTNet outperforms all previous methods on the pedestrian datasets and achieves competitive results on the vehicle dataset. In addition, the Long Short-Term Memory and Graph Convolutional Neural Network modules enhance the tracking performance. Moreover, using Squeeze-and-Excitation and Online Hard Example Mining helps significantly in some cases while degrading the results in others. In addition, according to the results, L1 yields better results than Huber loss in most scenarios. The presented results provide deep insight into the challenges and opportunities of the aerial Multi-Object Tracking domain, paving the way for future research.
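For readers unfamiliar with the two loss functions being compared, the sketch below gives their standard per-residual definitions. The Huber threshold `delta=1.0` is an assumption, since the abstract does not state the value used.

```python
def l1_loss(residual):
    """Absolute error: grows linearly with the size of the residual."""
    return abs(residual)

def huber_loss(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it (delta is an assumed value)."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)

# Small residuals are penalized less by Huber than by L1; for large residuals
# both grow linearly, with Huber offset by a constant.
for residual in (0.5, 1.0, 3.0):
    print(residual, l1_loss(residual), huber_loss(residual))
```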


2021 ◽  
Vol 13 (4) ◽  
pp. 554
Author(s):  
A. A. Masrur Ahmed ◽  
Ravinesh C Deo ◽  
Nawin Raj ◽  
Afshin Ghahramani ◽  
Qi Feng ◽  
...  

Remotely sensed soil moisture forecasting through satellite-based sensors to estimate the future state of the underlying soils plays a critical role in planning and managing water resources and sustainable agricultural practices. In this paper, Deep Learning (DL) hybrid models (i.e., CEEMDAN-CNN-GRU) are designed for daily time-step surface soil moisture (SSM) forecasts, employing the gated recurrent unit (GRU), complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), and convolutional neural network (CNN). To establish the objective model’s viability for SSM forecasting at multi-step daily horizons, the hybrid CEEMDAN-CNN-GRU model is tested at the 1st, 5th, 7th, 14th, 21st, and 30th day-ahead periods by assimilating a comprehensive pool of 52 predictor datasets obtained from three distinct data sources. The data comprise the satellite-derived Global Land Data Assimilation System (GLDAS) repository, a global, high-temporal-resolution, unique terrestrial modelling system, as well as ground-based variables from Scientific Information Landowners (SILO) and synoptic-scale climate indices. The results demonstrate the forecasting capability of the hybrid CEEMDAN-CNN-GRU model with respect to the counterpart comparative models. This is supported by relatively lower values of the mean absolute percentage error and root mean square error. In terms of the statistical score metrics and infographics employed to test the final model’s utility, the proposed CEEMDAN-CNN-GRU models are considerably superior to the standalone and other hybrid methods tested on independent SSM data developed through feature selection approaches. Thus, the proposed approach can be successfully implemented in hydrology and agriculture management.
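The two error metrics named above have standard definitions, sketched below. The numbers are toy values chosen for easy arithmetic, not data from the study, and the function names are illustrative.

```python
import math

def mape(observed, predicted):
    """Mean absolute percentage error, in percent. Observed values must be non-zero."""
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)

def rmse(observed, predicted):
    """Root mean square error, in the same units as the observations."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values (hypothetical).
observed = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]

print(f"MAPE = {mape(observed, predicted):.2f} %")
print(f"RMSE = {rmse(observed, predicted):.3f}")
```

Lower values of either metric indicate forecasts closer to the observations.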


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2852
Author(s):  
Parvathaneni Naga Srinivasu ◽  
Jalluri Gnana SivaSai ◽  
Muhammad Fazal Ijaz ◽  
Akash Kumar Bhoi ◽  
Wonjoon Kim ◽  
...  

Deep learning models are efficient in learning the features that assist in understanding complex patterns precisely. This study proposes a computerized process for classifying skin disease through deep learning-based MobileNet V2 and Long Short-Term Memory (LSTM). The MobileNet V2 model proved to be efficient, with better accuracy, and can run on lightweight computational devices. The proposed model is efficient in maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progress of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Network (CNN), Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture expanded with a few changes. The HAM10000 dataset is used, and the proposed method outperformed the other methods with more than 85% accuracy. It recognizes the affected region much faster, with almost 2× fewer computations than the conventional MobileNet model, resulting in minimal computational effort. Furthermore, a mobile application is designed for instant and proper action. It helps patients and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.
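A grey-level co-occurrence matrix counts how often a pixel with grey level i has a neighbor with grey level j at a fixed offset. The sketch below shows the basic construction on a tiny image. The horizontal offset, the three grey levels, and the contrast feature are assumptions for illustration; the abstract does not say which settings or features the study used.

```python
def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of grey levels (i, j) for pixel pairs separated by (dx, dy)."""
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                matrix[image[y][x]][image[ny][nx]] += 1
    return matrix

def contrast(matrix):
    """Contrast feature: mean squared grey-level difference over all counted pairs."""
    total = sum(sum(row) for row in matrix)
    weighted = sum((i - j) ** 2 * count
                   for i, row in enumerate(matrix)
                   for j, count in enumerate(row))
    return weighted / total

# Toy 3x3 image with grey levels 0..2 (hypothetical).
toy_image = [
    [0, 0, 1],
    [1, 2, 2],
    [0, 1, 2],
]
toy_glcm = glcm(toy_image, levels=3)
toy_contrast = contrast(toy_glcm)
```

Tracking such texture features across images taken over time is one plausible way to quantify change in a lesion, though the abstract gives no detail on how progress was assessed.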

