confusion matrix
Recently Published Documents



Author(s):  
Hamza Abbad ◽  
Shengwu Xiong

Automatic diacritization is an Arabic natural language processing topic based on the sequence labeling task where the labels are the diacritics and the letters are the sequence elements. A letter can have from zero up to two diacritics. The dataset used was a subset of the preprocessed version of the Tashkeela corpus. We developed a deep learning model composed of a stack of four bidirectional long short-term memory hidden layers of the same size and an output layer at every level. The levels correspond to the groups that we classified the diacritics into (short vowels, double case-endings, Shadda, and Sukoon). Before training, the data were divided into input vectors containing letter indexes and output vectors containing the indexes of diacritics according to their groups. Both input and output vectors are concatenated, then a sliding window operation with overlapping is performed to generate continuous and fixed-size data. Such data are used for both training and evaluation. Finally, we ran tests using the standard metrics with all of their variations and compared our results with two recent state-of-the-art works. Our model achieved a 3% diacritization error rate and an 8.99% word error rate when including all letters. We also generated the confusion matrix to show the performance per output and analyzed the mismatches of the first 500 lines to classify the model errors according to their linguistic nature.
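
The overlapping sliding-window step mentioned in the abstract above can be illustrated with a minimal sketch. The window size and step below are hypothetical values chosen for the example; the abstract does not state the ones the authors used, and the function name `sliding_windows` is an assumption, not the authors' code. Sequences shorter than one window are returned unpadded here, whereas a real pipeline would likely pad them.

```python
def sliding_windows(seq, size, step):
    """Split seq into windows of length `size`.

    Consecutive windows start `step` items apart, so they overlap by
    size - step items when step < size.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(seq) <= size:
        # Too short for more than one window; returned as-is (no padding).
        return [list(seq)]
    windows = [list(seq[i:i + size]) for i in range(0, len(seq) - size + 1, step)]
    # If the last stride stopped short of the end, add one final window
    # aligned to the end so that no trailing items are dropped.
    if (len(seq) - size) % step != 0:
        windows.append(list(seq[-size:]))
    return windows


# Hypothetical letter indexes for a short line of text.
letter_ids = [3, 7, 1, 4, 9, 2, 5]
chunks = sliding_windows(letter_ids, size=4, step=2)
```

With these example values every chunk has exactly four items, and neighbouring chunks share at least two of them.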


2022 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ilesanmi Daniyan ◽  
Khumbulani Mpofu ◽  
Samuel Nwankwo

Purpose The need to examine the integrity of infrastructure in the rail industry in order to improve its reliability and reduce the chances of breakdown due to defects has brought about development of an inspection and diagnostic robot. Design/methodology/approach In this study, an inspection robot was designed for detecting cracks, corrosion, missing clips and wear on rail track facilities. The robot is designed to use infrared and ultrasonic sensors for obstacle avoidance and crack detection, two 3D profilometers for wear detection as well as cameras with high resolution to capture real time images and colour sensors for corrosion detection. The robot is also designed with cameras placed in front of it with colour sensors at each side to assist in the detection of corrosion in the rail track. The image processing capability of the robot will permit the analysis of the type and depth of the crack and corrosion captured in the track. The computer aided design and modeling of the robot was carried out using the Solidworks software version 2018 while the simulation of the proposed system was carried out in the MATLAB 2020b environment. Findings The results obtained present three frameworks for wear, corrosion and missing clips as well as crack detection. In addition, the design data for the development of the integrated robotic system is also presented in the work. The confusion matrix resulting from the simulation of the proposed system indicates significant sensitivity and accuracy of the system to the presence and detection of fault respectively. 
Hence, the work provides a design framework for detecting and analysing the presence of defects on the rail track. Practical implications The development and the implementation of the designed robot will bring about a more proactive way to monitor rail track conditions and detect rail track defects so that effort can be geared towards its restoration before it becomes a major problem thus increasing the rail network capacity and availability. Originality/value The novelty of this work is based on the fact that the system is designed to work autonomously to avoid obstacles and check for cracks, missing clips, wear and corrosion in the rail tracks with a system of integrated and coordinated components.


2022 ◽  
Author(s):  
Abdul Muqtadir Khan ◽  
Abdullah BinZiad ◽  
Abdullah Al Subaii ◽  
Turki Alqarni ◽  
Mohamed Yassine Jelassi ◽  
...  

Abstract Diagnostic pumping techniques are used routinely in proppant fracturing design. The pumping process can be time consuming; however, it yields technical confidence in treatment and productivity optimization. Recent developments in data analytics and machine learning can aid in shortening operational workflows and enhance project economics. Supervised learning was applied to an existing database to streamline the process and affect the design framework. Five classification algorithms were used for this study. The database was constructed through heterogeneous reservoir plays from the injection/falloff outputs. The algorithms used were support vector machine, decision tree, random forest, multinomial, and XGBoost. The number of classes was sensitized to establish a balance between model accuracy and prediction granularity. Fifteen cases were developed for a comprehensive comparison. A complete machine learning framework was constructed to work through each case set along with hyperparameter tuning to maximize accuracy. After the model was finalized, an extensive field validation workflow was deployed. The target outputs selected for the model were crosslinked fluid efficiency, total proppant mass, and maximum proppant concentration. The unsupervised clustering technique with t-SNE algorithm that was used first lacked accuracy. Supervised classification models showed better predictions. Cross-validation techniques showed an increasing trend of prediction accuracy. Feature selection was done using one-variable-at-a-time (OVAT) and a simple feature correlation study. Because the number of features and the dataset size were small, no features were eliminated from the final model building. Accuracy and F1 scores calculated from the confusion matrix were used for evaluation. XGBoost showed excellent results with an accuracy of 74 to 95% for the output parameters. Fluid efficiency was categorized into three classes and yielded an accuracy of 96%. 
Proppant concentration and proppant mass predictions showed 77% and 86% accuracy, respectively, for the six-class case. The combination of high accuracy and fine granularity confirmed the potential application of machine learning models. The ratio of training to testing (holdout) across all cases ranged from 80:20 to 70:30. Model validations were done through an inverse problem of predicting and matching the fracture geometry and treatment pressures from the machine learning model design and the actual net pressure match. The simulations were conducted using advanced multiphysics simulations. The advantages of this innovative design approach showed four areas of improvement: reduction in polymer consumption by 30%, reduction of the flowback time by 25%, reduction of water usage by 30%, and enhanced operational efficiency by 60 to 65%.


PeerJ ◽  
2022 ◽  
Vol 10 ◽  
pp. e12743
Author(s):  
Fangfang Liu ◽  
Guanshui Bao ◽  
Mengxia Yan ◽  
Guiming Lin

Background Primary headache is a disorder with a high incidence and low diagnostic accuracy; the incidence of migraine and tension-type headache ranks first among primary headaches. Artificial intelligence (AI) decision support systems have shown great potential in the medical field. Therefore, we attempt to use machine learning to build a clinical decision-making system for primary headaches. Methods The demographic data and headache characteristics of 173 patients were collected by questionnaires. Decision tree, random forest, gradient boosting algorithm and support vector machine (SVM) models were used to construct a discriminant model and a confusion matrix was used to calculate the evaluation indicators of the models. Furthermore, we have carried out feature selection through univariate statistical analysis and machine learning. Results For each model, the accuracy and F1 score were calculated from the confusion matrix. The logistic regression model has the best discrimination effect, with the accuracy reaching 0.84 and the area under the ROC curve also being the largest at 0.90. Furthermore, we identified the most important factors for distinguishing the two disorders through statistical analysis and machine learning: nausea/vomiting and photophobia/phonophobia. These two factors represent potential independent factors for the identification of migraines and tension-type headaches, with the accuracy reaching 0.74 and the area under the ROC curve being at 0.74. Conclusions Applying machine learning to the decision-making system for primary headaches can achieve a high diagnostic accuracy. Among them, the discrimination effect obtained by the integrated algorithm is significantly better than that of a single learner. Second, nausea/vomiting, photophobia/phonophobia may be the most important factors for distinguishing migraine from tension-type headaches.
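
As an illustration of how accuracy and F1 score follow from a two-class confusion matrix such as the one used in the abstract above, here is a minimal sketch. The counts are invented for the example and are not the study's data; the function name `binary_metrics` and the choice of which class counts as "positive" are assumptions.

```python
def binary_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1 from the four cells
    of a 2x2 confusion matrix (true/false positives and negatives)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted
    # or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented counts: "positive" stands for one of the two diagnoses.
scores = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

F1 is the harmonic mean of precision and recall, so it drops sharply when either one is low, which accuracy alone does not reveal.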


2022 ◽  
Vol 14 (2) ◽  
pp. 307
Author(s):  
Guillaume Brunier ◽  
Simon Oiry ◽  
Yves Gruet ◽  
Stanislas F. Dubois ◽  
Laurent Barillé

In temperate coastal regions of Western Europe, the polychaete Sabellaria alveolata (Linné) builds large intertidal reefs of several hectares on soft-bottom substrates. These reefs are protected by the European Habitat Directive EEC/92/43 under the status of biogenic structures hosting a high biodiversity and providing ecological functions such as protection against coastal erosion. As an alternative to time-consuming field campaigns, a UAV-based Structure-from-Motion photogrammetric survey was carried out in October 2020 over Noirmoutier Island (France) where the second-largest known European reef is located in a tidal delta. A DJI Phantom 4 Multispectral UAV provided a topographic dataset at very high resolutions of 5 cm/pixel for the Digital Surface Model (DSM) and 2.63 cm/pixel for the multispectral orthomosaic images. The reef footprint was mapped using a combination of two topographic indices: the Topographic Openness Index and the Topographic Position Index. The reef structures covered an area of 8.15 ha, with 89% corresponding to the main reef composed of connected and continuous biogenic structures, 7.6% of large isolated structures (<60 m²), and 4.4% of small isolated reef clumps (<2 m²). To further describe the topographic complexity of the reef, the Geomorphon landform classification was used. The spatial distribution of tabular platforms considered as a healthy stage of the reef in contrast to a degraded stage was mapped with a proxy that consists in comparing the reef volume to a theoretical tabular-shaped reef volume. Epibionts colonizing the reef (macroalgae, mussels, and oysters) were also mapped by combining multispectral indices such as the Normalised Difference Vegetation Index and simple band ratios with topographic indices. A confusion matrix showed that macroalgae and mussels were satisfactorily identified but that oysters could not be detected by an automated procedure due to their spectral complexity. 
The topographic indices used in this work should now be further exploited to propose a health index for these large intertidal reefs.


2022 ◽  
Vol 14 (1) ◽  
pp. 232
Author(s):  
Defu Zou ◽  
Lin Zhao ◽  
Guangyue Liu ◽  
Erji Du ◽  
Guojie Hu ◽  
...  

An accurate and detailed vegetation map is of crucial significance for understanding the spatial heterogeneity of subsurfaces, which can help to characterize the thermal state of permafrost. The absence of an alpine swamp meadow (ASM) type, or an insufficient resolution (usually km-level) to capture the spatial distribution of the ASM, greatly limits the availability of existing vegetation maps in permafrost modeling of the Qinghai-Tibet Plateau (QTP). This study generated a map of the vegetation type at a spatial resolution of 30 m on the central QTP. The random forest (RF) classification approach was employed to map the vegetation based on 319 ground-truth samples, combined with a set of input variables derived from the visible, infrared, and thermal Landsat-8 images. Validation using a train-test split (i.e., 70% of the samples were randomly selected to train the RF model, while the remaining 30% were used for validation and a total of 1000 runs) showed that the average overall accuracy and Kappa coefficient of the RF approach were 0.78 (0.68–0.85) and 0.69 (0.64–0.74), respectively. The confusion matrix showed that the overall accuracy and Kappa coefficient of the predicted vegetation map reached 0.848 (0.844–0.852) and 0.790 (0.785–0.796), respectively. The user accuracies for the ASM, alpine meadow, alpine steppe, and alpine desert were 95.0%, 83.3%, 82.4%, and 86.7%, respectively. The most important variables for vegetation type prediction were two vegetation indices, i.e., NDVI and EVI. The surface reflectance of visible and shortwave infrared bands showed a secondary contribution, and the brightness temperature and the surface temperature of the thermal infrared bands showed little contribution. The dominant vegetation in the study area is alpine steppe and alpine desert. The results of this study can provide an accurate and detailed vegetation map, especially for the distribution of the ASM, which can help to improve further permafrost studies.
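
The overall accuracy, Kappa coefficient, and user accuracies reported in the abstract above are all derived from a multi-class confusion matrix. The sketch below shows the standard definitions on a small invented matrix; it is not the study's data or code. It assumes rows hold the reference (ground-truth) classes and columns hold the predicted (map) classes, so user's accuracy is the diagonal cell divided by its column total. The function names are hypothetical.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    n = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / n


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what chance alone would give."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    p_observed = overall_accuracy(cm)
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    if p_expected == 1.0:
        return 0.0  # degenerate case: only one class appears
    return (p_observed - p_expected) / (1.0 - p_expected)


def user_accuracies(cm):
    """Per-class user's accuracy: correct predictions / all predictions
    of that class (column totals, given rows = reference classes)."""
    k = len(cm)
    result = []
    for j in range(k):
        col_total = sum(cm[i][j] for i in range(k))
        result.append(cm[j][j] / col_total if col_total else 0.0)
    return result


# Invented 2-class matrix: rows = reference, columns = predicted.
cm = [[20, 5],
      [10, 15]]
oa = overall_accuracy(cm)
kappa = cohen_kappa(cm)
ua = user_accuracies(cm)
```

If the matrix is laid out the other way round (rows = predicted), the user's accuracy uses row totals instead, so the convention should be checked before comparing figures across studies.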


Author(s):  
Haviluddin Haviluddin ◽  
Edy Budiman ◽  
Rendy Ramadhan

Insurance product offerings are not always understood by prospective customers (CN) because of limited product information. This can cause confusion, so that CNs decide not to buy. The purpose of this study is to analyze the selection of insurance products at PT. AIA Financial Samarinda, East Kalimantan, Indonesia, using the Analytical Hierarchy Process (AHP) and Multi Objective Optimization on the Basis of Ratio Analysis (MOORA) approaches, so that CNs can choose insurance product facilities that match their abilities. In this study, 10 types of insurance products and 10 CN criteria were analyzed with the two methods. The accuracy of the two methods was then assessed using the confusion matrix (CM) method. The CM calculations on 27 CN datasets gave a conformity level of 81.5%, which indicates that the two methods can be implemented as an alternative for choosing insurance products according to ability or CN criteria. The results show that this approach is quite effective, efficient, and relatively easy to use in determining insurance products that meet the criteria or match the CN's economic capabilities.


2022 ◽  
Vol 22 (1) ◽  
Author(s):  
Khadijeh Moulaei ◽  
Mostafa Shanbehzadeh ◽  
Zahra Mohammadi-Taghiabad ◽  
Hadi Kazemi-Arpanahi

Abstract Background The coronavirus disease (COVID-19) hospitalized patients are always at risk of death. Machine learning (ML) algorithms can be used as a potential solution for predicting mortality in COVID-19 hospitalized patients. So, our study aimed to compare several ML algorithms to predict the COVID-19 mortality using the patient’s data at the first time of admission and choose the best performing algorithm as a predictive tool for decision-making. Methods In this study, after feature selection, based on the confirmed predictors, information about 1500 eligible patients (1386 survivors and 144 deaths) obtained from the registry of Ayatollah Taleghani Hospital, Abadan city, Iran, was extracted. Afterwards, several ML algorithms were trained to predict COVID-19 mortality. Finally, to assess the models’ performance, the metrics derived from the confusion matrix were calculated. Results The study participants were 1500 patients; the number of men was found to be higher than that of women (836 vs. 664) and the median age was 57.25 years old (interquartile 18–100). After performing the feature selection, out of 38 features, dyspnea, ICU admission, and oxygen therapy were found as the top three predictors. Smoking, alanine aminotransferase, and platelet count were found to be the three lowest predictors of COVID-19 mortality. Experimental results demonstrated that random forest (RF) had better performance than other ML algorithms with accuracy, sensitivity, precision, specificity, and receiver operating characteristic (ROC) of 95.03%, 90.70%, 94.23%, 95.10%, and 99.02%, respectively. Conclusion It was found that ML enables a reasonable level of accuracy in predicting the COVID-19 mortality. Therefore, ML-based predictive models, particularly the RF algorithm, potentially facilitate identifying the patients who are at high risk of mortality and inform proper interventions by the clinicians.


2022 ◽  
Author(s):  
Darlin Apasrawirote ◽  
Pharinya Boonchai ◽  
Paisarn Muneesawang ◽  
Wannacha Nakhonkam ◽  
Nophawan Bunchu

Abstract Forensic entomology is the branch of forensic science that is related to the use of arthropod specimens in legal issues. Fly maggots are one of the crucial pieces of evidence that can be used for estimating post-mortem intervals worldwide. However, the species-level identification of fly maggots is difficult, time consuming, and requires specialized taxonomic training. In this work, a novel method for the identification of different forensically-important fly species is proposed using convolutional neural networks (CNNs). The data used for the experiment were obtained from a digital camera connected to a compound microscope. We compared the performance of four widely used models that vary in complexity of architecture to evaluate tradeoffs in accuracy and speed for species classification including ResNet-101, Densenet161, Vgg19_bn, and AlexNet. In the validation step, all of the studied models provided 100% accuracy for identifying maggots of 4 species including Chrysomya megacephala (Diptera: Calliphoridae), Chrysomya (Achoetandrus) rufifacies (Diptera: Calliphoridae), Lucilia cuprina (Diptera: Calliphoridae), and Musca domestica (Diptera: Muscidae) based on images of posterior spiracles. However, AlexNet showed the fastest speed to process the identification model and presented a good balance between performance and speed. Therefore, the AlexNet model was selected for the testing step. The results of the confusion matrix of AlexNet showed that misclassification was found between C. megacephala and C. (Achoetandrus) rufifacies as well as between C. megacephala and L. cuprina. No misclassification was found for M. domestica. In addition, we created a web-application platform called thefly.ai to help users identify species of fly maggots in their own images using our classification model. The results from this study can be applied to identify further species by using other types of images. 
This model can also be used in the development of identification features in mobile applications. This study is a crucial step for integrating information from biology and AI-technology to develop a novel platform for use in forensic investigation.


Author(s):  
Omar Freddy Chamorro-Atalaya ◽  
Guillermo Morales Romero ◽  
Adrián Quispe Andía ◽  
Beatriz Caycho Salas ◽  
Elizabeth Katerin Auqui Ramos ◽  
...  

The objective of this study is to analyze and discuss the metrics of a predictive model based on the K-nearest neighbor (K-NN) learning algorithm, applied to data on engineering students' perception of the quality of the virtual administrative service. As part of the methodology, the indicators of accuracy, precision, sensitivity, and specificity were analyzed, obtained from the confusion matrix and the receiver operating characteristic (ROC) curve. The collected data were validated through Cronbach's Alpha, finding consistency values higher than 0.9, which allows the analysis to continue. Using the predictive model built in the Matlab R2021a software, it was concluded that the average metrics for all classes are optimal, presenting a precision of 92.77%, sensitivity of 86.62%, and specificity of 94.7%, with a total accuracy of 85.5%. In turn, the highest area under the curve (AUC) is 0.98, which is why it is considered an optimal predictive model. This study can contribute significantly to the decision-making of the higher education institution in relation to improving the quality of the virtual administrative service.
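
The "average metrics for all classes" in the abstract above are typically obtained by treating each class in turn as positive against all others (one-vs-rest) and averaging the per-class values. The sketch below shows that calculation in Python on an invented three-class matrix; the study itself used Matlab, and the matrix, the function name `macro_metrics`, and the rows = actual, columns = predicted layout are assumptions made for illustration.

```python
def macro_metrics(cm):
    """One-vs-rest precision, sensitivity and specificity per class,
    averaged with equal weight per class (macro average).

    cm is a square matrix with rows = actual, columns = predicted.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    precisions, sensitivities, specificities = [], [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # actual c, predicted other
        fp = sum(cm[i][c] for i in range(k)) - tp   # predicted c, actual other
        tn = n - tp - fn - fp
        precisions.append(tp / (tp + fp) if (tp + fp) else 0.0)
        sensitivities.append(tp / (tp + fn) if (tp + fn) else 0.0)
        specificities.append(tn / (tn + fp) if (tn + fp) else 0.0)
    return {
        "accuracy": sum(cm[i][i] for i in range(k)) / n,
        "precision": sum(precisions) / k,
        "sensitivity": sum(sensitivities) / k,
        "specificity": sum(specificities) / k,
    }


# Invented 3-class matrix, e.g. low / medium / high perceived quality.
cm3 = [[5, 1, 0],
       [2, 6, 2],
       [0, 1, 8]]
avg = macro_metrics(cm3)
```

Because the macro average weights every class equally, the averaged sensitivity can differ from the total accuracy when class sizes are uneven, which is consistent with the abstract reporting them as separate figures.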

