precision recall curve
Recently Published Documents


Total documents: 76 (five years: 58)
H-index: 6 (five years: 4)

2022 ◽  
Vol 9 (1) ◽  
Author(s):  
Joffrey L. Leevy ◽  
John Hancock ◽  
Taghi M. Khoshgoftaar ◽  
Jared M. Peterson

Abstract: Recent years have seen a proliferation of Internet of Things (IoT) devices and an associated security risk from an increasing volume of malicious traffic worldwide. For this reason, datasets such as Bot-IoT were created to train machine learning classifiers to identify attack traffic in IoT networks. In this study, we build predictive models with Bot-IoT to detect attacks represented by dataset instances from the Information Theft category, as well as dataset instances from the data exfiltration and keylogging subcategories. Our contribution is centered on the evaluation of the effect of ensemble feature selection techniques (FSTs) on classification performance for these specific attack instances. A group or ensemble of FSTs will often perform better than the best individual technique. The classifiers that we use are a diverse set of four ensemble learners (LightGBM, CatBoost, XGBoost, and random forest (RF)) and four non-ensemble learners (logistic regression (LR), decision tree (DT), Naive Bayes (NB), and a multi-layer perceptron (MLP)). The metrics used for evaluating classification performance are area under the receiver operating characteristic curve (AUC) and area under the precision-recall curve (AUPRC). For the most part, we determined that our ensemble FSTs do not affect classification performance but are beneficial because feature reduction eases computational burden and provides insight through improved data visualization.
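
As an illustration of the AUPRC metric named in this abstract, the following is a minimal sketch of the step-wise (average precision) estimate of the area under the precision-recall curve. It is not the authors' code: the function name `average_precision` and the toy labels and scores are hypothetical, and tied scores are not grouped.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: list of 0/1 ground-truth values (1 = positive, e.g. attack traffic).
    scores: classifier scores, where higher means more likely positive.
    Simplification: tied scores are not grouped into a single threshold.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank instances from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Each positive raises recall by 1/total_pos; weight that step
            # by the precision observed at this rank.
            area += (true_pos / rank) / total_pos
    return area


# Hypothetical toy data: three positives among six instances.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
ap = average_precision(labels, scores)
print(f"average precision = {ap:.4f}")
```

Unlike ROC AUC, this quantity depends on the proportion of positives, which is one reason it is often reported alongside AUC for imbalanced data.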


2021 ◽  
Author(s):  
Qiu-Xia Feng ◽  
Bo Tang ◽  
Xi-Sheng Liu

Abstract Background: The study aimed to evaluate the diagnostic performance of machine learning-based CT radiomics models for predicting the recurrence and metastasis of gastrointestinal stromal tumors (GISTs) preoperatively. Methods: A total of 382 patients with histopathologically confirmed GISTs were retrospectively included. According to postoperative follow-up, patients were classified into non-recurrence and metastasis group (NRM) and recurrence or metastasis group (RM). Radiomics features were extracted from arterial and portal venous phase CT images. Four feature selection methods and ten machine learning techniques were used to train prediction models on the training cohort, with internal validation by 10-fold cross-validation. The F1 score was used to evaluate the performance of the classification models. The best models of the two phases were stacked to build an ensemble model. The area under the curve (AUC), recall, precision, accuracy, and F1 score were used to evaluate the performance of the models and to compare them with clinical criteria based on diameter. Results: Eighty machine learning models were built across the two phases. The ensemble model combined the analysis of variance and Naive Bayes (ANOVA_NB) model in the arterial phase, which selected only 5 features and provided the highest F1 score of 0.560, with the Kruskal-Wallis and Adaptive Boosting (KW_AdaBoost) model in the venous phase, which selected only 4 features and provided the highest F1 score of 0.500. The AUC of the generated ensemble model and that of the clinical criteria showed no difference (0.866 vs 0.857; DeLong test, P = 0.865). However, the ensemble model had higher accuracy (0.961), recall (0.826), precision (0.905), F1 score (0.864), and area under the precision-recall curve (0.774; 95% CI, 0.552 - 0.917) than the clinical criteria, for which the accuracy was 0.942, recall was 0.367, precision was 0.478, the F1 score was 0.415, and the area under the precision-recall curve was 0.354 (95% CI, 0.552 - 0.917). Conclusions: Our findings highlight the potential of machine learning techniques based on CT radiomics in the prediction of recurrence and metastasis of GISTs preoperatively.
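
The F1 scores quoted in this abstract can be checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below does only that arithmetic; the function name `f1_score` is our own and nothing here reproduces the authors' pipeline.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
ensemble_f1 = f1_score(precision=0.905, recall=0.826)
clinical_f1 = f1_score(precision=0.478, recall=0.367)

print(f"ensemble F1 = {ensemble_f1:.3f}")  # reported value: 0.864
print(f"clinical F1 = {clinical_f1:.3f}")  # reported value: 0.415
```

Both recomputed values agree with the reported F1 scores to three decimal places.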


2021 ◽  
Vol 33 (6) ◽  
pp. 1385-1397
Author(s):  
Leyuan Sun ◽  
Rohan P. Singh ◽  
Fumio Kanehiro ◽  
...  

Most simultaneous localization and mapping (SLAM) systems assume that SLAM is conducted in a static environment. When SLAM is used in dynamic environments, the accuracy of each part of the SLAM system is adversely affected. We term this problem dynamic SLAM. In this study, we propose solutions for three main problems in dynamic SLAM: camera tracking, three-dimensional map reconstruction, and loop closure detection. We propose to employ a geometry-based method, a deep learning-based method, and a combination of the two for object segmentation. Using the segmentation information to generate a mask, we filter out the keypoints that lead to errors in visual odometry, as well as the features extracted by the CNN from dynamic areas, to improve the performance of loop closure detection. Then, we validate our proposed loop closure detection method using the precision-recall curve and also confirm the framework’s performance using multiple datasets. The absolute trajectory error and relative pose error are used as metrics to evaluate the accuracy of the proposed SLAM framework in comparison with state-of-the-art methods. The findings of this study can potentially improve the robustness of SLAM technology in situations where mobile robots work together with humans, while the object-based point cloud byproduct has potential for other robotics tasks.


2021 ◽  
Vol 12 ◽  
Author(s):  
Xiao-Ying Yan ◽  
Peng-Wei Yin ◽  
Xiao-Meng Wu ◽  
Jia-Xin Han

Drug combination therapies are a promising strategy to overcome drug resistance and improve on the efficacy of monotherapy in cancer, and they have been shown to lead to a decrease in dose-related toxicities. Besides synergistic reactions between drugs, some antagonistic drug–drug interactions (DDIs) exist, which are the main cause of adverse drug events. Precisely predicting the type of DDI is important for both drug development and more effective drug combination therapy applications. Recently, numerous text mining– and machine learning–based methods have been developed for predicting DDIs. All these methods implicitly utilize drug features derived from diverse drug-related properties. However, how to integrate these features more efficiently and improve the accuracy of classification is still a challenge. In this paper, we propose a novel method (called NMDADNN) to predict DDI types by integrating five drug-related heterogeneous information sources to extract unified drug mapping features. NMDADNN first constructs similarity networks by using the Jaccard coefficient and then applies the random walk with restart algorithm and positive pointwise mutual information to extract topological similarities. After that, the five network-based similarities are unified by using a multimodel deep autoencoder. Finally, NMDADNN applies a deep neural network (DNN) to the unified drug features to infer the types of DDIs. In comparison with other recent state-of-the-art DNN-based methods, NMDADNN achieves the best results in terms of accuracy, area under the precision-recall curve, area under the ROC curve, F1 score, precision, and recall. In addition, many of the promising types of drug–drug pairs predicted by NMDADNN are also confirmed by using the interactions checker tool. These results demonstrate the effectiveness of our NMDADNN method, indicating that NMDADNN has great potential for predicting DDI types.
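
The first step described above, building a similarity network with the Jaccard coefficient, can be sketched as follows. This is only an illustration under assumed inputs: the drug names, the target sets, and the function name `jaccard` are hypothetical, and the later steps (random walk with restart, positive pointwise mutual information, autoencoder) are not shown.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical drug-related property sets (e.g. known targets per drug).
drug_targets = {
    "drug_A": {"T1", "T2", "T3"},
    "drug_B": {"T2", "T3", "T4"},
    "drug_C": {"T5"},
}

# Pairwise similarity "network" stored as a dictionary keyed by drug pair.
names = sorted(drug_targets)
similarity = {
    (x, y): jaccard(drug_targets[x], drug_targets[y])
    for x in names
    for y in names
}

for pair, value in similarity.items():
    print(pair, round(value, 3))
```

One such matrix would be built per information source, giving five similarity networks in the setting the abstract describes.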


2021 ◽  
Author(s):  
Qiu-Xia Feng ◽  
Lu-Lu Xu ◽  
Qiong Li ◽  
Xiao-Ting Jiang ◽  
Bo Tang ◽  
...  

Abstract Background: The study aimed to evaluate the diagnostic performance of machine learning-based CT radiomics models for predicting the recurrence and metastasis of gastrointestinal stromal tumors (GISTs) preoperatively. Methods: A total of 382 patients with histopathologically confirmed GISTs were retrospectively included. According to postoperative follow-up, patients were classified into non-recurrence and metastasis group (NRM) and recurrence or metastasis group (RM). Radiomics features were extracted from arterial and portal venous phase CT images. Four feature selection methods and ten machine learning techniques were used to train prediction models on the training cohort, with internal validation by 10-fold cross-validation. The F1 score was used to evaluate the performance of the classification models. The best models of the two phases were stacked to build an ensemble model. The area under the curve (AUC), recall, precision, accuracy, and F1 score were used to evaluate the performance of the models and to compare them with clinical criteria based on diameter. Results: Eighty machine learning models were built across the two phases. The ensemble model combined the analysis of variance and Naive Bayes (ANOVA_NB) model in the arterial phase, which selected only 5 features and provided the highest F1 score of 0.560, with the Kruskal-Wallis and Adaptive Boosting (KW_AdaBoost) model in the venous phase, which selected only 4 features and provided the highest F1 score of 0.500. The AUC of the generated ensemble model and that of the clinical criteria showed no difference (0.866 vs 0.857; DeLong test, P = 0.865). However, the ensemble model had higher accuracy (0.961), recall (0.826), precision (0.905), F1 score (0.864), and area under the precision-recall curve (0.774; 95% CI, 0.552 - 0.917) than the clinical criteria, for which the accuracy was 0.942, recall was 0.367, precision was 0.478, the F1 score was 0.415, and the area under the precision-recall curve was 0.354 (95% CI, 0.552 - 0.917). Conclusions: Our findings highlight the potential of machine learning techniques based on CT radiomics in the prediction of recurrence and metastasis of GISTs preoperatively.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Shuaiqi Liu ◽  
Jingjie An ◽  
Jie Zhao ◽  
Shuhuan Zhao ◽  
Hui Lv ◽  
...  

Most existing studies assume that drug-target pairs with unknown interactions do not interact. However, an unknown interaction only means that the relationship between the drug and the target has not yet been confirmed. In this paper, samples for which the relationship between drugs and targets has not been determined are considered unlabeled. A weighted fusion method of multisource information is proposed to screen drug-target interactions. Firstly, some drug-target pairs which may have interactions are selected. Secondly, the selected drug-target pairs are added to the positive samples, which are regarded as known to have interaction relationships, and the original interaction relationship matrix is revised. Finally, the revised datasets are used to predict interactions with the bipartite local model with neighbor-based interaction profile inferring (BLM-NII). Experiments demonstrate that the proposed method has greatly improved specificity, sensitivity, precision, and accuracy compared with the BLM-NII method. In addition, compared with several state-of-the-art methods, the area under the receiver operating characteristic curve (AUC) and the area under the precision-recall curve (AUPR) of the proposed method are excellent.


Author(s):  
Aaron S. Coyner ◽  
Jimmy S. Chen ◽  
Praveer Singh ◽  
Robert L. Schelonka ◽  
Brian K. Jordan ◽  
...  

BACKGROUND AND OBJECTIVES Retinopathy of prematurity (ROP) is a leading cause of childhood blindness. Screening and treatment reduce this risk, but require multiple examinations of infants, most of whom will not develop severe disease. Previous work has suggested that artificial intelligence may be able to detect incident severe disease (treatment-requiring retinopathy of prematurity [TR-ROP]) before clinical diagnosis. We aimed to build a risk model that combined artificial intelligence with clinical demographics to reduce the number of examinations without missing cases of TR-ROP. METHODS Infants undergoing routine ROP screening examinations (1579 total eyes, 190 with TR-ROP) were recruited from 8 North American study centers. A vascular severity score (VSS) was derived from retinal fundus images obtained at 32 to 33 weeks’ postmenstrual age. Seven ElasticNet logistic regression models were trained on all combinations of birth weight, gestational age, and VSS. The area under the precision-recall curve was used to identify the highest-performing model. RESULTS The gestational age + VSS model had the highest performance (mean ± SD area under the precision-recall curve: 0.35 ± 0.11). On 2 different test data sets (n = 444 and n = 132), sensitivity was 100% (positive predictive value: 28.1% and 22.6%) and specificity was 48.9% and 80.8% (negative predictive value: 100.0%). CONCLUSIONS Using a single examination, this model identified all infants who developed TR-ROP, on average, >1 month before diagnosis with moderate to high specificity. This approach could lead to earlier identification of incident severe ROP, reducing late diagnosis and treatment while simultaneously reducing the number of ROP examinations and unnecessary physiologic stress for low-risk infants.
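
To show how the reported quantities relate to one another, the sketch below computes sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the study's data, and `screening_metrics` is our own name.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV, and NPV from confusion counts.

    A ratio with a zero denominator is returned as None.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # recall on diseased cases
        "specificity": ratio(tn, tn + fp),  # recall on healthy cases
        "ppv": ratio(tp, tp + fp),          # precision
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical counts: no missed cases (fn = 0) but many false alarms.
metrics = screening_metrics(tp=20, fp=60, tn=120, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With no false negatives, sensitivity and NPV are both 100% even though PPV stays low, which is the same pattern the abstract reports for a rare outcome.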


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruben Chevez-Guardado ◽  
Lourdes Peña-Castillo

Abstract: Promoters are genomic regions where the transcription machinery binds to initiate the transcription of specific genes. Computational tools for identifying bacterial promoters have been around for decades. However, most of these tools were designed to recognize promoters in one or few bacterial species. Here, we present Promotech, a machine-learning-based method for promoter recognition in a wide range of bacterial species. We compare Promotech’s performance with the performance of five other promoter prediction methods. Promotech outperforms these other programs in terms of area under the precision-recall curve (AUPRC) or precision at the same level of recall. Promotech is available at https://github.com/BioinformaticsLabAtMUN/PromoTech.
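
Comparing methods by "precision at the same level of recall" can be sketched as below: for each method, take the best precision over all score thresholds that reach at least the target recall. This is an illustrative sketch, not Promotech's evaluation code; the function name and the toy labels and scores are hypothetical, and tied scores are not grouped.

```python
def precision_at_recall(labels, scores, target_recall):
    """Best precision among score cutoffs whose recall >= target_recall.

    labels: list of 0/1 ground-truth values; scores: higher = more positive.
    Returns 0.0 if there are no positives.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best = 0.0
    true_pos = 0
    for rank, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        recall = true_pos / total_pos
        if recall >= target_recall:
            # Predicting the top `rank` items as positive gives this precision.
            best = max(best, true_pos / rank)
    return best


# Hypothetical toy data: four true promoters among eight candidates.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30]
p75 = precision_at_recall(labels, scores, target_recall=0.75)
p100 = precision_at_recall(labels, scores, target_recall=1.0)
print(f"precision at recall >= 0.75: {p75:.3f}")
print(f"precision at recall = 1.00: {p100:.3f}")
```

Fixing recall in this way lets two predictors be compared at a single operating point even when their score scales differ.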


Author(s):  
Saurav Mishra

Caused by the bite of an Anopheles mosquito infected with a parasite of the genus Plasmodium, malaria has remained a major healthcare burden for years, with approximately 400,000 deaths reported globally every year. The traditional diagnosis process for malaria involves an examination of the blood smear slide under the microscope. This process is not only time consuming but also requires pathologists to be highly skilled in their work. Timely diagnosis and the availability of robust diagnostic facilities and skilled laboratory technicians are vital to reducing the mortality rate. This study aims to build a robust system by applying deep learning techniques such as transfer learning and snapshot ensembling to automate the detection of the parasite in thin blood smear images. All the models were evaluated against the following metrics: F1 score, accuracy, precision, recall, Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC-ROC), and area under the precision-recall curve (AUC-PR). The snapshot ensembling model created by combining the snapshots of the EfficientNet-B0 pre-trained model outperformed every other model, achieving an F1 score of 99.37%, precision of 99.52%, and recall of 99.23%. The results show the potential of model ensembles, which combine the predictive power of multiple weak models to create a single efficient model that is better equipped to handle real-world data. The GradCAM experiment displayed the gradient activation maps of the last convolution layer to visually explicate where and what a model sees in an image when classifying it into a particular class. The models in this study correctly activate the stained parasitic region of interest in the thin blood smear images. Such visuals make the model more transparent, explainable, and trustworthy, which is essential for deploying AI-based models in the healthcare network.
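
Of the metrics listed above, the Matthews correlation coefficient is the least commonly spelled out, so a minimal sketch of its computation from confusion-matrix counts follows. The counts are hypothetical and are not results from this study; the function name `matthews_corrcoef_counts` is our own.

```python
import math


def matthews_corrcoef_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion counts.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a parasitized-vs-uninfected classifier.
mcc = matthews_corrcoef_counts(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {mcc:.4f}")
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, so it stays informative when the classes are imbalanced.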

