Applying AdaBoost to Improve Diagnostic Accuracy

Methodology ◽  
2019 ◽  
Vol 15 (2) ◽  
pp. 77-87 ◽  
Author(s):  
Zhehan Jiang ◽  
Kevin Walker ◽  
Dexin Shi

Abstract. Cognitive diagnostic modeling has been adopted to support various diagnostic measurement processes. Specifically, this approach allows practitioners and/or researchers to investigate an individual's status with regard to certain latent variables of interest. However, the diagnostic information provided by traditional estimation approaches often suffers from low accuracy, especially under small-sample conditions. This paper adopts the AdaBoost technique, popular in the field of machine learning, to estimate latent variables. Further, the proposed approach involves the construction of a simple iterative algorithm, based upon the AdaBoost technique, that optimizes the area under the curve (AUC). The algorithmic details are elaborated in pseudocode with line-by-line verbal explanations. Simulation studies were conducted to examine the improvement in latent variable estimates achieved by the proposed approach.
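The paper's own boosting-based estimation algorithm is only described above at the abstract level, so the sketch below is a generic stand-in rather than the authors' method: a minimal AdaBoost over decision stumps in plain NumPy, scored with a rank-statistic AUC. All names and the synthetic data are illustrative.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney) statistic; labels are 0/1."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def adaboost_stumps(X, y, n_rounds=10):
    """Minimal AdaBoost over decision stumps; y must be in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)          # sample weights, renormalized each round
    stumps = []
    for _ in range(n_rounds):
        best = None                  # (weighted error, feature, threshold, sign)
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
        pred = sign * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)   # up-weight misclassified samples
        w /= w.sum()
        stumps.append((alpha, j, thr, sign))
    return lambda Z: sum(a * s * np.where(Z[:, j] > t, 1, -1)
                         for a, j, t, s in stumps)
```

On synthetic two-class data the boosted score separates the classes well, which is the behavior the weighting scheme is designed to produce.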

2021 ◽  
Vol 8 ◽  
Author(s):  
Kumiko Tanaka ◽  
Taka-aki Nakada ◽  
Nozomi Takahashi ◽  
Takahiro Dozono ◽  
Yuichiro Yoshimura ◽  
...  

Purpose: Portable chest radiographs are diagnostically indispensable in intensive care units (ICUs). This study aimed to determine whether the proposed machine learning technique increased in accuracy as the number of radiograph readings increased, and whether it was accurate in a clinical setting. Methods: Two independent data sets of portable chest radiographs (n = 380, a single Japanese hospital; n = 1,720, the National Institutes of Health [NIH] ChestX-ray8 dataset) were analyzed. Each data set was divided into training data and study data. Images were classified as atelectasis, pleural effusion, pneumonia, or no emergency. DenseNet-121, a pre-trained deep convolutional neural network, was used, and ensemble learning was performed on the best-performing algorithms. Diagnostic accuracy and processing time were compared to those of ICU physicians. Results: In the single Japanese hospital data, the area under the curve (AUC) of diagnostic accuracy was 0.768. The AUC of diagnostic accuracy improved significantly as the proportion of radiograph readings increased from 25% to 100% in the NIH data set. The AUC was higher than 0.9 for all categories toward the end of training with a large sample size. The time to complete 53 radiographs by machine learning was 70 times faster than that taken by ICU physicians (9.66 s vs. 12 min). Diagnostic accuracy was higher for machine learning than for ICU physicians in most categories (atelectasis, AUC 0.744 vs. 0.555, P < 0.05; pleural effusion, 0.856 vs. 0.706, P < 0.01; pneumonia, 0.720 vs. 0.744, P = 0.88; no emergency, 0.751 vs. 0.698, P = 0.47). Conclusions: We developed an automatic detection system for portable chest radiographs in the ICU setting; its performance was superior to, and considerably faster than, that of ICU physicians.
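Running DenseNet-121 requires a deep learning stack, so the fragment below illustrates only the ensemble step mentioned above: soft voting over per-model class probabilities for the four reported categories. The class order and probability values are hypothetical, not the study's outputs.

```python
import numpy as np

# Hypothetical class order for this sketch (mirrors the four reported categories).
CLASSES = ["atelectasis", "pleural_effusion", "pneumonia", "no_emergency"]

def soft_vote(prob_list):
    """Average per-model class probabilities (soft voting) and renormalize rows."""
    avg = np.mean(prob_list, axis=0)
    return avg / avg.sum(axis=1, keepdims=True)

def predict_labels(avg_probs):
    """Pick the highest-probability class for each image."""
    return [CLASSES[i] for i in np.argmax(avg_probs, axis=1)]
```

For example, averaging two models that each favor "atelectasis" for the first image and "pleural_effusion" for the second yields those same labels with smoothed confidence.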


2020 ◽  
Author(s):  
Chansik An ◽  
Yae Won Park ◽  
Sung Soo Ahn ◽  
Kyunghwa Han ◽  
Hwiyoung Kim ◽  
...  

Abstract Objective: This study aims to determine how randomly splitting a dataset into training and test sets affects the estimated performance of a machine learning model under different conditions, using real-world brain tumor radiomics data. Materials and Methods: We conducted two classification tasks of different difficulty levels with magnetic resonance imaging (MRI) radiomics features: (1) a “simple” task, glioblastomas [n=109] vs. brain metastasis [n=58], and (2) a “difficult” task, low- [n=163] vs. high-grade [n=95] meningiomas. Additionally, two undersampled datasets were created by randomly sampling 50% from these datasets. We performed random training-test set splitting for each dataset repeatedly to create 1,000 different training and test set pairs. For each dataset pair, the least absolute shrinkage and selection operator model was trained by five-fold cross-validation (CV) or nested CV, with or without repetitions, in the training set and tested with the test set, using the area under the curve (AUC) as an evaluation metric. Results: The AUCs in CV and testing varied widely based on data composition, especially with the undersampled datasets and the difficult task. The mean (± standard deviation) AUC difference between CV and testing was 0.029 (±0.022) for the simple task without undersampling and 0.108 (±0.079) for the difficult task with undersampling. In one training-test set pair, the AUC was high in CV but much lower in testing (0.840 and 0.650, respectively); in another dataset pair with the same task, however, the AUC was low in CV but much higher in testing (0.702 and 0.836, respectively). None of the CV methods helped overcome this issue. Conclusions: Machine learning after a single random training-test set split may lead to unreliable results in radiomics studies, especially when the sample size is small.
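As a rough illustration of the CV-versus-test gap measured above, the sketch below performs one random split of a small synthetic dataset, estimates AUC by five-fold CV inside the training set, and compares it with the test-set AUC. A simple mean-difference scorer stands in for the LASSO model; all data and sizes are invented.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the rank-sum statistic; labels are 0/1."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    return (ranks[pos].sum() - pos.sum() * (pos.sum() + 1) / 2) / (pos.sum() * (~pos).sum())

def fit_score(Xtr, ytr, Xev):
    """Mean-difference linear scorer (a stand-in for a tuned model)."""
    w = Xtr[ytr == 1].mean(axis=0) - Xtr[ytr == 0].mean(axis=0)
    return Xev @ w

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (40, 8)), rng.normal(0.5, 1.0, (40, 8))])
y = np.r_[np.zeros(40), np.ones(40)].astype(int)

idx = rng.permutation(80)                  # one random training-test split (60/20)
tr, te = idx[:60], idx[60:]

folds = np.array_split(rng.permutation(tr), 5)   # five-fold CV in the training set
cv_aucs = []
for k in range(5):
    fit = np.concatenate([folds[j] for j in range(5) if j != k])
    cv_aucs.append(auc(fit_score(X[fit], y[fit], X[folds[k]]), y[folds[k]]))

cv_auc = float(np.mean(cv_aucs))                       # CV estimate
test_auc = auc(fit_score(X[tr], y[tr], X[te]), y[te])  # held-out estimate
gap = cv_auc - test_auc                                # gap for this one split
```

Re-running with a different split seed changes `gap` noticeably, which is exactly the instability the study quantifies over 1,000 pairs.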


2021 ◽  
Vol 12 (1) ◽  
pp. 109-142
Author(s):  
Kengo Kato ◽  
Yuya Sasaki ◽  
Takuya Ura

Kotlarski's identity has been widely used in applied economic research based on repeated-measurement or panel models with latent variables. However, how to conduct inference for these models has been an open question for two decades. This paper addresses this open problem by constructing a novel confidence band for the density function of a latent variable in a repeated measurement error model. The confidence band builds on our finding that Kotlarski's identity can be rewritten as a system of linear moment restrictions. Our approach is robust in that we do not require completeness. The confidence band controls the asymptotic size uniformly over a class of data generating processes, and it is consistent against all fixed alternatives. Simulation studies support our theoretical results.
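A numeric sketch of the identity itself (not the paper's confidence band construction): with two repeated measurements X1 = U + e1 and X2 = U + e2, independence, and zero-mean e2, the characteristic function of the latent U satisfies log phi_U(t) = integral from 0 to t of E[i*X2*exp(i*s*X1)] / E[exp(i*s*X1)] ds. This can be checked by Monte Carlo; the distributions and grid below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
U  = rng.normal(0.0, 1.0, n)           # latent variable of interest
X1 = U + rng.normal(0.0, 0.5, n)       # first repeated measurement
X2 = U + rng.normal(0.0, 0.5, n)       # second measurement (zero-mean error)

t_grid = np.linspace(0.0, 1.5, 61)
psi = np.empty(len(t_grid), dtype=complex)
for k, s in enumerate(t_grid):
    e = np.exp(1j * s * X1)
    # psi(s) = E[i X2 e^{isX1}] / E[e^{isX1}] = d/ds log phi_U(s)
    psi[k] = 1j * np.mean(X2 * e) / np.mean(e)

# Cumulative trapezoidal integration recovers log phi_U on the grid.
ds = t_grid[1] - t_grid[0]
log_phi = np.concatenate(([0.0], np.cumsum((psi[1:] + psi[:-1]) / 2 * ds)))
phi_U_hat = np.exp(log_phi)
phi_U_true = np.exp(-t_grid**2 / 2)    # characteristic function of N(0, 1)
```

With 100,000 draws the Monte Carlo estimate tracks the true N(0,1) characteristic function closely on the grid, which is the sense in which the identity "identifies" the latent distribution from the two noisy measurements.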


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0256152
Author(s):  
Chansik An ◽  
Yae Won Park ◽  
Sung Soo Ahn ◽  
Kyunghwa Han ◽  
Hwiyoung Kim ◽  
...  

This study aims to determine how randomly splitting a dataset into training and test sets affects the estimated performance of a machine learning model and its gap from the test performance under different conditions, using real-world brain tumor radiomics data. We conducted two classification tasks of different difficulty levels with magnetic resonance imaging (MRI) radiomics features: (1) “Simple” task, glioblastomas [n = 109] vs. brain metastasis [n = 58] and (2) “difficult” task, low- [n = 163] vs. high-grade [n = 95] meningiomas. Additionally, two undersampled datasets were created by randomly sampling 50% from these datasets. We performed random training-test set splitting for each dataset repeatedly to create 1,000 different training-test set pairs. For each dataset pair, the least absolute shrinkage and selection operator model was trained and evaluated using various validation methods in the training set, and tested in the test set, using the area under the curve (AUC) as an evaluation metric. The AUCs in training and testing varied among different training-test set pairs, especially with the undersampled datasets and the difficult task. The mean (±standard deviation) AUC difference between training and testing was 0.039 (±0.032) for the simple task without undersampling and 0.092 (±0.071) for the difficult task with undersampling. In a training-test set pair with the difficult task without undersampling, for example, the AUC was high in training but much lower in testing (0.882 and 0.667, respectively); in another dataset pair with the same task, however, the AUC was low in training but much higher in testing (0.709 and 0.911, respectively). When the AUC discrepancy between training and test, or generalization gap, was large, none of the validation methods helped sufficiently reduce the generalization gap. 
Our results suggest that machine learning after a single random training-test set split may lead to unreliable results in radiomics studies, especially with small sample sizes.
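The split-to-split variability described above can be reproduced in miniature: the sketch below repeats random training-test splits of a small synthetic dataset and records the spread of test AUCs, with a mean-difference scorer standing in for the LASSO model. The dataset size, signal strength, and number of splits are all invented for illustration.

```python
import numpy as np

def auc(scores, labels):
    """AUC via the rank-sum statistic; labels are 0/1."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = labels == 1
    return (ranks[pos].sum() - pos.sum() * (pos.sum() + 1) / 2) / (pos.sum() * (~pos).sum())

rng = np.random.default_rng(7)
# Small synthetic dataset with a weak signal: 60 cases, 10 features.
X = np.vstack([rng.normal(0.0, 1.0, (30, 10)), rng.normal(0.4, 1.0, (30, 10))])
y = np.r_[np.zeros(30), np.ones(30)].astype(int)

test_aucs = []
for _ in range(300):                       # 300 random 40/20 training-test splits
    idx = rng.permutation(60)
    tr, te = idx[:40], idx[40:]
    w = X[tr][y[tr] == 1].mean(axis=0) - X[tr][y[tr] == 0].mean(axis=0)
    test_aucs.append(auc(X[te] @ w, y[te]))
test_aucs = np.array(test_aucs)
# The spread of test_aucs shows how much one lucky or unlucky split can mislead.
```

Even with a fixed dataset, the test AUC varies substantially across splits, which is the generalization-gap instability the study reports at full scale.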


VASA ◽  
2019 ◽  
Vol 48 (6) ◽  
pp. 516-522 ◽  
Author(s):  
Verena Mayr ◽  
Mirko Hirschl ◽  
Peter Klein-Weigel ◽  
Luka Girardi ◽  
Michael Kundi

Summary. Background: For diagnosis of peripheral arterial occlusive disease (PAD), a Doppler-based ankle-brachial-index (dABI) is recommended as the first non-invasive measurement. Due to limitations of dABI, oscillometry might be used as an alternative. The aim of our study was to investigate whether a semi-automatic, four-point oscillometric device provides comparable diagnostic accuracy. Furthermore, time requirements and patient preferences were evaluated. Patients and methods: 286 patients were recruited for the study; 140 without and 146 with PAD. The Doppler-based (dABI) and oscillometric (oABI and pulse wave index – PWI) measurements were performed on the same day in a randomized cross-over design. Specificity and sensitivity against verified PAD diagnosis were computed and compared by McNemar tests. ROC analyses were performed and areas under the curve were compared by non-parametric methods. Results: oABI had significantly lower sensitivity (65.8%, 95% CI: 59.2%–71.9%) compared to dABI (87.3%, CI: 81.9–91.3%) but significantly higher specificity (79.7%, 74.7–83.9% vs. 67.0%, 61.3–72.2%). PWI had a comparable sensitivity to dABI. The combination of oABI and PWI had the highest sensitivity (88.8%, 85.7–91.4%). ROC analysis revealed that PWI had the largest area under the curve, but no significant differences between oABI and dABI were observed. Time requirement for oABI was significantly shorter by about 5 min and significantly more patients would prefer oABI for future testing. Conclusions: Semi-automatic oABI measurements using the AngER-device provide comparable diagnostic results to the conventional Doppler method while PWI performed best. The time saved by oscillometry could be important, especially in high volume centers and epidemiologic studies.
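As a hedged illustration of the paired comparison reported above, the snippet below computes sensitivity and specificity from a 2x2 table and an exact two-sided McNemar test on the discordant pairs. The counts are hypothetical (chosen only to land near the reported dABI percentages), not the study's data.

```python
from math import comb

def sens_spec(tp, fn, tn, fp):
    """Sensitivity and specificity against the reference (verified PAD) diagnosis."""
    return tp / (tp + fn), tn / (tn + fp)

def mcnemar_exact(b, c):
    """Exact two-sided McNemar test.

    b and c are the discordant counts: patients where one method is
    correct and the other is wrong, and vice versa.
    """
    n, k = b + c, min(b, c)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) * 0.5 ** n
    return min(1.0, p)
```

The exact binomial form is preferable to the chi-square approximation when discordant counts are small, as they can be in method-comparison studies of this size.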


Methodology ◽  
2011 ◽  
Vol 7 (4) ◽  
pp. 157-164
Author(s):  
Karl Schweizer

Probability-based and measurement-related hypotheses for confirmatory factor analysis of repeated-measures data are investigated. Such hypotheses comprise precise assumptions concerning the relationships among the true components associated with the levels of the design or the items of the measure. Measurement-related hypotheses concentrate on the assumed processes, as, for example, transformation and memory processes, and represent treatment-dependent differences in processing. In contrast, probability-based hypotheses provide the opportunity to consider probabilities as outcome predictions that summarize the effects of various influences. The prediction of performance guided by inexact cues serves as an example. In the empirical part of this paper probability-based and measurement-related hypotheses are applied to working-memory data. Latent variables according to both hypotheses contribute to a good model fit. The best model fit is achieved for the model including latent variables that represented serial cognitive processing and performance according to inexact cues in combination with a latent variable for subsidiary processes.


2019 ◽  
Author(s):  
Kevin Constante ◽  
Edward Huntley ◽  
Emma Schillinger ◽  
Christine Wagner ◽  
Daniel Keating

Background: Although family behaviors are known to be important for buffering youth against substance use, research in this area often evaluates a particular type of family interaction and how it shapes adolescents' behaviors, when it is likely that youth experience the co-occurrence of multiple types of family behaviors that may be protective. Methods: The current study (N = 1716, 10th and 12th graders, 55% female) examined associations between protective family context, a latent variable comprised of five different measures of family behaviors, and substance use in the past 12 months: alcohol, cigarettes, marijuana, and e-cigarettes. Results: A multi-group measurement invariance assessment supported protective family context as a coherent latent construct with partial (metric) measurement invariance among Black, Latinx, and White youth. A multi-group path model indicated that protective family context was significantly associated with less substance use for all youth, but with varying magnitudes across ethnic-racial groups. Conclusion: These results emphasize the importance of evaluating the psychometric properties of family-relevant latent variables on the basis of group membership in order to draw appropriate inferences about how such family variables relate to substance use among diverse samples.


2021 ◽  
Vol 20 ◽  
pp. 153303382110119
Author(s):  
Wen-Ting Zhang ◽  
Guo-Xun Zhang ◽  
Shuai-Shuai Gao

Background: Leukemia is a common malignant disease of the human blood system. Many researchers have proposed circulating microRNAs as biomarkers for the diagnosis of leukemia. We conducted a meta-analysis to evaluate the diagnostic accuracy of circulating miRNAs in the diagnosis of leukemia. Methods: A comprehensive literature search (updated to October 13, 2020) in PubMed, EMBASE, Web of Science, Cochrane Library, the Wanfang database, and the China National Knowledge Infrastructure (CNKI) was performed to identify eligible studies. The sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), diagnostic odds ratio (DOR), and area under the curve (AUC) for diagnosing leukemia were pooled for both overall and subgroup analyses. Meta-regression and subgroup analyses were performed to explore heterogeneity, and Deeks' funnel plot was used to assess publication bias. Results: 49 studies from 22 publications, with a total of 3,489 leukemia patients and 2,756 healthy controls, were included in this meta-analysis. The overall sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio, and area under the curve were 0.83, 0.92, 10.8, 0.18, 59, and 0.94, respectively. Subgroup analysis showed that plasma-based microRNA clusters provided better diagnostic accuracy for leukemia patients. In addition, no publication bias was found. Conclusions: Circulating microRNAs can be used as a promising noninvasive biomarker in the early diagnosis of leukemia.
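The pooled summary statistics above are linked by a simple relationship: DOR = PLR / NLR. The helper below derives all five metrics from a single 2x2 table; the counts are invented so the outputs land near the reported pooled values, and are not taken from any included study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PLR, NLR, and DOR from one 2x2 table."""
    sens = tp / (tp + fn)        # true positive rate
    spec = tn / (tn + fp)        # true negative rate
    plr = sens / (1 - spec)      # positive likelihood ratio
    nlr = (1 - sens) / spec      # negative likelihood ratio
    dor = plr / nlr              # diagnostic odds ratio = (tp*tn) / (fp*fn)
    return sens, spec, plr, nlr, dor
```

For example, `diagnostic_metrics(tp=83, fp=8, fn=17, tn=92)` gives sensitivity 0.83, specificity 0.92, and a DOR around 56, in the neighborhood of the pooled estimates reported above.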


2021 ◽  
Vol 10 (5) ◽  
pp. 992
Author(s):  
Martina Barchitta ◽  
Andrea Maugeri ◽  
Giuliana Favara ◽  
Paolo Marco Riela ◽  
Giovanni Gallo ◽  
...  

Patients in intensive care units (ICUs) are at higher risk of poor prognosis and mortality. Here, we aimed to evaluate the ability of the Simplified Acute Physiology Score (SAPS II) to predict the risk of 7-day mortality, and to test a machine learning algorithm that combines the SAPS II with additional patient characteristics at ICU admission. We used data from the “Italian Nosocomial Infections Surveillance in Intensive Care Units” network. A Support Vector Machine (SVM) algorithm was used to classify 3782 patients according to sex, patient origin, type of ICU admission, non-surgical treatment for acute coronary disease, surgical intervention, SAPS II, presence of invasive devices, trauma, impaired immunity, antibiotic therapy, and onset of healthcare-associated infection (HAI). The accuracy of SAPS II in discriminating patients who died from those who did not was 69.3%, with an area under the curve (AUC) of 0.678. Using the SVM algorithm, instead, we achieved an accuracy of 83.5% and an AUC of 0.896. Notably, SAPS II was the variable that weighed most in the model, and its removal resulted in an AUC of 0.653 and an accuracy of 68.4%. Overall, these findings suggest the present SVM model is a useful tool for early identification of patients at higher risk of death at ICU admission.
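The study's SVM was trained on real surveillance data with a full feature set; as a generic stand-in, the sketch below trains a linear SVM with the Pegasos stochastic subgradient method on synthetic two-class data. Hyperparameters and data are illustrative, not the study's configuration.

```python
import numpy as np

def pegasos_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via the Pegasos stochastic subgradient method; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                  # decaying step size
            if y[i] * (X[i] @ w + b) < 1:          # hinge-loss margin violation
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]                    # bias left unregularized
            else:
                w = (1 - eta * lam) * w
    return w, b
```

On well-separated synthetic data this converges to a high-accuracy linear decision boundary; in practice one would tune `lam` by cross-validation, as any applied SVM study does.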

