Extreme Gradient Boosting for Parkinson’s Disease Diagnosis from Voice Recordings

2020 ◽  
Author(s):  
Ibrahim Karabayir ◽  
Suguna Pappu ◽  
Samuel Goldman ◽  
Oguz Akbilgic

Abstract Background: Parkinson’s Disease (PD) is a clinically diagnosed neurodegenerative disorder that affects both motor and non-motor neural circuits. Speech deterioration (hypokinetic dysarthria) is a common symptom, which often presents early in the disease course. Machine learning can help movement disorders specialists improve their diagnostic accuracy using non-invasive and inexpensive voice recordings. Method: We used “Parkinson Dataset with Replicated Acoustic Features Data Set” from the UCI-Machine Learning repository. The dataset included 45 features, namely sex and 44 speech-test-based acoustic features, from 40 patients with Parkinson’s disease and 40 controls. We analyzed the data using various machine learning algorithms including tree-based ensemble approaches such as random forest and extreme gradient boosting. We also implemented a variable importance analysis to identify important variables classifying patients with PD. Results: The cohort included a total of 80 subjects: 40 patients with PD (55% men) and 40 controls (67.5% men). PD patients showed at least two of the three symptoms: resting tremor, bradykinesia, or rigidity. All patients were over 50 years old, and the mean ages of PD subjects and controls were 69.6 (SD 7.8) and 66.4 (SD 8.4), respectively. Our final model provided an AUC of 0.940 with 95% confidence interval 0.935-0.945 in 4-fold cross validation using only six acoustic features: Delta3 (Run 2), Delta0 (Run 3), MFCC4 (Run 2), Delta10 (Run 2/Run 3), MFCC10 (Run 2) and Jitter_Rap (Run 1/Run 2). Conclusions: Machine learning can accurately detect Parkinson’s disease using an inexpensive and non-invasive voice recording. Such technologies can be deployed into smartphones for screening of large patient populations for Parkinson’s disease.
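As a rough illustration of how an AUC under 4-fold cross validation can be computed, the sketch below uses only the Python standard library. It is not the authors' pipeline: the data are synthetic, one row per subject is assumed, a simple nearest-centroid scorer stands in for extreme gradient boosting, and all function names are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Assign every index to one of k folds, balancing classes per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def centroid_scores(train_x, train_y, test_x):
    """Stand-in model (NOT gradient boosting): a higher score means the
    sample lies closer to the PD class mean than to the control mean."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    c0 = mean([x for x, y in zip(train_x, train_y) if y == 0])
    c1 = mean([x for x, y in zip(train_x, train_y) if y == 1])
    return [dist(x, c0) - dist(x, c1) for x in test_x]

def cross_val_auc(X, y, k=4, seed=0):
    """Fit on k-1 folds, score the held-out fold, return one AUC per fold."""
    fold_aucs = []
    for fold in stratified_folds(y, k, seed):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        scores = centroid_scores([X[i] for i in train],
                                 [y[i] for i in train],
                                 [X[i] for i in fold])
        fold_aucs.append(auc([y[i] for i in fold], scores))
    return fold_aucs

# Synthetic stand-in for 40 controls (0) and 40 PD patients (1).
rng = random.Random(42)
y = [0] * 40 + [1] * 40
X = [[rng.gauss(label, 1.0), rng.gauss(0.5 * label, 1.0)] for label in y]
fold_aucs = cross_val_auc(X, y)
print("mean AUC over folds:", sum(fold_aucs) / len(fold_aucs))
```

Stratifying the folds keeps 10 patients and 10 controls in each held-out fold, so every per-fold AUC is well defined.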

2020 ◽  
Author(s):  
Ibrahim Karabayir ◽  
Samuel Goldman ◽  
Suguna Pappu ◽  
Oguz Akbilgic

Abstract Background: Parkinson’s Disease (PD) is a clinically diagnosed neurodegenerative disorder that affects both motor and non-motor neural circuits. Speech deterioration (hypokinetic dysarthria) is a common symptom, which often presents early in the disease course. Machine learning can help movement disorders specialists improve their diagnostic accuracy using non-invasive and inexpensive voice recordings. Method: We used “Parkinson Dataset with Replicated Acoustic Features Data Set” from the UCI-Machine Learning repository. The dataset included 44 speech-test based acoustic features from patients with PD and controls. We analyzed the data using various machine learning algorithms including Light and Extreme Gradient Boosting, Random Forest, Support Vector Machines, K-nearest neighborhood, Least Absolute Shrinkage and Selection Operator Regression, as well as logistic regression. We also implemented a variable importance analysis to identify important variables classifying patients with PD. Results: The cohort included a total of 80 subjects: 40 patients with PD (55% men) and 40 controls (67.5% men). Disease duration was 5 years or less for all subjects, with a mean Unified Parkinson’s Disease Rating Scale (UPDRS) score of 19.6 (SD 8.1), and none were taking PD medication. The mean age for PD subjects and controls was 69.6 (SD 7.8) and 66.4 (SD 8.4), respectively. Our best-performing model used Light Gradient Boosting to provide an AUC of 0.951 with 95% confidence interval 0.946-0.955 in 4-fold cross validation using only seven acoustic features. Conclusions: Machine learning can accurately detect Parkinson’s disease using an inexpensive and non-invasive voice recording. Light Gradient Boosting outperformed other machine learning algorithms. Such approaches could be used to inexpensively screen large patient populations for Parkinson’s disease.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Ibrahim Karabayir ◽  
Samuel M. Goldman ◽  
Suguna Pappu ◽  
Oguz Akbilgic

Abstract Background Parkinson’s Disease (PD) is a clinically diagnosed neurodegenerative disorder that affects both motor and non-motor neural circuits. Speech deterioration (hypokinetic dysarthria) is a common symptom, which often presents early in the disease course. Machine learning can help movement disorders specialists improve their diagnostic accuracy using non-invasive and inexpensive voice recordings. Method We used “Parkinson Dataset with Replicated Acoustic Features Data Set” from the UCI-Machine Learning repository. The dataset included 44 speech-test based acoustic features from patients with PD and controls. We analyzed the data using various machine learning algorithms including Light and Extreme Gradient Boosting, Random Forest, Support Vector Machines, K-nearest neighborhood, Least Absolute Shrinkage and Selection Operator Regression, as well as logistic regression. We also implemented a variable importance analysis to identify important variables classifying patients with PD. Results The cohort included a total of 80 subjects: 40 patients with PD (55% men) and 40 controls (67.5% men). Disease duration was 5 years or less for all subjects, with a mean Unified Parkinson’s Disease Rating Scale (UPDRS) score of 19.6 (SD 8.1), and none were taking PD medication. The mean age for PD subjects and controls was 69.6 (SD 7.8) and 66.4 (SD 8.4), respectively. Our best-performing model used Light Gradient Boosting to provide an AUC of 0.951 with 95% confidence interval 0.946–0.955 in 4-fold cross validation using only seven acoustic features. Conclusions Machine learning can accurately detect Parkinson’s disease using an inexpensive and non-invasive voice recording. Light Gradient Boosting outperformed other machine learning algorithms. Such approaches could be used to inexpensively screen large patient populations for Parkinson’s disease.


2021 ◽  
Vol 8 ◽  
Author(s):  
Jiang Zhu ◽  
Jinxin Zheng ◽  
Longfei Li ◽  
Rui Huang ◽  
Haoyu Ren ◽  
...  

Purpose: While there are no clear indications of whether central lymph node dissection is necessary in patients with T1-T2, non-invasive papillary thyroid carcinoma (PTC) and clinically uninvolved central neck lymph nodes, this study seeks to develop and validate models for predicting the risk of central lymph node metastasis (CLNM) in these patients based on machine learning algorithms. Methods: This is a retrospective study comprising 1,271 patients with T1-T2 stage, non-invasive, and clinically node negative (cN0) PTC who underwent surgery at the Department of Endocrine and Breast Surgery of The First Affiliated Hospital of Chongqing Medical University from February 1, 2016, to December 31, 2018. We applied six machine learning (ML) algorithms, including Logistic Regression (LR), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGBoost), Random Forest (RF), Decision Tree (DT), and Neural Network (NNET), coupled with preoperative clinical characteristics and intraoperative information to develop prediction models for CLNM. Among all the samples, 70% were randomly selected to train the models while the remaining 30% were used for validation. Indices like the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, and accuracy were calculated to test the models' performance. Results: The results showed that ~51.3% (652 out of 1,271) of the patients had pN1 disease. In multivariate logistic regression analyses, gender, tumor size and location, multifocality, age, and Delphian lymph node status were all independent predictors of CLNM. In predicting CLNM, the six ML algorithms achieved AUROCs of 0.70–0.75, with the extreme gradient boosting (XGBoost) model performing best at 0.75. Thus, we implemented the best-performing ML model in a self-made online risk calculator to estimate an individual's probability of CLNM (https://jin63.shinyapps.io/ML_CLNM/). Conclusions: With the incorporation of preoperative and intraoperative risk factors, ML algorithms can achieve acceptable prediction of CLNM, with the XGBoost model performing the best. Our online risk calculator based on an ML algorithm may help determine the optimal extent of initial surgical treatment for patients with T1-T2 stage, non-invasive, and clinically node negative PTC.
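The 70%/30% random split and the validation indices named above (sensitivity, specificity, accuracy) can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the study's code: the function names, the random seed, and the toy labels are hypothetical, and only the cohort size of 1,271 comes from the abstract.

```python
import random

def split_70_30(n, seed=0):
    """Randomly split indices 0..n-1 into 70% training and 30% validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * n)
    return idx[:cut], idx[cut:]

def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels
    (1 = metastasis present, 0 = absent)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }

# Cohort size taken from the abstract; the split itself is illustrative.
train_idx, val_idx = split_70_30(1271)

# Toy validation labels and predictions (made up for the example).
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = confusion_metrics(toy_true, toy_pred)
print(len(train_idx), len(val_idx), metrics)
```

Fixing the seed makes the split reproducible, which matters when several algorithms are compared on the same validation set.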


2021 ◽  
pp. 1-11
Author(s):  
Ibrahim Karabayir ◽  
Liam Butler ◽  
Samuel M. Goldman ◽  
Rishikesan Kamaleswaran ◽  
Fatma Gunturkun ◽  
...  

Background: Parkinson’s disease (PD) is a chronic, disabling neurodegenerative disorder. Objective: To predict a future diagnosis of PD using questionnaires and simple non-invasive clinical tests. Methods: Participants in the prospective Kuakini Honolulu-Asia Aging Study (HAAS) were evaluated biannually between 1995–2017 by PD experts using standard diagnostic criteria. Autopsies were sought on all deaths. We input simple clinical and risk factor variables into an ensemble-tree based machine learning algorithm and derived models to predict the probability of developing PD. We also investigated relationships of predictive models and neuropathologic features such as nigral neuron density. Results: The study sample included 292 subjects, 25 of whom developed PD within 3 years and 41 by 5 years. 116 (46%) of 251 subjects not diagnosed with PD underwent autopsy. Light Gradient Boosting Machine modeling of 12 predictors correctly classified a high proportion of individuals who developed PD within 3 years (area under the curve (AUC) 0.82, 95% CI 0.76–0.89) or 5 years (AUC 0.77, 95% CI 0.71–0.84). A large proportion of controls who were misclassified as PD had Lewy pathology at autopsy, including 79% of those who died within 3 years. PD probability estimates correlated inversely with nigral neuron density and were strongest in autopsies conducted within 3 years of index date (r = –0.57, p < 0.01). Conclusion: Machine learning can identify persons likely to develop PD during the prodromal period using questionnaires and simple non-invasive tests. Correlation with neuropathology suggests that true model accuracy may be considerably higher than estimates based solely on clinical diagnosis.


2021 ◽  
Vol 28 ◽  
Author(s):  
Annamaria Landolfi ◽  
Carlo Ricciardi ◽  
Leandro Donisi ◽  
Giuseppe Cesarelli ◽  
Jacopo Troisi ◽  
...  

Background: Parkinson’s disease is the second most frequent neurodegenerative disorder. Its diagnosis is challenging and mainly relies on clinical aspects. At present, no biomarker is available to obtain a diagnosis of certainty in vivo. Objective: The present review aims at describing machine learning algorithms as they have been variably applied to different aspects of Parkinson’s disease diagnosis and characterization. Methods: A systematic search was conducted on PubMed in December 2019, resulting in 230 publications obtained with the following search query: “Machine Learning” “AND” “Parkinson Disease”. Results: After excluding papers of general topic, the obtained publications were divided into 6 categories based on application field: “Gait Analysis - Motor Evaluation”, “Upper Limb Motor and Tremor Evaluation”, “Handwriting and typing evaluation”, “Speech and Phonation evaluation”, “Neuroimaging and Nuclear Medicine evaluation”, and “Metabolomics application”. A total of 166 articles were analyzed after elimination of papers written in languages other than English or not directly related to the selected topics. Conclusion: Machine learning algorithms are computer-based statistical approaches which can be trained and are able to find common patterns in large amounts of data. Machine learning approaches can help clinicians classify patients according to several variables at the same time.


Diagnostics ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 801
Author(s):  
Fahmida Haque ◽  
Mamun Bin Ibne Reaz ◽  
Muhammad Enamul Hoque Chowdhury ◽  
Geetika Srivastava ◽  
Sawal Hamid Md Ali ◽  
...  

Background: Diabetic peripheral neuropathy (DSPN), a major form of diabetic neuropathy, is a complication that arises in long-term diabetic patients. Even though the application of machine learning (ML) in disease diagnosis is a very common and well-established field of research, its application in DSPN diagnosis using composite scoring techniques like the Michigan Neuropathy Screening Instrument (MNSI) is very limited in the existing literature. Method: In this study, the MNSI data were collected from the Epidemiology of Diabetes Interventions and Complications (EDIC) clinical trials. Two different datasets with different MNSI variable combinations, based on the results from the eXtreme Gradient Boosting feature ranking technique, were used to analyze the performance of eight different conventional ML algorithms. Results: The random forest (RF) classifier outperformed other ML models for both datasets. However, all ML models showed almost perfect reliability based on Kappa statistics and a high correlation between the predicted output and actual class of the EDIC patients when all six MNSI variables were considered as inputs. Conclusions: This study suggests that the RF algorithm-based classifier using all MNSI variables can help to predict DSPN severity, which will help to enhance the medical facilities for diabetic patients.


10.2196/32771 ◽  
2021 ◽  
Vol 9 (10) ◽  
pp. e32771
Author(s):  
Seo Jeong Shin ◽  
Jungchan Park ◽  
Seung-Hwa Lee ◽  
Kwangmo Yang ◽  
Rae Woong Park

Background Myocardial injury after noncardiac surgery (MINS) is associated with increased postoperative mortality, but the relevant perioperative factors that contribute to the mortality of patients with MINS have not been fully evaluated. Objective To establish a comprehensive body of knowledge relating to patients with MINS, we researched the best performing predictive model based on machine learning algorithms. Methods Using clinical data from 7629 patients with MINS from the clinical data warehouse, we evaluated 8 machine learning algorithms for accuracy, precision, recall, F1 score, area under the receiver operating characteristic (AUROC) curve, and area under the precision-recall curve to investigate the best model for predicting mortality. Feature importance and Shapley Additive Explanations values were analyzed to explain the role of each clinical factor in patients with MINS. Results Extreme gradient boosting outperformed the other models. The model showed an AUROC of 0.923 (95% CI 0.916-0.930). The AUROC of the model did not decrease significantly in the test data set (0.894, 95% CI 0.86-0.922; P=.06). Antiplatelet drug prescription, elevated C-reactive protein level, and beta blocker prescription were associated with reduced 30-day mortality. Conclusions Predicting the mortality of patients with MINS was shown to be feasible using machine learning. By analyzing the impact of predictors, markers that should be cautiously monitored by clinicians may be identified.


2019 ◽  
Author(s):  
Kasper Van Mens ◽  
Joran Lokkerbol ◽  
Richard Janssen ◽  
Robert de Lange ◽  
Bea Tiemens

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms to predict during treatment which patients will not benefit from brief mental health treatment and present trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not significantly differ on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (ppv) of 0.71 (0.61 – 0.77) with a sensitivity of 0.35 (0.29 – 0.41) and area under the curve of 0.78. A trade-off can be made between ppv and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the ppv increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the ppv decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
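The trade-off between ppv and sensitivity at different cut-off probabilities can be made concrete with a small sketch. The labels and predicted probabilities below are invented for illustration and do not reproduce the study's figures; only the three cut-off values (0.5, 0.63, 0.38) are taken from the abstract, and the function name is hypothetical.

```python
def ppv_and_sensitivity(y_true, probs, cutoff):
    """Classify as positive when the predicted probability reaches the
    cut-off, then return (positive predictive value, sensitivity)."""
    pred = [1 if p >= cutoff else 0 for p in probs]
    tp = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 1)
    fp = sum(1 for t, q in zip(y_true, pred) if t == 0 and q == 1)
    fn = sum(1 for t, q in zip(y_true, pred) if t == 1 and q == 0)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity

# Toy data (assumption): 1 = patient did not benefit from treatment.
toy_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
toy_prob = [0.9, 0.8, 0.7, 0.55, 0.6, 0.45, 0.4, 0.3, 0.2, 0.1]

# Raising the cut-off makes predictions more selective (higher ppv,
# lower sensitivity); lowering it does the opposite.
results = {c: ppv_and_sensitivity(toy_true, toy_prob, c)
           for c in (0.5, 0.63, 0.38)}
for cutoff, (ppv, sens) in results.items():
    print(f"cut-off {cutoff}: ppv={ppv:.2f}, sensitivity={sens:.2f}")
```

On this toy data the higher cut-off flags fewer patients but with fewer false alarms, which mirrors the direction of the effect reported above.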

