Comparison of Machine Learning Algorithms for Predictive Modeling of Beef Attributes Using Rapid Evaporative Ionization Mass Spectrometry (REIMS) Data

2019, Vol 9 (1)
Author(s): Devin A. Gredell, Amelia R. Schroeder, Keith E. Belk, Corey D. Broeckling, Adam L. Heuberger, ...
Author(s): Anoop Kumar Tiwari, Abhigyan Nath, Karthikeyan Subbiah, Kaushal Kumar Shukla

Imbalanced datasets degrade classifier learning, and the problem is nearly ubiquitous in biological data. Resampling is one of the common ways to address it. In this study, we examine how the learning performance of different machine learning algorithms changes as the balancing ratio of the training dataset is varied; the dataset consists of peptides observed and peptides absent in a mass spectrometry experiment. The ideal balancing ratio yielded better performance than the imbalanced dataset, but an intermediate ratio performed better still. Using the Synthetic Minority Oversampling Technique (SMOTE) at different balancing ratios, we obtained the best results with a boosted random forest algorithm: sensitivity of 92.1%, specificity of 94.7%, overall accuracy of 93.4%, MCC of 0.869, and AUC of 0.982. The study also identifies the most discriminating features by applying a feature ranking algorithm. These results suggest that the performance of machine learning algorithms on classification tasks can be improved by selecting an optimally balanced training dataset, obtained by suitably modifying the class distribution.
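The idea of oversampling the minority class to a chosen balancing ratio can be sketched in a few lines. The code below is a minimal, pure-Python illustration of SMOTE-style interpolation, not the implementation used in the study: the function names `smote_oversample` and `balance_to_ratio`, the neighbour count `k=3`, and the toy 2-D points are all assumptions made for the example.

```python
import math
import random


def smote_oversample(minority, target_count, k=3, seed=0):
    """Create synthetic minority points until the class reaches target_count.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Assumes the minority class holds at least two distinct samples.
    """
    rng = random.Random(seed)
    synthetic = []
    while len(minority) + len(synthetic) < target_count:
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def balance_to_ratio(minority, majority, ratio, k=3, seed=0):
    """Oversample so that len(minority) / len(majority) is roughly `ratio`."""
    target = max(len(minority), round(ratio * len(majority)))
    return minority + smote_oversample(minority, target, k=k, seed=seed)


# Toy data (hypothetical): 4 minority points, 10 majority points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(5.0 + i, 5.0) for i in range(10)]

# Sweep a few balancing ratios, as one would before training a classifier.
for ratio in (0.4, 0.7, 1.0):
    resampled = balance_to_ratio(minority, majority, ratio)
    print(f"ratio={ratio}: {len(resampled)} minority samples")
```

In practice each resampled training set would be passed to a classifier and scored on untouched held-out data, so that the synthetic points never leak into the evaluation.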


Author(s): Xihu Yang, Xiaowei Song, Xudong Yang, Wei Han, Yong Fu, ...

Background: Oral squamous cell carcinoma (OSCC) accounts for 90% of oral cancers. If intervention could take place before tumorigenesis, the current 60% 5-year survival rate would be expected to improve substantially. This motivates the development of a highly sensitive and specific in vitro diagnostic method for rapid OSCC screening. Method: Serum samples were collected from 819 volunteers, consisting of 241 healthy contrast (HC) and 578 OSCC patients, and their metabolic profiles were acquired using conductive polymer spray ionization mass spectrometry (CPSI-MS). Univariate analysis was used to select metabolite ions that changed significantly in the OSCC group compared to the HC group. The identities of these metabolite ions were determined by MS/MS experiments and reconfirmed at the tissue level by desorption electrospray ionization mass spectrometry (DESI-MS). The support vector machine (SVM) algorithm was employed as the machine learning model for automatic prediction of OSCC. Results: Through statistical analysis, 65 metabolites were selected as potential characteristic marker candidates for serum OSCC screening. In situ validation by DESI-MSI revealed that 8 of the top 10 metabolites showed the same trends of change in tissue and serum. With the aid of machine learning, OSCC can be distinguished from HC with an accuracy of 98.0% by cross-validation in the discovery cohort and 89.2% in the validation cohort. Furthermore, orthogonal partial least squares discriminant analysis (OPLS-DA) also showed potential for recognizing OSCC stages. Conclusion: Using CPSI-MS combined with SVM, it is possible to distinguish OSCC from HC in a few minutes with high specificity and sensitivity, making this rapid diagnostic procedure a promising approach for high-risk population screening.
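To make the cross-validated accuracy figure easier to interpret, the following sketch shows how stratified k-fold cross-validation might be organised for a cohort with the class sizes given in the abstract (241 HC, 578 OSCC). It is an illustration under stated assumptions, not the authors' pipeline: the fold count of 5, the function names, and the `majority_class` placeholder predictor (standing in for an SVM) are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a similar share.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def cross_validated_accuracy(labels, predict_fold, n_folds=5, seed=0):
    """Pooled held-out accuracy over all folds.

    predict_fold(train_idx, test_idx) must return one predicted label
    per test index, using only the training indices to fit.
    """
    folds = stratified_folds(labels, n_folds, seed)
    correct = 0
    for held_out in range(n_folds):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        predictions = predict_fold(train_idx, test_idx)
        correct += sum(p == labels[i] for p, i in zip(predictions, test_idx))
    return correct / len(labels)


# Class sizes taken from the abstract; the labels carry no real measurements.
labels = ["HC"] * 241 + ["OSCC"] * 578


def majority_class(train_idx, test_idx):
    """Placeholder predictor (assumption): always predict the most common training label."""
    top_label = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return [top_label] * len(test_idx)


accuracy = cross_validated_accuracy(labels, majority_class)
print(f"majority-class baseline accuracy: {accuracy:.3f}")
```

Because the cohort is imbalanced, a predictor that always answers "OSCC" already scores about 70.6% (578/819), which is the baseline against which a reported accuracy on such data should be read, alongside sensitivity and specificity.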


2021, Vol 28
Author(s): Dakila Ledesma, Steven Symes, Sean Richards

Background: The adoption of biomarkers as part of high-throughput, complex microarray or sequencing data has necessitated the discovery and validation of these data through machine learning. Machine learning has remained a fundamental and indispensable tool because it is effective and efficient both at extracting relevant biomarkers as features and at classifying samples to validate the discovered biomarkers. Objectives: This review aims to present the impact and ability of various machine learning methodologies and models to process the high-throughput, high-dimensional data found in mass spectrometry, microarray, and DNA/RNA-sequence data, which precluded biomarker discovery before the use of machine learning. Methods: A broad body of literature on machine learning for biomarker discovery was reviewed, resulting in the eligibility of 21 machine learning algorithms/networks and 3 combinatory architectures, spanning 17 fields of study. This literature was screened to investigate the usage and development of machine learning within the framework of biomarker discovery. Results: Of the 93 papers collected, 62 biomarker studies were reviewed further across different subfields; 49 employed machine learning algorithms and 13 employed neural network-based models. Through the application, innovation, and creation of tools in biomarker-related machine learning methodologies, machine learning enabled the discovery, accumulation, validation, and interpretation of biomarkers across varied data formats, sources, and fields of study. Conclusion: Machine learning methodologies are critical to the analysis of the various types of data used for biomarker discovery, such as mass spectrometry, nucleotide and protein sequencing, and image (e.g. CT-scan) data. Further studies with more standardized evaluation techniques and cutting-edge machine learning architectures may lead to more accurate and specific results.

