Characterization of Cancer Types by Applying Machine Learning Methods on Blood RNA-Sequencing Data

Abstract. Background: Alternative splicing (AS) plays critical roles in generating protein diversity and complexity. Dysregulation of AS underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools to identify promising biomarkers. Exploring pivotal AS events (ASEs) with machine learning algorithms can deepen understanding and improve prognostic assessment of lung adenocarcinoma (LUAD). Method: RNA sequencing data and AS data were extracted from The Cancer Genome Atlas (TCGA) database and the TCGA SpliceSeq database. Using several machine learning methods, we identified 24 pairs of LUAD-related ASEs implicated in splicing switches and a random forest-based classifier, consisting of 12 ASEs, for identifying lymph node metastasis (LNM). Furthermore, we identified key prognosis-related ASEs and established a 16-ASE-based prognostic model to predict overall survival for LUAD patients using a Cox regression model, random survival forest analysis, and a forward selection model. Bioinformatics analyses were also applied to identify underlying mechanisms and associated upstream splicing factors (SFs). Results: Each pair of ASEs was spliced from the same parent gene and exhibited perfect inverse intrapair correlation (correlation coefficient = −1). The 12-ASE-based classifier showed a robust ability to evaluate the LNM status of LUAD patients, with an area under the receiver operating characteristic (ROC) curve (AUC) of more than 0.7 in fivefold cross-validation. The prognostic model performed well at 1, 3, 5, and 10 years in both the training cohort and the internal test cohort. Univariate and multivariate Cox regression indicated that the prognostic model could be used as an independent prognostic factor for patients with LUAD. Further analysis revealed correlations between the prognostic model and American Joint Committee on Cancer stage, T stage, N stage, and living status. The splicing network constructed from survival-related SFs and ASEs depicts the regulatory relationships between them. Conclusion: In summary, our study provides insight into LUAD research and management based on these AS biomarkers.
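The evaluation metric quoted in the abstract (AUC under fivefold cross-validation) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `roc_auc` and `kfold_indices` are hypothetical, the folds are a plain shuffled split rather than whatever scheme the study used, and the classifier itself is omitted. The AUC is computed in its pairwise form, as the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


# Toy labels (1 = LNM-positive) and hypothetical classifier scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In a fivefold setting, each fold returned by `kfold_indices` would serve once as the held-out set, and `roc_auc` would be computed on the held-out scores of each fold.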

Quantitative characterization of bovine serum albumin thin-films using terahertz spectroscopy and machine learning methods

Biomedical Optics Express, 2018, Vol 9 (7), pp. 2917

DOI: 10.1364/boe.9.002917

Cited By ~ 7

Author(s): Yiwen Sun, Pengju Du, Xingxing Lu, Pengfei Xie, Zhengfang Qian, ...

Keyword(s): Machine Learning, Thin Films, Bovine Serum Albumin, Serum Albumin, Bovine Serum, Terahertz Spectroscopy, Quantitative Characterization, Learning Methods, Machine Learning Methods

Applying machine learning methods for characterization of hexagonal prisms from their 2D scattering patterns – an investigation using modelled scattering data

Journal of Quantitative Spectroscopy and Radiative Transfer, 2017, Vol 201, pp. 115-127

DOI: 10.1016/j.jqsrt.2017.07.001

Cited By ~ 2

Author(s): Emmanuel Oluwatobi Salawu, Evelyn Hesse, Chris Stopford, Neil Davey, Yi Sun

Keyword(s): Machine Learning, Scattering Data, Learning Methods, Machine Learning Methods

Prediction of acetylcholinesterase inhibitors and characterization of correlative molecular descriptors by machine learning methods

European Journal of Medicinal Chemistry, 2010, Vol 45 (3), pp. 1167-1172

DOI: 10.1016/j.ejmech.2009.12.038

Cited By ~ 19

Author(s): Wei Lv, Ying Xue

Keyword(s): Machine Learning, Molecular Descriptors, Acetylcholinesterase Inhibitors, Learning Methods, Machine Learning Methods

Reproductive phasiRNAs in grasses are compositionally distinct from other classes of small RNAs

2018

DOI: 10.1101/242727

Author(s): Parth Patel, Sandra Mathioni, Atul Kakrana, Hagit Shatkay, Blake C. Meyers

Keyword(s): Machine Learning, Small RNAs, Structural Features, Classification Performance, List Type, Small Interfering RNAs, Sequencing Data, Learning Methods, Specific Sequence, Machine Learning Methods

Summary: Little is known about the characteristics and function of reproductive phased, secondary, small interfering RNAs (phasiRNAs) in the Poaceae, despite the availability of significant genomic resources, experimental data, and a growing number of computational tools. We utilized machine-learning methods to identify sequence-based and structural features that distinguish phasiRNAs in rice and maize from other small RNAs (sRNAs). We developed Random Forest classifiers that can distinguish reproductive phasiRNAs from other sRNAs in complex sets of sequencing data, utilizing sequence-based features (k-mers) and features describing position-specific sequence biases. The classification performance attained is >80% in accuracy, sensitivity, specificity, and positive predictive value. Feature selection identified important features at both ends of phasiRNAs. We demonstrated that phasiRNAs have strand specificity and position-specific nucleotide biases potentially influencing AGO sorting; we also predicted targets to infer functions of phasiRNAs, and computationally assessed their sequence characteristics relative to other sRNAs. Our results demonstrate that machine-learning methods effectively identify phasiRNAs despite the lack of characteristic features typically present in precursor loci of other small RNAs, such as sequence conservation or structural motifs. The 5’-end features we identified provide insights into AGO-phasiRNA interactions; we describe a hypothetical model of competition for AGO loading between phasiRNAs of different nucleotide compositions.
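The two feature families named in the summary, k-mer counts and position-specific nucleotide features, can be sketched as follows. This is an illustrative encoding only: the function names `kmer_counts` and `positional_one_hot`, the choice of k, and the number of encoded positions are assumptions, not the settings used in the study.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a small-RNA sequence."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style reads as RNA
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def positional_one_hot(seq, n_positions, alphabet="ACGU"):
    """One-hot encode the first n_positions nucleotides, counted from the 5' end.

    Positions beyond the end of the sequence are encoded as all zeros.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for i in range(n_positions):
        base = seq[i] if i < len(seq) else None
        features.extend(1 if base == a else 0 for a in alphabet)
    return features


print(dict(kmer_counts("ACGAC", 2)))  # {'AC': 2, 'CG': 1, 'GA': 1}
print(positional_one_hot("UGA", 2))   # [0, 0, 0, 1, 0, 0, 1, 0]
```

Concatenating the two outputs per sequence would give a fixed-length vector of the kind a Random Forest classifier can consume; encoding positions from the 3' end as well would follow the same pattern on the reversed sequence.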

Recent Development of Computational Predicting Bioluminescent Proteins

Current Pharmaceutical Design, 2020, Vol 25 (40), pp. 4264-4273

DOI: 10.2174/1381612825666191107100758

Cited By ~ 2

Author(s): Dan Zhang, Zheng-Xing Guan, Zi-Mei Zhang, Shi-Hao Li, Fu-Ying Dao, ...

Keyword(s): Machine Learning, Light Emission, Biotechnological Application, Learning Methods, Machine Learning Methods, Technological Advances, Living Organisms, Role Of Light

Bioluminescent proteins (BLPs) are widely distributed in many living organisms and play a key role in light emission during bioluminescence. Bioluminescence serves various functions, such as finding food and protecting organisms from predators. With its routine biotechnological application, bioluminescence is recognized as essential for many medical, commercial, and other general technological advances. The prediction and characterization of BLPs are therefore significant: they can help to reveal more about bioluminescence and promote the development of its applications. Because experimental methods for BLP identification are costly and time-consuming, bioinformatics tools have played an important role in the fast and accurate prediction of BLPs by combining their sequence information with machine learning methods. In this review, we summarize and compare the application of machine learning methods to the prediction of BLPs from different aspects. We hope that this review will provide insights and inspiration for research on BLPs.

Machine Learning Methods as a Tool for Predicting Risk of Illness Applying Next‐Generation Sequencing Data

Risk Analysis, 2018

DOI: 10.1111/risa.13239

Cited By ~ 4

Author(s): Patrick Murigu Kamau Njage, Clementine Henri, Pimlapas Leekitcharoenphon, Michel‐Yves Mistou, Rene S. Hendriksen, ...

Keyword(s): Machine Learning, Next Generation Sequencing, Next Generation Sequencing Data, Next Generation, Sequencing Data, Learning Methods, Machine Learning Methods, Generation Sequencing

Machine Learning methods for a complete TGA analysis. Characterization of La1-xCaxNiO3±δ as catalyst precursors for dry methane reforming

2020

DOI: 10.26226/morressier.5f6c5f439b74b699bf390c72

Author(s): Jaime Gallego, Andres Marulanda-Bran, Jose C-Salazar

Keyword(s): Machine Learning, Methane Reforming, Learning Methods, Machine Learning Methods, TGA Analysis

Metagenomic Sequencing Analysis for Acne Using Machine Learning Methods Adapted to Single or Multiple Data

Computational and Mathematical Methods in Medicine, 2021, Vol 2021, pp. 1-11

DOI: 10.1155/2021/8008731

Author(s): Yu Wang, Mengru Sun, Yifan Duan

Keyword(s): Machine Learning, Principal Component Analysis, Principal Component, Component Analysis, Metagenomic Sequencing, Sequencing Data, Learning Methods, Machine Learning Methods, Three Samples, The Face

Human health status can be assessed through research on and analysis of the human microbiome. Acne is a common skin disease whose morbidity increases year by year. The lipids that influence acne to a large extent have been studied by metagenomic methods in recent years. In this paper, machine learning methods are used to analyze metagenomic sequencing data of acne, i.e., all kinds of lipids in the facial skin. First, lipid data from the diseased skin (DS) samples and the healthy skin (HS) samples of acne patients and the normal control (NC) samples of healthy persons are each analyzed using principal component analysis (PCA) and kernel principal component analysis (KPCA). The lipids that have the main influence on each kind of sample are then obtained. In addition, multiset canonical correlation analysis (MCCA) is used to identify lipids that can differentiate the facial skin of the above three sample types. The experimental results show that the machine learning methods can effectively analyze metagenomic sequencing data of acne. According to the results, lipids that influence only one of the three sample types, or lipids that influence all three to different degrees, can be used as indicators to judge skin status.
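The PCA step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it recovers only the first principal component, uses power iteration on the sample covariance matrix in place of a full eigendecomposition, and the function name `first_principal_component` and the toy data are hypothetical. Features (here, lipids) with the largest absolute loadings are the ones that contribute most to the leading direction of variation.

```python
import math


def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) of the first principal component.

    rows: list of samples, each a list of feature values (e.g. lipid levels).
    Power iteration is used, so the start vector must not be orthogonal
    to the leading eigenvector of the covariance matrix.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # unit-length start vector
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance in the data
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy data: the second feature is always twice the first.
loadings, scores = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(loadings)
```

For the toy data the loadings are proportional to (1, 2), so the second feature dominates the component. KPCA follows the same idea after replacing the covariance matrix with a centered kernel matrix over the samples.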

Machine Learning Integrates Genomic Signatures for Subclassification Beyond Primary and Secondary Acute Myeloid Leukemia

Blood, 2021

DOI: 10.1182/blood.2020010603

Author(s): Hassan Awada, Arda Durmaz, Carmelo Gurnari, Ashwin Kishtagari, Manja Meggendorfer, ...

Keyword(s): Machine Learning, Acute Myeloid Leukemia, Myeloid Leukemia, Latent Class, Cross Validation, De Novo, Sequencing Data, Learning Methods, Machine Learning Methods, Acute Myeloid

While genomic alterations drive the pathogenesis of acute myeloid leukemia (AML), traditional classifications are largely based on morphology, and prototypic genetic founder lesions define only a small proportion of AML patients. The historical subdivision of primary/de novo AML (pAML) and secondary AML (sAML) has been shown to correlate variably with genetic patterns. Perhaps the combinatorial complexity and heterogeneity of AML genomic architecture have so far precluded a genomic-based subclassification that identifies distinct, molecularly defined subtypes more reflective of shared pathogenesis. We integrated cytogenetic and gene sequencing data from a multicenter cohort of 6,788 AML patients, which were analyzed using standard and machine learning methods to generate a novel AML molecular subclassification with biological correlates corresponding to underlying pathogenesis. Standard supervised analyses resulted in modest cross-validation accuracy when attempting to use molecular patterns to predict traditional pathomorphological AML classifications. We performed unsupervised analysis by applying a Bayesian Latent Class method that identified 4 unique genomic clusters of distinct prognoses. Invariant genomic features driving each cluster were extracted and resulted in 97% cross-validation accuracy when used for genomic subclassification. Subclasses of AML defined by molecular signatures overlapped current pathomorphological and clinically defined AML subtypes. We internally and externally validated our results and share an open-access molecular classification scheme for AML patients. Although the heterogeneity inherent in the genomic changes across nearly 7,000 AML patients is too vast for traditional prediction methods, machine learning methods allowed for the definition of novel genomic AML subclasses, indicating that traditional pathomorphological definitions may be less reflective of overlapping pathogenesis.
