Variable Selection and Outlier Detection in Regularized Survival Models: Application to Melanoma Gene Expression Data

Author(s):  
Eunice Carrasquinha ◽  
André Veríssimo ◽  
Marta B. Lopes ◽  
Susana Vinga
Biomedicines ◽  
2020 ◽  
Vol 8 (11) ◽  
pp. 488
Author(s):  
Carolina Peixoto ◽  
Marta B. Lopes ◽  
Marta Martins ◽  
Luís Costa ◽  
Susana Vinga

Colorectal cancer (CRC) is one of the leading causes of mortality and morbidity in the world. Because CRC is a heterogeneous disease, its therapy and prognosis represent a significant challenge to medical care. Molecular information improves the accuracy with which patients are classified and treated, since similar pathologies may show different clinical outcomes and different responses to treatment. However, the high dimensionality of gene expression data makes the selection of novel genes a difficult task. We propose TCox, a novel penalization function for Cox models, which promotes the selection of genes that have distinct correlation patterns in normal vs. tumor tissues. We compare TCox to other regularized survival models: Elastic Net, HubCox, and OrphanCox. Gene expression and clinical data of CRC and normal (TCGA) patients are used for model evaluation. Each model is tested 100 times. Within a specific run, eighteen of the features selected by TCox are also selected by the other survival regression models tested, and are therefore undoubtedly crucial players in the survival of colorectal cancer patients. Moreover, the TCox model exclusively selects genes able to categorize patients into significant risk groups. Our work demonstrates the ability of the proposed weighted regularizer TCox to disclose novel molecular drivers in CRC survival by accounting for correlation-based network information from both tumor and normal tissue. The results support the relevance of network information for biomarker identification in high-dimensional gene expression data and foster new directions for the development of network-based feature selection methods in precision oncology.
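The idea of a weighted regularizer driven by tumor-versus-normal correlation differences can be illustrated with a minimal sketch. This is not the TCox formula from the paper: the shift measure (sum of absolute correlation differences), the inverse weighting, and all function names below are assumptions made for illustration only. The resulting weights would enter a Cox model as a per-gene penalty term added to the negative partial log-likelihood.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_shift(tumor, normal):
    """Per-gene sum of |corr_tumor - corr_normal| over all other genes.

    `tumor` and `normal` hold one list of expression values per gene,
    with genes in the same order in both (assumed layout).
    """
    p = len(tumor)
    return [
        sum(abs(pearson(tumor[j], tumor[k]) - pearson(normal[j], normal[k]))
            for k in range(p) if k != j)
        for j in range(p)
    ]

def penalty_weights(shift, eps=1e-8):
    """Hypothetical weighting: a larger correlation shift gives a smaller penalty."""
    return [1.0 / (d + eps) for d in shift]

def weighted_l1(beta, weights, lam):
    """Weighted L1 penalty: lam * sum_j w_j * |beta_j|."""
    return lam * sum(w * abs(b) for w, b in zip(weights, beta))

# Toy data: gene 1 is positively correlated with gene 0 in tumor
# and negatively correlated with it in normal tissue.
tumor = [[1, 2, 3, 4], [2, 4, 6, 8], [1, 3, 2, 4]]
normal = [[1, 2, 3, 4], [8, 6, 4, 2], [1, 3, 2, 4]]

shift = correlation_shift(tumor, normal)
weights = penalty_weights(shift)
```

In this toy example gene 1 changes its correlation with both other genes, so it receives the smallest penalty weight and would be the easiest for a sparse model to retain.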


2013 ◽  
Vol 12 ◽  
pp. CIN.S10212 ◽  
Author(s):  
Lingkang Huang ◽  
Hao Helen Zhang ◽  
Zhao-Bang Zeng ◽  
Pierre R. Bushel

Background: Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high-dimensional, low-sample-size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity.
Results: The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high-dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes.
Conclusions: High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and for defining targets of therapeutic intervention.
Availability: The source MATLAB code is available from http://math.arizona.edu/~hzhang/software.html.
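To show how a soft-thresholding penalty can produce variable selection, here is a minimal sketch of the standard soft-thresholding operator applied to a gene-by-class weight table. It is not the authors' MATLAB implementation or their exact penalties: the function names, the `weights[g][c]` layout, and the rule "keep a gene if any class weight survives" are assumptions for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values in [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def selected_genes(weights, lam):
    """Indices of genes that keep a nonzero weight for at least one class.

    `weights[g][c]` is the (hypothetical) weight of gene g for class c.
    """
    kept = []
    for g, row in enumerate(weights):
        shrunk = [soft_threshold(w, lam) for w in row]
        if any(s != 0.0 for s in shrunk):
            kept.append(g)
    return kept

# Toy weight table: three genes, three classes.
weights = [
    [0.9, -0.1, 0.2],    # strong signal for class 0
    [0.05, -0.02, 0.1],  # weak everywhere, removed by shrinkage
    [-0.6, 0.7, 0.0],    # strong signal for classes 0 and 1
]
kept = selected_genes(weights, lam=0.3)
```

Because small weights are set exactly to zero rather than merely reduced, genes with weak contributions to every class drop out of the classifier, which is the source of the sparsity described in the abstract.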


2014 ◽  
Vol 13s2 ◽  
pp. CIN.S13787 ◽  
Author(s):  
Lin Zhang ◽  
Jeffrey S. Morris ◽  
Jiexin Zhang ◽  
Robert Z. Orlowski ◽  
Veerabhadran Baladandayuthapani

It is well established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchical structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We cast the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct simultaneous variable selection at the pathway (group) level and the gene (within-group) level. To adapt to the overlapping group structure present in the pathway-gene hierarchy of the data, we developed an overlap-HSVS method that introduces latent partial effect variables that partition the marginal effect of the covariates, together with corresponding weights for a proportional shrinkage of the partial effects. Combining gene expression data with prior pathway information from the KEGG database, we identified several gene-pathway combinations that are significantly associated with clinical outcomes of multiple myeloma. Biological discoveries support this relationship for the pathways and the corresponding genes we identified.


2002 ◽  
Vol 176 (1) ◽  
pp. 71-98 ◽  
Author(s):  
A. Szabo ◽  
K. Boucher ◽  
W.L. Carroll ◽  
L.B. Klebanov ◽  
A.D. Tsodikov ◽  
...  

Biometrics ◽  
2014 ◽  
Vol 70 (4) ◽  
pp. 872-880 ◽  
Author(s):  
Quefeng Li ◽  
Sijian Wang ◽  
Chiang-Ching Huang ◽  
Menggang Yu ◽  
Jun Shao
