Background:
Colorectal cancer (CRC) is the third most common cancer among both men and women in the USA,
and recent studies have shown an increasing incidence in less developed regions, including Sub-Saharan Africa (SSA).
We developed a hybrid signature that combines DNA mutation and RNA expression data, and we assessed its ability to
predict mutation status and survival in CRC patients.
Methods:
Publicly available microarray and RNASeq data from 54 matched formalin-fixed paraffin-embedded (FFPE)
samples, profiled on the Affymetrix GeneChip and RNASeq platforms, were used to identify genes differentially expressed
between mutant and wild-type samples. For classification, we applied support vector machines (SVM), artificial neural
networks (ANN), random forests (RF), k-nearest neighbors (kNN), naïve Bayes (NB), negative binomial linear discriminant
analysis (NBLDA), and Poisson linear discriminant analysis (PLDA). A Cox proportional hazards model was used for survival analysis.
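Of the classifiers listed, PLDA is the one designed specifically for RNASeq count data. The following is a minimal sketch that assumes the standard Poisson LDA model: counts X_ij ~ Poisson(s_i g_j d_kj), total-count size factors s_i, gene totals g_j, and a pseudocount beta = 1 for the class effects d_kj. The names `plda_fit` and `plda_predict` are hypothetical, and this sketch is not the authors' implementation.

```python
import math
from collections import Counter


def plda_fit(X, y, beta=1.0):
    """Fit a simplified Poisson LDA on a count matrix X (samples x genes).

    Assumption: size factors are total-count proportions and class effects
    d_kj are shrunk with a pseudocount beta (illustrative choice).
    """
    n_genes = len(X[0])
    total = sum(sum(row) for row in X)                          # X..
    gene_tot = [sum(row[j] for row in X) for j in range(n_genes)]  # g_j
    size = [sum(row) / total for row in X]                       # s_i
    classes = sorted(set(y))
    d = {}
    for k in classes:
        idx = [i for i, lab in enumerate(y) if lab == k]
        # d_kj = (observed class-k counts + beta) / (expected class-k counts + beta)
        d[k] = [
            (sum(X[i][j] for i in idx) + beta)
            / (sum(size[i] * gene_tot[j] for i in idx) + beta)
            for j in range(n_genes)
        ]
    counts = Counter(y)
    prior = {k: counts[k] / len(y) for k in classes}
    return {"total": total, "g": gene_tot, "d": d, "prior": prior}


def plda_predict(model, x):
    """Return the class with the largest Poisson discriminant score for sample x."""
    s = sum(x) / model["total"]  # size factor of the new sample
    best, best_score = None, -math.inf
    for k, dk in model["d"].items():
        # Terms that do not depend on the class k are dropped.
        score = sum(xj * math.log(dkj) for xj, dkj in zip(x, dk))
        score -= s * sum(gj * dkj for gj, dkj in zip(model["g"], dk))
        score += math.log(model["prior"][k])
        if score > best_score:
            best, best_score = k, score
    return best
```

For example, a model trained on two-gene count profiles from samples labelled "mutant" and "wild-type" assigns a new count vector to the class whose per-gene rate profile d_k best explains it.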
Results:
Compared with the gene lists from the individual platforms, the hybrid gene list achieved the highest accuracy,
sensitivity, specificity, and AUC for predicting mutation status across all classifiers, and it was prognostic for survival
in patients with CRC. NBLDA performed best on the RNASeq data, whereas SVM was the most suitable classifier for CRC
across both data types. Nine genes were found to be predictive of survival.
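The four performance measures reported above can be computed from true labels, predicted labels, and classifier scores. The following is a minimal sketch that assumes binary labels, with 1 for mutant (the positive class) and 0 for wild-type. AUC is computed here with the rank-based (Mann-Whitney) formulation, and ties count as one half. The function name `binary_metrics` is hypothetical, and the authors' evaluation code may differ.

```python
def binary_metrics(y_true, y_pred, scores):
    """Accuracy, sensitivity, specificity, and AUC for labels coded 1 (mutant) / 0 (wild-type)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    # AUC = probability that a random positive outscores a random negative.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": wins / (len(pos) * len(neg)),
    }
```

For instance, with `y_true = [1, 1, 1, 0, 0, 0]`, `y_pred = [1, 1, 0, 0, 0, 1]`, and `scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]`, accuracy, sensitivity, and specificity are each 2/3 and AUC is 8/9.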
Conclusion:
This signature could be useful in clinical practice, especially for colorectal cancer diagnosis and therapy.
Future studies should assess the effectiveness of data integration in cancer survival analysis and its application to
imbalanced data, where the classes differ in size, as well as to data with more than two classes.