Identification of biological pathway and process regulators using sparse partial least squares and triple-gene mutual interaction
Abstract

Identification of biological process- and pathway-specific regulators is essential for advancing our understanding of the regulation and formation of various phenotypic and complex traits. In this study, we applied two methods, triple-gene mutual interaction (TGMI) and sparse partial least squares (SPLS), to identify the regulators of multiple metabolic pathways in Arabidopsis thaliana and Populus trichocarpa using high-throughput gene expression data. We analyzed four pathways: (1) the lignin biosynthesis pathway in A. thaliana and P. trichocarpa; (2) flavanone, flavonol and anthocyanin biosynthesis in A. thaliana; (3) the light reaction pathway and Calvin cycle in A. thaliana; and (4) the light reaction pathway alone in A. thaliana. The efficiencies of the two methods were evaluated by examining the known positive regulators captured, the receiver operating characteristic (ROC) curves, and the areas under the ROC curves (AUROC). Our results showed that TGMI is in general more efficient than SPLS at identifying true pathway regulators and ranking them at the top of candidate regulatory gene lists, but the two methods are to some degree complementary because each identified some pathway regulators that the other did not. This study identified many regulators that potentially regulate the above pathways in plants and are valuable for genetic engineering of these pathways.
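
To make the AUROC criterion concrete, the following is a minimal sketch of how a ranked candidate list could be scored against a set of known positive regulators. It is an illustration only, not the evaluation code used in this study: the function name `auroc`, the gene identifiers, and the assumption of a tie-free ranking with unique gene names are all hypothetical.

```python
def auroc(ranked_genes, known_regulators):
    """Area under the ROC curve for a ranked candidate list.

    ranked_genes: candidate regulators, best-ranked first (assumed unique, no ties).
    known_regulators: set of known positive regulators (hypothetical gold standard).
    """
    positives = set(known_regulators) & set(ranked_genes)
    n_pos = len(positives)
    n_neg = len(ranked_genes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative candidate")

    # AUROC equals the fraction of (positive, negative) pairs in which the
    # positive is ranked above the negative. Walk from worst to best rank and,
    # for each positive, count the negatives already seen (ranked below it).
    negatives_below = 0
    concordant_pairs = 0
    for gene in reversed(ranked_genes):
        if gene in positives:
            concordant_pairs += negatives_below
        else:
            negatives_below += 1
    return concordant_pairs / (n_pos * n_neg)


# Hypothetical example: five candidates, two of them known regulators.
ranking = ["TF1", "TF2", "TF3", "TF4", "TF5"]
known = {"TF1", "TF3"}
score = auroc(ranking, known)
```

A method that places every known regulator above every other candidate scores 1.0 under this measure, while a random ordering scores about 0.5 on average, so a higher AUROC indicates that known regulators are concentrated nearer the top of the list.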