Compositional data network analysis via lasso penalized D-trace loss
Abstract Motivation With the development of high-throughput sequencing techniques for 16S-rRNA gene profiling, the analysis of microbial communities is becoming more and more attractive and reliable. Inferring the direct interaction network among microbial communities helps in the identification of mechanisms underlying community structure. However, the analysis of compositional data remains challenging by the relative information conveyed by such data, as well as its high dimensionality. Results In this article, we first propose a novel loss function for compositional data called CD-trace based on D-trace loss. A sparse matrix estimator for the direct interaction network is defined as the minimizer of lasso penalized CD-trace loss under positive-definite constraint. An efficient alternating direction algorithm is developed for numerical computation. Simulation results show that CD-trace compares favorably to gCoda and that it is better than sparse inverse covariance estimation for ecological association inference (SPIEC-EASI) (hereinafter S-E) in network recovery with compositional data. Finally, we test CD-trace and compare it to the other methods noted above using mouse skin microbiome data. Availability and implementation The CD-trace is open source and freely available from https://github.com/coamo2/CD-trace under GNU LGPL v3. Supplementary information Supplementary data are available at Bioinformatics online.