Accurate and Efficient Gene Function Prediction using a Multi-Bacterial Network
Abstract

Motivation: Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer these annotations to newly sequenced organisms on a genome-wide scale. However, the large size of such networks poses a challenge for the scalability of current methods.

Results: We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under three minutes for 200 bacterial species.

Availability and Implementation: Python implementations of each algorithm and all data used in this research are available at http://bioinformatics.cs.vt.edu/~jeffl/supplements/[email protected]

Supplementary Information: A supplementary file is available at bioRxiv online.
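To make the idea of label propagation with a bounded number of iterations concrete, the following is a minimal sketch and not the authors' implementation. It assumes a SinkSource-style update in which positive examples are held at a score of 1, negative examples at 0, and every other node repeatedly takes a damped, weighted average of its neighbors' scores. The function name `propagate_scores` and the parameters `alpha` and `num_iters` are hypothetical names chosen for illustration.

```python
import numpy as np

def propagate_scores(W, positives, negatives, alpha=0.9, num_iters=20):
    """Illustrative SinkSource-style label propagation (assumed update rule).

    W          -- symmetric, non-negative (n x n) edge-weight matrix
    positives  -- indices of nodes annotated with the function (fixed at 1)
    negatives  -- indices of nodes known to lack the function (fixed at 0)
    alpha      -- damping factor in (0, 1]; hypothetical parameter name
    num_iters  -- fixed number of update sweeps instead of running to convergence
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]

    # Row-normalize the weights so each node averages over its neighbors.
    degree = W.sum(axis=1)
    degree[degree == 0] = 1.0  # isolated nodes keep a score of 0
    P = W / degree[:, None]

    # Mark labeled nodes; their scores are never updated.
    labeled = np.zeros(n, dtype=bool)
    labeled[np.asarray(positives, dtype=int)] = True
    labeled[np.asarray(negatives, dtype=int)] = True
    unknown = ~labeled

    scores = np.zeros(n)
    scores[np.asarray(positives, dtype=int)] = 1.0

    for _ in range(num_iters):
        # Each unlabeled node takes the damped average of its neighbors' scores.
        scores[unknown] = alpha * (P[unknown] @ scores)
    return scores
```

Capping `num_iters` rather than iterating to numerical convergence reflects the general idea in the abstract that a bound on the rate of progress can justify stopping early. The specific bound is not reproduced here.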