Abstract
Background: N4-like viruses, with specific genomic features and propagation signatures, comprise a unique viral clade within the Podoviridae family. They are commonly characterized by the N4-like major capsid protein (MCP) and a giant virion-encapsulated RNA polymerase (N4-like RNAP) of approximately 3,500 aa, which is the largest viral protein described so far. To date, our understanding of N4-like viruses is largely derived from 80 viral isolates that infect bacteria. Thus, it is necessary to expand the known diversity of N4-like viruses using culture-independent methods.
Methods: A hidden Markov model (HMM)-based method was designed around two characterized N4-specific marker genes, the major capsid protein and the N4-like virion-encapsulated RNA polymerase. Viral sub-clades were classified based on the monophyly observed in the phylogenetic tree and the results of pangenome analysis. Further analyses assessed distribution patterns, genomic properties, the potential for reprogramming host metabolism, the significance of viral tRNAs, and the horizontal gene transfer landscape.
Results: We identified 1,000 N4-like virus sequences from genomes and metagenomes representing diverse habitats from around the world. These viruses were classified into 27 sub-clades and detected in almost all habitats from pole to pole, including some novel habitats such as oral mucosa and Antarctica. Virulence factors might be crucial for some human-associated N4-like viruses to reprogram the metabolism of host cells and mediate their pathogenicity through horizontal gene transfer. In the pangenome analysis, protein diversity was expanded over 7-fold and 17 conserved housekeeping genes were identified. Transcriptional compensation of tRNA suggests that supporting the production of progeny virions might be the main role of viral tRNAs. In the horizontal gene transfer network, some N4-like viral sub-clades appear to infect important human pathogens, such as Campylobacteria and Veillonella, which have not previously been considered potential hosts of N4-like viruses or of any virus.
Conclusion: This study expands the knowledge of N4-like viruses using global metagenomic datasets, reveals novel ecological and genomic signatures of these viruses, and provides a backbone for further N4-like virus studies.