Systematic Detection of Large-Scale Multi-Gene Horizontal Transfer in Prokaryotes

Author(s):  
Lina Kloub ◽  
Sean Gosselin ◽  
Matthew Fullmer ◽  
Joerg Graf ◽  
J Peter Gogarten ◽  
...  

Abstract Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the “scale” of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: How often does more than one gene get horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multi-gene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can be easily used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale dataset of over 22000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals interesting relationships between gene function, phylogenetic distance, and frequency of multi-gene transfer. Among other insights, we find that (i) the observed relative frequency of HMGT increases as divergence between genomes increases, (ii) HMGTs often have conserved gene functions, and (iii) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.
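The step in which single-gene transfers are combined with gene order can be illustrated with a small sketch. This is not HoMer's implementation: the function name `candidate_multigene_runs`, the parameters `max_gap` and `min_genes`, and the treatment of the genome as a linear (non-circular) gene list are all assumptions made for illustration, and the statistical analysis the abstract mentions is omitted. The sketch only groups genes that were inferred as transferred (for one hypothetical donor/recipient pair) into runs that sit close together on the genome.

```python
from typing import List, Set


def candidate_multigene_runs(
    gene_order: List[str],
    transferred: Set[str],
    max_gap: int = 1,
    min_genes: int = 2,
) -> List[List[str]]:
    """Group transferred genes into runs that are near-contiguous in gene_order.

    gene_order  : genes in their order along one (assumed linear) genome
    transferred : genes inferred as single-gene transfers for one
                  hypothetical donor/recipient pair
    max_gap     : number of consecutive non-transferred genes tolerated
                  inside a run (illustrative parameter)
    min_genes   : minimum number of transferred genes for a run to be reported
    """
    runs: List[List[str]] = []
    current: List[str] = []
    gap = 0
    for gene in gene_order:
        if gene in transferred:
            current.append(gene)
            gap = 0
        elif current:
            gap += 1
            if gap > max_gap:
                # The run is broken: keep it only if it is long enough.
                if len(current) >= min_genes:
                    runs.append(current)
                current = []
                gap = 0
    if len(current) >= min_genes:
        runs.append(current)
    return runs


if __name__ == "__main__":
    order = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
    hits = {"g2", "g3", "g5", "g8"}
    print(candidate_multigene_runs(order, hits, max_gap=1))
```

Allowing a small gap reflects the abstract's point that single-gene inference is error-prone, so a true multi-gene transfer may contain a gene whose transfer was missed; whether a run is more than a chance clustering would still need a statistical test.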

2020 ◽  
Author(s):  
Lina Kloub ◽  
Sean Gosselin ◽  
Matthew Fullmer ◽  
Joerg Graf ◽  
J. Peter Gogarten ◽  
...  

Abstract Horizontal gene transfer (HGT) is central to prokaryotic evolution. However, little is known about the “scale” of individual HGT events. In this work, we introduce the first computational framework to help answer the following fundamental question: How often does more than one gene get horizontally transferred in a single HGT event? Our method, called HoMer, uses phylogenetic reconciliation to infer single-gene HGT events across a given set of species/strains, employs several techniques to account for inference error and uncertainty, combines that information with gene order information from extant genomes, and uses statistical analysis to identify candidate horizontal multi-gene transfers (HMGTs) in both extant and ancestral species/strains. HoMer is highly scalable and can be easily used to infer HMGTs across hundreds of genomes. We apply HoMer to a genome-scale dataset of over 22000 gene families from 103 Aeromonas genomes and identify a large number of plausible HMGTs of various scales at both small and large phylogenetic distances. Analysis of these HMGTs reveals interesting relationships between gene function, phylogenetic distance, and frequency of multi-gene transfer. Among other insights, we find that (i) the relative frequency of HMGT increases as divergence between genomes increases, (ii) HMGTs often have conserved gene functions, and (iii) rare genes are frequently acquired through HMGT. We also analyze in detail HMGTs involving the zonula occludens toxin and type III secretion systems. By enabling the systematic inference of HMGTs on a large scale, HoMer will facilitate a more accurate and more complete understanding of HGT and microbial evolution.


mSphere ◽  
2017 ◽  
Vol 2 (2) ◽  
Author(s):  
Charley G. P. McCarthy ◽  
David A. Fitzpatrick

ABSTRACT The oomycetes are a class of microscopic, filamentous eukaryotes within the Stramenopiles-Alveolata-Rhizaria (SAR) supergroup which includes ecologically significant animal and plant pathogens, most infamously the causative agent of potato blight Phytophthora infestans. Single-gene and concatenated phylogenetic studies both of individual oomycete genera and of members of the larger class have resulted in conflicting conclusions concerning species phylogenies within the oomycetes, particularly for the large Phytophthora genus. Genome-scale phylogenetic studies have successfully resolved many eukaryotic relationships by using supertree methods, which combine large numbers of potentially disparate trees to determine evolutionary relationships that cannot be inferred from individual phylogenies alone.
With a sufficient amount of genomic data now available, we have undertaken the first whole-genome phylogenetic analysis of the oomycetes using data from 37 oomycete species and 6 SAR species. In our analysis, we used established supertree methods to generate phylogenies from 8,355 homologous oomycete and SAR gene families and have complemented those analyses with both phylogenomic network and concatenated supermatrix analyses. Our results show that a genome-scale approach to oomycete phylogeny resolves oomycete classes and individual clades within the problematic Phytophthora genus. Support for the resolution of the inferred relationships between individual Phytophthora clades varies depending on the methodology used. Our analysis represents an important first step in large-scale phylogenomic analysis of the oomycetes. IMPORTANCE The oomycetes are a class of eukaryotes and include ecologically significant animal and plant pathogens. Single-gene and multigene phylogenetic studies of individual oomycete genera and of members of the larger classes have resulted in conflicting conclusions concerning interspecies relationships among these species, particularly for the Phytophthora genus. The onset of next-generation sequencing techniques now means that a wealth of oomycete genomic data is available. For the first time, we have used genome-scale phylogenetic methods to resolve oomycete phylogenetic relationships. We used supertree methods to generate single-gene and multigene species phylogenies. Overall, our supertree analyses utilized phylogenetic data from 8,355 oomycete gene families. We have also complemented our analyses with superalignment phylogenies derived from 131 single-copy ubiquitous gene families. Our results show that a genome-scale approach to oomycete phylogeny resolves oomycete classes and clades. Our analysis represents an important first step in large-scale phylogenomic analysis of the oomycetes.


Genetics ◽  
2001 ◽  
Vol 159 (4) ◽  
pp. 1765-1778
Author(s):  
Gregory J Budziszewski ◽  
Sharon Potter Lewis ◽  
Lyn Wegrich Glover ◽  
Jennifer Reineke ◽  
Gary Jones ◽  
...  

Abstract We have undertaken a large-scale genetic screen to identify genes with a seedling-lethal mutant phenotype. From screening ~38,000 insertional mutant lines, we identified >500 seedling-lethal mutants, completed cosegregation analysis of the insertion and the lethal phenotype for >200 mutants, molecularly characterized 54 mutants, and provided a detailed description for 22 of them. Most of the seedling-lethal mutants seem to affect chloroplast function because they display altered pigmentation and affect genes encoding proteins predicted to have chloroplast localization. Although a high level of functional redundancy in Arabidopsis might be expected because 65% of genes are members of gene families, we found that 41% of the essential genes found in this study are members of Arabidopsis gene families. In addition, we isolated several interesting classes of mutants and genes. We found three mutants in the recently discovered nonmevalonate isoprenoid biosynthetic pathway and mutants disrupting genes similar to Tic40 and tatC, which are likely to be involved in chloroplast protein translocation. Finally, we directly compared T-DNA and Ac/Ds transposon mutagenesis methods in Arabidopsis on a genome scale. In each population, we found only about one-third of the insertion mutations cosegregated with a mutant phenotype.


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Runyu Jing ◽  
Tingke Wen ◽  
Chengxiang Liao ◽  
Li Xue ◽  
Fengjuan Liu ◽  
...  

Abstract Type III secretion systems (T3SSs) are bacterial membrane-embedded nanomachines that allow a number of human, plant, and animal pathogens to inject virulence factors directly into the cytoplasm of eukaryotic cells. Export of effectors through T3SSs is critical for motility and virulence of most Gram-negative pathogens. Current computational methods can predict type III secreted effectors (T3SEs) from amino acid sequences, but due to algorithmic constraints, reliable and large-scale prediction of T3SEs in Gram-negative bacteria remains a challenge. Here, we present DeepT3 2.0 (http://advintbioinforlab.com/deept3/), a novel web server that integrates different deep learning models for genome-wide prediction of T3SEs in a bacterium of interest. DeepT3 2.0 combines various deep learning architectures, including convolutional, recurrent, convolutional-recurrent, and multilayer neural networks, to learn N-terminal representations of proteins specifically for T3SE prediction. Outcomes from the different models are processed and integrated to discriminate T3SEs from non-T3SEs. Because it leverages diverse models and an integrative deep learning framework, DeepT3 2.0 outperforms existing methods on validation datasets. In addition, the features learned by the networks are analyzed and visualized to explain how the models make their predictions. We propose DeepT3 2.0 as an integrated and accurate tool for the discovery of T3SEs.


2019 ◽  
Vol 116 (12) ◽  
pp. 5613-5622 ◽  
Author(s):  
David S. Milner ◽  
Victoria Attah ◽  
Emily Cook ◽  
Finlay Maguire ◽  
Fiona R. Savory ◽  
...  

Many microbes acquire metabolites in a “feeding” process where complex polymers are broken down in the environment to their subunits. The subsequent uptake of soluble metabolites by a cell, sometimes called osmotrophy, is facilitated by transporter proteins. As such, the diversification of osmotrophic microorganisms is closely tied to the diversification of transporter functions. Horizontal gene transfer (HGT) has been suggested to produce genetic variation that can lead to adaptation, allowing lineages to acquire traits and expand niche ranges. Transporter genes often encode single-gene phenotypes and tend to have low protein–protein interaction complexity and, as such, are potential candidates for HGT. Here we test the idea that HGT has underpinned the expansion of metabolic potential and substrate utilization via transfer of transporter-encoding genes. Using phylogenomics, we identify seven cases of transporter-gene HGT between fungal phyla, and investigate compatibility, localization, function, and fitness consequences when these genes are expressed in Saccharomyces cerevisiae. Using this approach, we demonstrate that the transporters identified can alter how fungi utilize a range of metabolites, including peptides, polyols, and sugars. We then show, for one model gene, that transporter gene acquisition by HGT can significantly alter the fitness landscape of S. cerevisiae. We therefore provide evidence that transporter HGT occurs between fungi, alters how fungi can acquire metabolites, and can drive gain in fitness. We propose a “transporter-gene acquisition ratchet,” where transporter repertoires are continually augmented by duplication, HGT, and differential loss, collectively acting to overwrite, fine-tune, and diversify the complement of transporters present in a genome.


2015 ◽  
Vol 112 (33) ◽  
pp. 10139-10146 ◽  
Author(s):  
Chuan Ku ◽  
Shijulal Nelson-Sathi ◽  
Mayo Roettger ◽  
Sriram Garg ◽  
Einat Hazkani-Covo ◽  
...  

Endosymbiotic theory in eukaryotic-cell evolution rests upon a foundation of three cornerstone partners—the plastid (a cyanobacterium), the mitochondrion (a proteobacterium), and its host (an archaeon)—and carries a corollary that, over time, the majority of genes once present in the organelle genomes were relinquished to the chromosomes of the host (endosymbiotic gene transfer). However, notwithstanding eukaryote-specific gene inventions, single-gene phylogenies have never traced eukaryotic genes to three single prokaryotic sources, an issue that hinges crucially upon factors influencing phylogenetic inference. In the age of genomes, single-gene trees, once used to test the predictions of endosymbiotic theory, now spawn new theories that stand to eventually replace endosymbiotic theory with descriptive, gene tree-based variants featuring supernumerary symbionts: prokaryotic partners distinct from the cornerstone trio and whose existence is inferred solely from single-gene trees. We reason that the endosymbiotic ancestors of mitochondria and chloroplasts brought into the eukaryotic—and plant and algal—lineage a genome-sized sample of genes from the proteobacterial and cyanobacterial pangenomes of their respective day and that, even if molecular phylogeny were artifact-free, sampling prokaryotic pangenomes through endosymbiotic gene transfer would lead to inherited chimerism. Recombination in prokaryotes (transduction, conjugation, transformation) differs from recombination in eukaryotes (sex). Prokaryotic recombination leads to pangenomes, and eukaryotic recombination leads to vertical inheritance. Viewed from the perspective of endosymbiotic theory, the critical transition at the eukaryote origin that allowed escape from Muller’s ratchet—the origin of eukaryotic recombination, or sex—might have required surprisingly little evolutionary innovation.


mBio ◽  
2018 ◽  
Vol 9 (5) ◽  
Author(s):  
Jason P. Lynch ◽  
Cammie F. Lesser

ABSTRACT Several genome-wide screens have been conducted to identify host cell factors involved in the pathogenesis of bacterial pathogens whose virulence is dependent on type III secretion systems (T3SSs), nanomachines responsible for the translocation of proteins into host cells. In the most recent of these, Pacheco et al. (mBio 9:e01003-18, 2018, http://mbio.asm.org/content/9/3/e01003-18.full) screened a genome-wide CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats with Cas9) knockout library for host proteins involved in the pathogenesis of enterohemorrhagic Escherichia coli (EHEC). Their study revealed an unrecognized link between EHEC’s two major virulence determinants (its T3SS and Shiga toxins). We discuss these findings in light of data from three other genome-wide screens. Each of these studies uncovered multiple host cell determinants, which curiously share little to no overlap but primarily are involved in mediating early interactions between T3SSs and host cells. We therefore consider how each screen was performed, the advantages and disadvantages of each, and how follow-up studies might be designed to address these issues.


Biology ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 715
Author(s):  
Fengjiao Zhang ◽  
Ning Wang ◽  
Guanghao Cheng ◽  
Xiaochun Shu ◽  
Tao Wang ◽  
...  

The genus Lycoris (Amaryllidaceae) consists of about 20 species and is endemic to East Asia. Although Lycoris species are of great horticultural and medical importance, challenges in accurate species identification persist due to frequent natural hybridization and large-scale intraspecific variation. In this study, we sequenced chloroplast genomes of four Lycoris species and retrieved seven published chloroplast (cp) genome sequences in this genus for comparative genomic and phylogenetic analyses. The cp genomes of these four newly sequenced species were found to be 158,405–158,498 bp with the same GC content of 37.8%. The genomes exhibited the typical quadripartite structure with conserved gene order and content. A total of 113 genes (20 duplicated) were identified, including 79 protein-coding genes (PCGs), 30 tRNAs, and 4 rRNAs. Phylogenetic analysis showed that the 11 species clustered into three main groups, with L. sprengeri located at the base of Lycoris. L. radiata was suggested to be the female donor of L. incarnata, L. shaanxiensis, and L. squamigera. L. straminea and L. houdyshelii may be derived from L. anhuiensis, L. chinensis, or L. longituba. These results could not only offer a genome-scale platform for identification and utilization of Lycoris but also provide a phylogenomic framework for future studies in this genus.


2016 ◽  
Author(s):  
Apurva Narechania ◽  
Richard Baker ◽  
Rob DeSalle ◽  
Barun Mathema ◽  
Sergios-Orestis Kolokotronis ◽  
...  

Abstract Background: Collective animal behavior such as the flocking of birds or the shoaling of fish has inspired a class of algorithms designed to optimize distance-based clusters in various applications including document analysis and DNA microarrays. In the flocking model, individual agents respond only to their immediate environment and move according to a few simple rules. After several iterations the agents self-organize and clusters emerge without the need for partitional seeds. In addition to their unsupervised nature, flocking offers several computational advantages including the potential to decrease the number of required comparisons. Findings: In Clusterflock, we implement a flocking algorithm designed to find groups (flocks) of orthologous gene families (OGFs) that share a common evolutionary history. Pairwise distances that measure the phylogenetic incongruence between OGFs guide flock formation. We test this approach on several simulated datasets varying the number of underlying topologies, the proportion of missing data, and evolutionary rates, and show that in datasets containing high levels of missing data and rate heterogeneity, clusterflock outperforms other well-established clustering techniques. We also demonstrate its utility on a known, large-scale recombination event in Staphylococcus aureus. By isolating sets of OGFs with divergent phylogenetic signal, we can pinpoint the recombined region without forcing a pre-determined number of groupings or defining a pre-determined incongruence threshold. Conclusions: Clusterflock is an open source tool that can be used to discover horizontally transferred genes, recombining areas of chromosomes, and the phylogenetic “core” of a genome. Though we use it in an evolutionary context, it is generalizable to any clustering problem. Users can write extensions to calculate any distance metric on the unit interval and use these distances to flock any type of data.
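As an illustration of what a distance metric on the unit interval between two gene trees might look like, the sketch below computes a normalized Robinson-Foulds distance from sets of bipartitions. This is a stand-in chosen for simplicity, not necessarily the incongruence measure Clusterflock uses, and it does not follow Clusterflock's extension interface; the names `canonical_split` and `normalized_rf` and the representation of a tree as a set of splits are assumptions.

```python
from typing import FrozenSet, Iterable, Set

Split = FrozenSet[str]


def canonical_split(side: Iterable[str], taxa: Set[str]) -> Split:
    """Represent a bipartition by the side that excludes a fixed reference taxon.

    Using one fixed side makes {A,B}|{C,D,E} and {C,D,E}|{A,B} compare equal.
    """
    side_set = frozenset(side)
    reference = min(taxa)  # arbitrary but deterministic reference taxon
    return side_set if reference not in side_set else frozenset(taxa) - side_set


def normalized_rf(splits_a: Set[Split], splits_b: Set[Split]) -> float:
    """Normalized Robinson-Foulds distance in [0, 1].

    Counts splits found in exactly one tree and divides by the total number
    of splits in both trees. Returns 0.0 when neither tree has any split.
    """
    total = len(splits_a) + len(splits_b)
    if total == 0:
        return 0.0
    return len(splits_a ^ splits_b) / total


if __name__ == "__main__":
    taxa = {"A", "B", "C", "D", "E"}
    # Non-trivial splits of ((A,B),C,(D,E)) and ((A,C),B,(D,E)).
    tree1 = {canonical_split({"A", "B"}, taxa), canonical_split({"D", "E"}, taxa)}
    tree2 = {canonical_split({"A", "C"}, taxa), canonical_split({"D", "E"}, taxa)}
    print(normalized_rf(tree1, tree2))
```

A pairwise matrix of such values over all gene families is the kind of input a distance-based clustering method consumes; any other measure bounded in [0, 1] could be substituted.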


2020 ◽  
Author(s):  
Guillaume Louvel ◽  
Hugues Roest Crollius

Abstract Molecular dating is a cornerstone of evolutionary biology, yet it is far from an exact science. The inference of precise dates using gene sequences is difficult, in part because of the stochastic process of DNA mutation, selective forces that alter substitution rates, and many unknown parameters linked to population genetics in ancestral lineages. Dating species divergence is one important challenge in this field, which is usually performed by concatenating extant sequences sampled within a genome as representative of a lineage, and computing distances between these lineages. However, concatenation precludes the dating of events specific to a gene family, such as gene duplication. During evolutionary time, individual gene sequences record different signatures of base substitutions, at rates that may deviate substantially from the average rate. No formal study exists that quantifies which parameters influence this deviation. Here we designed a strategy to date events within a gene family, and we test the influence of more than 30 parameters on dating accuracy. We developed this approach on approximately 5,000 primate gene families comprising 12 genomes that display neither gene loss nor gene duplication. We then test its relevance in the complete set of primate gene families to date gene duplications. Our results are compared to previous fossil and molecular dating approaches, and provide a practical set of guidelines for accurate molecular dating at the single gene family level.

