Gene Tree Diameter for Deep Coalescence

Author(s):  
Pawel Gorecki ◽  
Oliver Eulenstein
2020 ◽  
Author(s):  
Ishrat Tanzila Farah ◽  
Md Muktadirul Islam ◽  
Kazi Tasnim Zinat ◽  
Atif Hasan Rahman ◽  
Md Shamsuzzoha Bayzid

Abstract Species tree estimation from multi-locus datasets is extremely challenging, especially in the presence of gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Summary methods have been developed that estimate gene trees and then combine them to estimate a species tree by optimizing various scores. In this study, we have formalized the concept of “phylogenomic terraces” in the species tree space, where multiple species trees with distinct topologies may have exactly the same optimization score (quartet score, extra lineage score, etc.) with respect to a collection of gene trees. We investigated the presence and implications of terraces in species tree estimation from multi-locus data by taking ILS into account. We analyzed two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a consistent criterion for species tree estimation. Our experiments, on a collection of datasets simulated under ILS, indicate that MDC-based methods may achieve quartet consistency scores competitive with or identical to those of MQC-based methods but could be significantly worse in terms of tree accuracy, demonstrating the presence and effect of phylogenomic terraces. This is the first known study that formalizes the concept of phylogenomic terraces in the context of species tree estimation from multi-locus data, and reports the presence and implications of terraces in species tree estimation under ILS.
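To make the quartet score mentioned above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes trees are written as nested tuples of taxon names (a rooted stand-in for the unrooted topology) and that each gene tree carries one leaf per species; the function names `clades`, `quartet_topologies`, and `quartet_score` are hypothetical. The score counts gene-tree quartets whose induced split matches the candidate species tree.

```python
from itertools import combinations


def clades(tree):
    """Return (leaf set of tree, list of leaf sets of every clade in it)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, [leaf]
    all_leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        all_leaves |= child_leaves
        found += child_clades
    found.append(all_leaves)
    return all_leaves, found


def quartet_topologies(tree):
    """Map each resolved 4-taxon set to its induced split, e.g. {A,B} | {C,D}."""
    taxa, clade_list = clades(tree)
    result = {}
    for four in combinations(sorted(taxa), 4):
        quartet = frozenset(four)
        for clade in clade_list:
            side = clade & quartet
            # A clade holding exactly two of the four taxa separates them
            # from the other two, which fixes the quartet topology.
            if len(side) == 2:
                result[quartet] = frozenset([side, quartet - side])
                break
    return result


def quartet_score(species_tree, gene_trees):
    """Count gene-tree quartets that agree with the species tree."""
    species_quartets = quartet_topologies(species_tree)
    return sum(
        1
        for gene_tree in gene_trees
        for quartet, split in quartet_topologies(gene_tree).items()
        if species_quartets.get(quartet) == split
    )


# Two conflicting gene trees on four taxa (illustrative data only).
gene_trees = [((("A", "B"), "C"), "D"), ((("A", "C"), "B"), "D")]
score_1 = quartet_score((("A", "B"), ("C", "D")), gene_trees)
score_2 = quartet_score((("A", "C"), ("B", "D")), gene_trees)
```

In this toy input the two candidate species trees have different topologies yet receive the same score, which is the kind of tie the terrace concept describes; the formal definition is the one given in the paper itself.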


2021 ◽  
Author(s):  
Matthew LeMay ◽  
Yi-Chieh Wu ◽  
Ran Libeskind-Hadas

The maximum parsimony phylogenetic reconciliation problem seeks to explain incongruity between a gene phylogeny and a species phylogeny with respect to a set of evolutionary events. While the reconciliation problem is well-studied for species and gene trees subject to events such as duplication, transfer, loss, and deep coalescence, recent work has examined species phylogenies that incorporate hybridization and are thus represented by networks rather than trees. In this paper, we show that the problem of computing a maximum parsimony reconciliation for a gene tree and species network is NP-hard even when only considering deep coalescence. This result suggests that future work on maximum parsimony reconciliation for species networks should explore approximation algorithms and heuristics.


2018 ◽  
Author(s):  
Yann J K Bertrand ◽  
Anna Petri ◽  
Anne-Cathrine Scheen ◽  
Mats Töpel ◽  
Bengt Oxelman

Abstract Phylogenetic methods that rely on information from multiple, unlinked genes have recently been developed for resolving complex situations where evolutionary relationships do not conform to bifurcating trees and are more adequately depicted by networks. Such situations arise when successive interspecific hybridizations in combination with genome duplications have shaped species phylogenies. Several processes such as homoeolog loss and deep coalescence can potentially hamper our ability to recover the historical signal correctly. Consequently, the prospect of reconstructing accurate phylogenies lies in the combination of several low-copy nuclear markers that, when used in concert, can provide homoeologs for all the ancestral genomes and help to disentangle gene tree incongruence due to deep coalescence events. Expressed sequence tag (EST) databases represent a valuable resource for the identification of genes in organisms with uncharacterized genomes and for the development of molecular markers. The genus Silene L. is a prime example of a plant group whose evolutionary history involves numerous events of hybridization and polyploidization. As for many groups, there is currently a shortage of low-copy nuclear markers for which phylogenetic usefulness has been demonstrated. Here, we present two EST libraries for two species of Silene that belong to large phylogenetic groups not previously investigated with next generation technologies. The assembled and annotated transcriptomes are used for identifying low-copy nuclear regions suitable for sequencing.


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0251107
Author(s):  
Ayed A. R. Alanzi ◽  
James H. Degnan

Species trees, which describe the evolutionary relationships between species, are often inferred from gene trees, which describe the ancestral relationships between sequences sampled at different loci from the species of interest. A common approach to inferring species trees from gene trees is motivated by supposing that gene tree variation is due to incomplete lineage sorting, also known as deep coalescence. One of the earliest methods motivated by deep coalescence is to find the species tree that minimizes the number of deep coalescent events needed to explain discrepancies between the species tree and input gene trees. This minimize deep coalescence (MDC) criterion can be applied in both rooted and unrooted settings, where either rooted or unrooted gene trees can be used to infer a rooted species tree. Previous work has shown that MDC is statistically inconsistent in the rooted setting, meaning that under a probabilistic model for deep coalescence, the multispecies coalescent, for some species trees, increasing the number of input gene trees does not make the method more likely to return a correct species tree. Here, we obtain analogous results in the unrooted setting, showing conditions leading to inconsistency of the MDC criterion using the multispecies coalescent model with unrooted gene trees for four taxa and five taxa.
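As an illustration of the quantity that the rooted MDC criterion minimizes, here is a short Python sketch that counts extra lineages for one rooted gene tree embedded in one rooted species tree. It is a sketch under stated assumptions rather than the method used in the paper: trees are nested tuples of species names, each gene-tree leaf is labelled by its species, every gene-tree species occurs in the species tree, and the helper names are hypothetical. Each gene-tree node is mapped to the smallest species-tree clade containing its leaves (the least-common-ancestor mapping), the gene lineages crossing each species-tree branch are tallied, and every lineage beyond the first on a branch counts as one extra lineage.

```python
from collections import Counter


def leaves(tree):
    """Return the frozenset of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def index_species_tree(tree, parent=None, table=None):
    """Map every species-tree clade (as a leaf set) to its parent clade."""
    if table is None:
        table = {}
    cluster = leaves(tree)
    table[cluster] = parent
    if not isinstance(tree, str):
        for child in tree:
            index_species_tree(child, cluster, table)
    return table


def lca(parent_of, leaf_set):
    """Smallest species-tree clade containing leaf_set (the LCA mapping)."""
    return min((c for c in parent_of if leaf_set <= c), key=len)


def deep_coalescence(gene_tree, species_tree):
    """Count extra lineages of a rooted gene tree inside a rooted species tree."""
    parent_of = index_species_tree(species_tree)
    lineages = Counter()  # species-tree branch (keyed by child clade) -> lineages

    def visit(node):
        mapped = lca(parent_of, leaves(node))
        if not isinstance(node, str):
            for child in node:
                current = visit(child)
                # The child lineage crosses every species-tree branch between
                # its own mapping and the mapping of its gene-tree parent.
                while current != mapped:
                    lineages[current] += 1
                    current = parent_of[current]
        return mapped

    visit(gene_tree)
    return sum(count - 1 for count in lineages.values())


species = (("A", "B"), "C")
concordant_cost = deep_coalescence((("A", "B"), "C"), species)
discordant_cost = deep_coalescence((("A", "C"), "B"), species)
```

An MDC search would sum this cost over all input gene trees and look for the species tree with the smallest total; that search step is not shown here.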


2020 ◽  
Author(s):  
Liming Cai ◽  
Zhenxiang Xi ◽  
Emily Moriarty Lemmon ◽  
Alan R Lemmon ◽  
Austin Mast ◽  
...  

Abstract The genomic revolution offers renewed hope of resolving rapid radiations in the Tree of Life. The development of the multispecies coalescent (MSC) model and improved gene tree estimation methods can better accommodate gene tree heterogeneity caused by incomplete lineage sorting (ILS) and gene tree estimation error stemming from the short internal branches. However, the relative influence of these factors in species tree inference is not well understood. Using anchored hybrid enrichment, we generated a data set including 423 single-copy loci from 64 taxa representing 39 families to infer the species tree of the flowering plant order Malpighiales. This order includes nine of the top ten most unstable nodes in angiosperms, which have been hypothesized to arise from the rapid radiation during the Cretaceous. Here, we show that coalescent-based methods do not resolve the backbone of Malpighiales and concatenation methods yield inconsistent estimations, providing evidence that gene tree heterogeneity is high in this clade. Despite high levels of ILS and gene tree estimation error, our simulations demonstrate that these two factors alone are insufficient to explain the lack of resolution in this order. To explore this further, we examined triplet frequencies among empirical gene trees and discovered some of them deviated significantly from those attributed to ILS and estimation error, suggesting gene flow as an additional and previously unappreciated phenomenon promoting gene tree variation in Malpighiales. Finally, we applied a novel method to quantify the relative contribution of these three primary sources of gene tree heterogeneity and demonstrated that ILS, gene tree estimation error, and gene flow contributed to 10.0%, 34.8%, and 21.4% of the variation, respectively. 
Together, our results suggest that a perfect storm of factors likely influence this lack of resolution, and further indicate that recalcitrant phylogenetic relationships like the backbone of Malpighiales may be better represented as phylogenetic networks. Thus, reducing such groups solely to existing models that adhere strictly to bifurcating trees greatly oversimplifies reality, and obscures our ability to more clearly discern the process of evolution.


Genetics ◽  
2003 ◽  
Vol 164 (4) ◽  
pp. 1645-1656 ◽  
Author(s):  
Bruce Rannala ◽  
Ziheng Yang

Abstract The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from the human and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be ∼20,000, with a 95% credibility interval (8000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates have yet to await more data and more realistic models.


Author(s):  
Andrew W Legan ◽  
Christopher M Jernigan ◽  
Sara E Miller ◽  
Matthieu F Fuchs ◽  
Michael J Sheehan

Abstract Independent origins of sociality in bees and ants are associated with independent expansions of particular odorant receptor (OR) gene subfamilies. In ants, one clade within the OR gene family, the 9-exon subfamily, has dramatically expanded. These receptors detect cuticular hydrocarbons (CHCs), key social signaling molecules in insects. It is unclear to what extent 9-exon OR subfamily expansion is associated with the independent evolution of sociality across Hymenoptera, warranting studies of taxa with independently derived social behavior. Here we describe odorant receptor gene family evolution in the northern paper wasp, Polistes fuscatus, and compare it to four additional paper wasp species spanning ∼40 million years of evolutionary divergence. We find 200 putatively functional OR genes in P. fuscatus, matching predictions from neuroanatomy, and more than half of these are in the 9-exon subfamily. Most OR gene expansions are tandemly arrayed at orthologous loci in Polistes genomes, and microsynteny analysis shows species-specific gain and loss of 9-exon ORs within tandem arrays. There is evidence of episodic positive diversifying selection shaping ORs in expanded subfamilies. Values of omega (dN/dS) are higher among 9-exon ORs compared to other OR subfamilies. Within the Polistes OR gene tree, branches in the 9-exon OR clade experience relaxed negative (purifying) selection relative to other branches in the tree. Patterns of OR evolution within Polistes are consistent with 9-exon OR function in CHC perception by combinatorial coding, with both natural selection and neutral drift contributing to interspecies differences in gene copy number and sequence.

