scholarly journals The Most Parsimonious Reconciliation Problem in the Presence of Incomplete Lineage Sorting and Hybridization is NP-Hard

2021 ◽  
Author(s):  
Matthew LeMay ◽  
Yi-Chieh Wu ◽  
Ran Libeskind-Hadas

The maximum parsimony phylogenetic reconciliation problem seeks to explain incongruity between a gene phylogeny and a species phylogeny with respect to a set of evolutionary events. While the reconciliation problem is well-studied for species and gene trees subject to events such as duplication, transfer, loss, and deep coalescence, recent work has examined species phylogenies that incorporate hybridization and are thus represented by networks rather than trees. In this paper, we show that the problem of computing a maximum parsimony reconciliation for a gene tree and species network is NP-hard even when only considering deep coalescence. This result suggests that future work on maximum parsimony reconciliation for species networks should explore approximation algorithms and heuristics.

2020 ◽  
Author(s):  
Ishrat Tanzila Farah ◽  
Md Muktadirul Islam ◽  
Kazi Tasnim Zinat ◽  
Atif Hasan Rahman ◽  
Md Shamsuzzoha Bayzid

AbstractSpecies tree estimation from multi-locus dataset is extremely challenging, especially in the presence of gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Summary methods have been developed which estimate gene trees and then combine the gene trees to estimate a species tree by optimizing various optimization scores. In this study, we have formalized the concept of “phylogenomic terraces” in the species tree space, where multiple species trees with distinct topologies may have exactly the same optimization score (quartet score, extra lineage score, etc.) with respect to a collection of gene trees. We investigated the presence and implication of terraces in species tree estimation from multi-locus data by taking ILS into account. We analyzed two of the most popular ILS-aware optimization criteria: maximize quartet consistency (MQC) and minimize deep coalescence (MDC). Methods based on MQC are provably statistically consistent, whereas MDC is not a consistent criterion for species tree estimation. Our experiments, on a collection of dataset simulated under ILS, indicate that MDC-based methods may achieve competitive or identical quartet consistency score as MQC but could be significantly worse than MQC in terms of tree accuracy – demonstrating the presence and affect of phylogenomic terraces. This is the first known study that formalizes the concept of phylogenomic terraces in the context of species tree estimation from multi-locus data, and reports the presence and implications of terraces in species tree estimation under ILS.


PLoS ONE ◽  
2021 ◽  
Vol 16 (5) ◽  
pp. e0251107
Author(s):  
Ayed A. R. Alanzi ◽  
James H. Degnan

Species trees, which describe the evolutionary relationships between species, are often inferred from gene trees, which describe the ancestral relationships between sequences sampled at different loci from the species of interest. A common approach to inferring species trees from gene trees is motivated by supposing that gene tree variation is due to incomplete lineage sorting, also known as deep coalescence. One of the earliest methods motivated by deep coalescence is to find the species tree that minimizes the number of deep coalescent events needed to explain discrepancies between the species tree and input gene trees. This minimize deep coalescence (MDC) criterion can be applied in both rooted and unrooted settings. where either rooted or unrooted gene trees can be used to infer a rooted species tree. Previous work has shown that MDC is statistically inconsistent in the rooted setting, meaning that under a probabilistic model for deep coalescence, the multispecies coalescent, for some species trees, increasing the number of input gene trees does not make the method more likely to return a correct species tree. Here, we obtain analogous results in the unrooted setting, showing conditions leading to inconsistency of the MDC criterion using the multispecies coalescent model with unrooted gene trees for four taxa and five taxa.


2020 ◽  
Author(s):  
Fernando Lopes ◽  
Larissa R Oliveira ◽  
Amanda Kessler ◽  
Yago Beux ◽  
Enrique Crespo ◽  
...  

Abstract The phylogeny and systematics of fur seals and sea lions (Otariidae) have long been studied with diverse data types, including an increasing amount of molecular data. However, only a few phylogenetic relationships have reached acceptance because of strong gene-tree species tree discordance. Divergence times estimates in the group also vary largely between studies. These uncertainties impeded the understanding of the biogeographical history of the group, such as when and how trans-equatorial dispersal and subsequent speciation events occurred. Here we used high-coverage genome-wide sequencing for 14 of the 15 species of Otariidae to elucidate the phylogeny of the family and its bearing on the taxonomy and biogeographical history. Despite extreme topological discordance among gene trees, we found a fully supported species tree that agrees with the few well-accepted relationships and establishes monophyly of the genus Arctocephalus. Our data support a relatively recent trans-hemispheric dispersal at the base of a southern clade, which rapidly diversified into six major lineages between 3 to 2.5 Ma. Otaria diverged first, followed by Phocarctos and then four major lineages within Arctocephalus. However, we found Zalophus to be non-monophyletic, with California (Z. californianus) and Steller sea lions (Eumetopias jubatus) grouping closer than the Galapagos sea lion (Z. wollebaeki) with evidence for introgression between the two genera. Overall, the high degree of genealogical discordance was best explained by incomplete lineage sorting resulting from quasi-simultaneous speciation within the southern clade with introgresssion playing a subordinate role in explaining the incongruence among and within prior phylogenetic studies of the family.


2022 ◽  
Vol 12 ◽  
Author(s):  
Martha Kandziora ◽  
Petr Sklenář ◽  
Filip Kolář ◽  
Roswitha Schmickl

A major challenge in phylogenetics and -genomics is to resolve young rapidly radiating groups. The fast succession of species increases the probability of incomplete lineage sorting (ILS), and different topologies of the gene trees are expected, leading to gene tree discordance, i.e., not all gene trees represent the species tree. Phylogenetic discordance is common in phylogenomic datasets, and apart from ILS, additional sources include hybridization, whole-genome duplication, and methodological artifacts. Despite a high degree of gene tree discordance, species trees are often well supported and the sources of discordance are not further addressed in phylogenomic studies, which can eventually lead to incorrect phylogenetic hypotheses, especially in rapidly radiating groups. We chose the high-Andean Asteraceae genus Loricaria to shed light on the potential sources of phylogenetic discordance and generated a phylogenetic hypothesis. By accounting for paralogy during gene tree inference, we generated a species tree based on hundreds of nuclear loci, using Hyb-Seq, and a plastome phylogeny obtained from off-target reads during target enrichment. We observed a high degree of gene tree discordance, which we found implausible at first sight, because the genus did not show evidence of hybridization in previous studies. We used various phylogenomic analyses (trees and networks) as well as the D-statistics to test for ILS and hybridization, which we developed into a workflow on how to tackle phylogenetic discordance in recent radiations. We found strong evidence for ILS and hybridization within the genus Loricaria. Low genetic differentiation was evident between species located in different Andean cordilleras, which could be indicative of substantial introgression between populations, promoted during Pleistocene glaciations, when alpine habitats shifted creating opportunities for secondary contact and hybridization.


AoB Plants ◽  
2020 ◽  
Vol 12 (3) ◽  
Author(s):  
Nannie L Persson ◽  
Ingrid Toresen ◽  
Heidi Lie Andersen ◽  
Jenny E E Smedmark ◽  
Torsten Eriksson

Abstract The genus Potentilla (Rosaceae) has been subjected to several phylogenetic studies, but resolving its evolutionary history has proven challenging. Previous analyses recovered six, informally named, groups: the Argentea, Ivesioid, Fragarioides, Reptans, Alba and Anserina clades, but the relationships among some of these clades differ between data sets. The Reptans clade, which includes the type species of Potentilla, has been noticed to shift position between plastid and nuclear ribosomal data sets. We studied this incongruence by analysing four low-copy nuclear markers, in addition to chloroplast and nuclear ribosomal data, with a set of Bayesian phylogenetic and Multispecies Coalescent (MSC) analyses. A selective taxon removal strategy demonstrated that the included representatives from the Fragarioides clade, P. dickinsii and P. fragarioides, were the main sources of the instability seen in the trees. The Fragarioides species showed different relationships in each gene tree, and were only supported as a monophyletic group in a single marker when the Reptans clade was excluded from the analysis. The incongruences could not be explained by allopolyploidy, but rather by homoploid hybridization, incomplete lineage sorting or taxon sampling effects. When P. dickinsii and P. fragarioides were removed from the data set, a fully resolved, supported backbone phylogeny of Potentilla was obtained in the MSC analysis. Additionally, indications of autopolyploid origins of the Reptans and Ivesioid clades were discovered in the low-copy gene trees.


2020 ◽  
Author(s):  
Michael J. Sanderson ◽  
Michelle M. McMahon ◽  
Mike Steel

AbstractTerraces in phylogenetic tree space are sets of trees with identical optimality scores for a given data set, arising from missing data. These were first described for multilocus phylogenetic data sets in the context of maximum parsimony inference and maximum likelihood inference under certain model assumptions. Here we show how the mathematical properties that lead to terraces extend to gene tree - species tree problems in which the gene trees are incomplete. Inference of species trees from either sets of gene family trees subject to duplication and loss, or allele trees subject to incomplete lineage sorting, can exhibit terraces in their solution space. First, we show conditions that lead to a new kind of terrace, which stems from subtree operations that appear in reconciliation problems for incomplete trees. Then we characterize when terraces of both types can occur when the optimality criterion for tree search is based on duplication, loss or deep coalescence scores. Finally, we examine the impact of assumptions about the causes of losses: whether they are due to imperfect sampling or true evolutionary deletion.


2019 ◽  
Vol 68 (6) ◽  
pp. 937-955 ◽  
Author(s):  
Alison Cloutier ◽  
Timothy B Sackton ◽  
Phil Grayson ◽  
Michele Clamp ◽  
Allan J Baker ◽  
...  

Abstract Palaeognathae represent one of the two basal lineages in modern birds, and comprise the volant (flighted) tinamous and the flightless ratites. Resolving palaeognath phylogenetic relationships has historically proved difficult, and short internal branches separating major palaeognath lineages in previous molecular phylogenies suggest that extensive incomplete lineage sorting (ILS) might have accompanied a rapid ancient divergence. Here, we investigate palaeognath relationships using genome-wide data sets of three types of noncoding nuclear markers, together totaling 20,850 loci and over 41 million base pairs of aligned sequence data. We recover a fully resolved topology placing rheas as the sister to kiwi and emu + cassowary that is congruent across marker types for two species tree methods (MP-EST and ASTRAL-II). This topology is corroborated by patterns of insertions for 4274 CR1 retroelements identified from multispecies whole-genome screening, and is robustly supported by phylogenomic subsampling analyses, with MP-EST demonstrating particularly consistent performance across subsampling replicates as compared to ASTRAL. In contrast, analyses of concatenated data supermatrices recover rheas as the sister to all other nonostrich palaeognaths, an alternative that lacks retroelement support and shows inconsistent behavior under subsampling approaches. While statistically supporting the species tree topology, conflicting patterns of retroelement insertions also occur and imply high amounts of ILS across short successive internal branches, consistent with observed patterns of gene tree heterogeneity. Coalescent simulations and topology tests indicate that the majority of observed topological incongruence among gene trees is consistent with coalescent variation rather than arising from gene tree estimation error alone, and estimated branch lengths for short successive internodes in the inferred species tree fall within the theoretical range encompassing the anomaly zone. Distributions of empirical gene trees confirm that the most common gene tree topology for each marker type differs from the species tree, signifying the existence of an empirical anomaly zone in palaeognaths.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Guilherme Rezende Dias ◽  
Eduardo Guimarães Dupim ◽  
Thyago Vanderlinde ◽  
Beatriz Mello ◽  
Antonio Bernardo Carvalho

Abstract Background The Drosophilidae family is traditionally divided into two subfamilies: Drosophilinae and Steganinae. This division is based on morphological characters, and the two subfamilies have been treated as monophyletic in most of the literature, but some molecular phylogenies have suggested Steganinae to be paraphyletic. To test the paraphyletic-Steganinae hypothesis, here, we used genomic sequences of eight Drosophilidae (three Steganinae and five Drosophilinae) and two Ephydridae (outgroup) species and inferred the phylogeny for the group based on a dataset of 1,028 orthologous genes present in all species (> 1,000,000 bp). This dataset includes three genera that broke the monophyly of the subfamilies in previous works. To investigate possible biases introduced by small sample sizes and automatic gene annotation, we used the same methods to infer species trees from a set of 10 manually annotated genes that are commonly used in phylogenetics. Results Most of the 1,028 gene trees depicted Steganinae as paraphyletic with distinct topologies, but the most common topology depicted it as monophyletic (43.7% of the gene trees). Despite the high levels of gene tree heterogeneity observed, species tree inference in ASTRAL, in PhyloNet, and with the concatenation approach strongly supported the monophyly of both subfamilies for the 1,028-gene dataset. However, when using the concatenation approach to infer a species tree from the smaller set of 10 genes, we recovered Steganinae as a paraphyletic group. The pattern of gene tree heterogeneity was asymmetrical and thus could not be explained solely by incomplete lineage sorting (ILS). Conclusions Steganinae was clearly a monophyletic group in the dataset that we analyzed. In addition to ILS, gene tree discordance was possibly the result of introgression, suggesting complex branching processes during the early evolution of Drosophilidae with short speciation intervals and gene flow. Our study highlights the importance of genomic data in elucidating contentious phylogenetic relationships and suggests that phylogenetic inference for drosophilids based on small molecular datasets should be performed cautiously. Finally, we suggest an approach for the correction and cleaning of BUSCO-derived genomic datasets that will be useful to other researchers planning to use this tool for phylogenomic studies.


2021 ◽  
Author(s):  
Niklas Reichelt ◽  
Jun Wen ◽  
Claudia Paetzold ◽  
Marc Appelhans

Background and aims: Zanthoxylum L. is the only pantropical genus within Rutaceae, with a few species native to temperate eastern Asia and North America. Efforts using Sanger sequencing failed to resolve the backbone phylogeny of Zanthoxylum. In this study, we employed target enrichment high-throughput sequencing to improve resolution. Gene trees were examined for concordance and sectional classifications of Zanthoxylum were evaluated. Off-target reads were investigated to identify putative single-copy markers for bait refinement, and low-copy markers for evidence of putative hybridization events. Methods: We developed a custom bait set for target enrichment of 745 exons in Zanthoxylum and applied it to 45 Zanthoxylum species and one Tetradium species as the outgroup. Illumina reads were processed via the HybPhyloMaker pipeline. Phylogenetic inferences were conducted using coalescent and concatenated methods. Concordance was assessed using quartet sampling. Off-target reads were assembled and putative single- and low-copy genes were extracted. Additional phylogenetic analyses were performed based on these alignments. Key results: Four major clades are supported within Zanthoxylum: the African clade, the Z. asiaticum clade, the Asian-Pacific-Australian clade, and the American-eastern Asian clade. While overall support has improved, regions of conflict are similar to those previously observed. Gene tree discordances indicate a hybridization event in the ancestor of the Hawaiian lineage, and incomplete lineage sorting for the American backbone. Off-target putative single-copy genes largely confirm on-target results, and putative low-copy genes provide additional evidence for hybridization in the Hawaiian lineage. Only two of the five sections of Zanthoxylum are resolved as monophyletic. Conclusion: Target enrichment is suitable to assess phylogenetic relationships in Zanthoxylum. Our phylogenetic analyses reveal that current sectional classifications need revision. Quartet tree concordance indicates several instances of reticulate evolution. Off-target reads are proven useful to identify additional phylogenetically informative regions for bait refinement or gene tree based approaches.


2020 ◽  
Author(s):  
Mahim Mahbub ◽  
Zahin Wahab ◽  
Rezwana Reaz ◽  
M. Saifur Rahman ◽  
Md. Shamsuzzoha Bayzid

AbstractMotivationSpecies tree estimation from genes sampled from throughout the whole genome is complicated due to the gene tree-species tree discordance. Incomplete lineage sorting (ILS) is one of the most frequent causes for this discordance, where alleles can coexist in populations for periods that may span several speciation events. Quartet-based summary methods for estimating species trees from a collection of gene trees are becoming popular due to their high accuracy and statistical guarantee under ILS. Generating quartets with appropriate weights, where weights correspond to the relative importance of quartets, and subsequently amalgamating the weighted quartets to infer a single coherent species tree allows for a statistically consistent way of estimating species trees. However, handling weighted quartets is challenging.ResultsWe propose wQFM, a highly accurate method for species tree estimation from multi-locus data, by extending the quartet FM (QFM) algorithm to a weighted setting. wQFM was assessed on a collection of simulated and real biological datasets, including the avian phylogenomic dataset which is one of the largest phylogenomic datasets to date. We compared wQFM with wQMC, which is the best alternate method for weighted quartet amalgamation, and with ASTRAL, which is one of the most accurate and widely used coalescent-based species tree estimation methods. Our results suggest that wQFM matches or improves upon the accuracy of wQMC and ASTRAL.AvailabilitywQFM is available in open source form at https://github.com/Mahim1997/wQFM-2020.


Sign in / Sign up

Export Citation Format

Share Document