population genetic inference
Recently Published Documents

2021
Author(s): Kieran Samuk, Mohamed A.F. Noor

Accurate estimates of the rate of recombination are key to understanding a host of evolutionary processes as well as the evolution of recombination rate itself. Model-based population genetic methods that infer recombination rates from patterns of linkage disequilibrium (LD) in the genome have become a popular way to estimate rates of recombination. However, these LD-based methods make a variety of simplifying assumptions about the populations of interest that are often not met in natural populations. One such assumption is the absence of gene flow from other populations. Here, we use forward-time population genetic simulations of isolation-with-migration scenarios to explore how gene flow affects the accuracy of LD-based estimators of recombination rate. We find that moderate levels of gene flow can result in either the overestimation or underestimation of recombination rates by up to 20-50%, depending on the timing of divergence. We also find that these biases can affect the detection of interpopulation differences in recombination rate, causing both false positives and false negatives depending on the scenario. We discuss future possibilities for mitigating these biases and recommend that investigators exercise caution and confirm that their study populations meet assumptions before deploying these methods.
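
The LD-based estimators discussed in this abstract build on pairwise linkage disequilibrium between sites. As a minimal sketch (not the authors' simulation pipeline or any specific estimator), the widely used r² statistic between two biallelic sites can be computed from phased haplotypes as follows. The function name `ld_r2` and the toy haplotypes are illustrative assumptions.

```python
def ld_r2(haplotypes, i, j):
    """Return r^2 between biallelic sites i and j.

    haplotypes: list of equal-length strings of '0'/'1' (phased haplotypes,
    a hypothetical input format chosen for this sketch).
    """
    n = len(haplotypes)
    # Frequencies of the '1' allele at each site and of the '11' haplotype.
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Perfectly associated sites versus independent sites.
linked = ld_r2(["00", "00", "11", "11"], 0, 1)
unlinked = ld_r2(["00", "01", "10", "11"], 0, 1)
```

Migrant haplotypes change these pairwise associations, which is one intuitive route by which gene flow could bias estimators that assume a single isolated population.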


2021
Author(s): Scott T O’Donnell, Sorel T Fitz-Gibbon, Victoria L Sork

Ancient introgression can be an important source of genetic variation that shapes the evolution and diversification of many taxa. Here, we estimate the timing, direction and extent of gene flow between two distantly related oak species in the same section (Quercus sect. Quercus). We estimated these demographic events using genotyping by sequencing data (GBS), which generated 25,702 single nucleotide polymorphisms (SNPs) for 24 individuals of California scrub oak (Quercus berberidifolia) and 23 individuals of Engelmann oak (Q. engelmannii). We tested several scenarios involving gene flow between these species using the diffusion approximation-based population genetic inference framework and model-testing approach of the Python package DaDi. We found that the most likely demographic scenario includes a bottleneck in Q. engelmannii that coincides with asymmetric gene flow from Q. berberidifolia into Q. engelmannii. Given that the timing of this gene flow coincides with the advent of a Mediterranean-type climate in the California Floristic Province, we propose that changing precipitation patterns and seasonality may have favored the introgression of climate-associated genes from the endemic into the non-endemic California oak.


2021
Author(s): William S Pearman, Lara Urban, Alana Alexander

Reduced representation sequencing (RRS) is a widely used method to assay the diversity of genetic loci across the genome of an organism. The dominant class of RRS approaches assays loci associated with restriction sites within the genome (restriction site associated DNA sequencing, or RADseq). RADseq is frequently applied to non-model organisms since it enables population genetic studies without relying on well-characterized reference genomes. However, RADseq requires the use of many bioinformatic filters to ensure the quality of genotyping calls. These filters can have direct impacts on population genetic inference, and therefore require careful consideration. One widely used filtering approach is the removal of loci that do not conform to expectations of Hardy-Weinberg equilibrium (HWE). Despite being widely used, we show that this filtering approach is rarely described in sufficient detail to enable replication. Furthermore, through analyses of in silico and empirical datasets we show that some of the most widely used HWE filtering approaches dramatically impact inference of population structure. In particular, the removal of loci exhibiting departures from HWE after pooling across samples significantly reduces the degree of inferred population structure within a dataset (despite this approach being widely used). Based on these results, we provide recommendations for best practice regarding the implementation of HWE filtering for RADseq datasets.
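
The pooling effect described in this abstract can be illustrated with a small worked example. The sketch below is not the authors' pipeline: it assumes a plain chi-square goodness-of-fit test (1 degree of freedom) on genotype counts, and the function name `hwe_chi2` and the toy counts are hypothetical. Two populations that are each in HWE but differ strongly in allele frequency show a large heterozygote deficit once pooled, so a pooled filter would discard exactly the locus that distinguishes them.

```python
def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from HWE at one biallelic locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


CRITICAL_1DF = 3.841  # chi-square critical value, alpha = 0.05, 1 d.f.

pop1 = (81, 18, 1)   # toy counts, allele A frequency 0.9, in HWE
pop2 = (1, 18, 81)   # toy counts, allele A frequency 0.1, in HWE
pooled = tuple(a + b for a, b in zip(pop1, pop2))

per_pop_flags = [hwe_chi2(*pop) > CRITICAL_1DF for pop in (pop1, pop2)]
pooled_flag = hwe_chi2(*pooled) > CRITICAL_1DF
```

Here neither population is flagged on its own, while the pooled sample is flagged. Testing within populations, rather than across the pooled sample, avoids removing such loci in this toy case.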


2020
Author(s): Yun Deng, Yun S. Song, Rasmus Nielsen

The ancestral recombination graph (ARG) contains the full genealogical information of the sample, and many population genetic inference problems can be solved using inferred or sampled ARGs. In particular, the waiting distance between tree changes along the genome can be used to make inferences about the distribution and evolution of recombination rates. To this end, we derive an analytic expression for the distribution of waiting distances between tree changes under the sequentially Markovian coalescent model and obtain an accurate approximation to the distribution of waiting distances for topology changes. We use these results to show that some of the recently proposed methods for inferring sequences of trees along the genome provide strongly biased distributions of waiting distances. In addition, we provide a correction to an undercounting problem facing all available ARG inference methods, thereby facilitating the use of ARG inference methods to estimate temporal changes in the recombination rate.


2020
Author(s): Ryan N Gutenkunst

Extracting insight from population genetic data often demands computationally intensive modeling. dadi is a popular program for fitting models of demographic history and natural selection to such data. Here, I show that running dadi on a Graphics Processing Unit (GPU) can speed computation by orders of magnitude compared to the CPU implementation, with minimal user burden. This speed increase enables the analysis of more complex models, which motivated the extension of dadi to four- and five-population models. Remarkably, dadi performs almost as well on inexpensive consumer-grade GPUs as on expensive server-grade GPUs. GPU computing thus offers large and accessible benefits to the community of dadi users. This functionality is available in dadi version 2.1.0.


2020
Author(s): William S. DeWitt

Summary: Characterization of germline mutation spectrum variation from population genomics data has shed light on the biological complexity of the mutation process, and its evolution within and between species. This analysis augments available population SNP data with estimates of local ancestral genomic context to assign mutation types and aggregate summary statistics thereof, and is increasingly common. There is a need for standardized computational tools to extract mutation spectrum information from sequencing data. Here I describe mutyper, a command-line utility and Python package that uses an ancestral genome estimate to assign mutation types to SNP data, compute mutation spectra for individuals, and compute sample frequency spectra resolved by mutation type for population genetic inference.

Availability and implementation: mutyper can be installed using the pip package manager and is compatible with Python 3.6+. Documentation is provided at https://harrispopgen.github.io/mutyper; source code is available at https://github.com/harrispopgen/mutyper.
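
To make the idea of context-dependent mutation types concrete, here is a minimal pure-Python sketch. It does not use or reproduce the mutyper API: the function names, the 0-based coordinates, and the convention of collapsing reverse complements so that the ancestral base is A or C are all assumptions made for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutation_type(ancestral_seq, pos, derived, k=3):
    """Label a SNP by its ancestral k-mer context, e.g. 'ACG>ATG'.

    ancestral_seq: ancestral genome estimate (string), pos: 0-based SNP
    position, derived: derived allele. Returns None when the k-mer would
    run off either end of the sequence.
    """
    flank = k // 2
    if pos < flank or pos + flank >= len(ancestral_seq):
        return None
    anc_kmer = ancestral_seq[pos - flank: pos + flank + 1]
    der_kmer = anc_kmer[:flank] + derived + anc_kmer[flank + 1:]
    # Assumed convention: collapse strands so the ancestral base is A or C.
    if anc_kmer[flank] in "GT":
        anc_kmer, der_kmer = revcomp(anc_kmer), revcomp(der_kmer)
    return f"{anc_kmer}>{der_kmer}"


# Toy spectrum: count mutation types over a list of (position, derived) SNPs.
ancestral = "TACGT"
snps = [(2, "T"), (3, "A")]
spectrum = Counter(mutation_type(ancestral, pos, der) for pos, der in snps)
```

In this toy case both SNPs collapse to the same type, since a G>A change on one strand is a C>T change on the other.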


2019, Vol 68 (1)
Author(s): Peter Beerli, Somayeh Mashayekhi, Marjan Sadeghi, Marzieh Khodaei, Kyle Shaw

2018
Author(s): Lex Flagel, Yaniv Brandvain, Daniel R. Schrider

Population-scale genomic datasets have given researchers incredible amounts of information from which to infer evolutionary histories. Concomitant with this flood of data, theoretical and methodological advances have sought to extract information from genomic sequences to infer demographic events such as population size changes and gene flow among closely related populations/species, construct recombination maps, and uncover loci underlying recent adaptation. To date, most methods make use of only one or a few summaries of the input sequences and therefore ignore potentially useful information encoded in the data. The most sophisticated of these approaches involve likelihood calculations, which require theoretical advances for each new problem, and often focus on a single aspect of the data (e.g. only allele frequency information) in the interest of mathematical and computational tractability. Directly interrogating the entirety of the input sequence data in a likelihood-free manner would thus offer a fruitful alternative. Here we accomplish this by representing DNA sequence alignments as images and using a class of deep learning methods called convolutional neural networks (CNNs) to make population genetic inferences from these images. We apply CNNs to a number of evolutionary questions and find that they frequently match or exceed the accuracy of current methods. Importantly, we show that CNNs perform accurate evolutionary model selection and parameter estimation, even on problems that have not received detailed theoretical treatments. Thus, when applied to population genetic alignments, CNNs are capable of outperforming expert-derived statistical methods, and offer a new path forward in cases where no likelihood approach exists.
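
The phrase "representing DNA sequence alignments as images" can be illustrated with a short sketch. This is one plausible encoding and not necessarily the one used by the authors: it assumes that only biallelic segregating columns are kept and that the major allele is coded 0 and the minor allele 1. The function name `alignment_to_matrix` is hypothetical.

```python
from collections import Counter


def alignment_to_matrix(seqs):
    """Encode aligned sequences as a 0/1 matrix (rows = sequences).

    Keeps only biallelic segregating columns; codes the major allele as 0
    and the minor allele as 1 (an assumed convention for this sketch).
    """
    rows = [[] for _ in seqs]
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) != 2:
            continue  # skip invariant and multi-allelic columns
        major = counts.most_common(1)[0][0]
        for row, base in zip(rows, column):
            row.append(0 if base == major else 1)
    return rows


# Three aligned toy sequences; columns 1 and 3 are segregating.
image = alignment_to_matrix(["ACGT", "ACGA", "ATGA"])
```

A matrix like this, with one row per sampled chromosome and one column per segregating site, is the kind of two-dimensional input a CNN can take in place of hand-picked summary statistics.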

