Genome-wide Imputation Using the Practical Haplotype Graph in the Heterozygous Crop Cassava

Author(s):  
Evan M Long ◽  
Peter J Bradbury ◽  
M Cinta Romay ◽  
Edward S Buckler ◽  
Kelly R Robbins

Genomic applications such as genomic selection and genome-wide association have become increasingly common since the advent of genome sequencing. The cost of sequencing has decreased in the past two decades; however, genotyping costs are still prohibitive to gathering large datasets for these genomic applications, especially in non-model species where resources are less abundant. Genotype imputation makes it possible to infer whole-genome information from limited input data, making large sampling for genomic applications more feasible. Imputation becomes increasingly difficult in heterozygous species where haplotypes must be phased. The Practical Haplotype Graph (PHG) is a recently developed tool that can accurately impute genotypes using a reference panel of haplotypes. We showcase the ability of the Practical Haplotype Graph to impute genomic information in the highly heterozygous crop cassava (Manihot esculenta). Accurately phased haplotypes were sampled from runs of homozygosity across a diverse panel of individuals to populate the PHG, which proved more accurate than relying on computational phasing methods. The Practical Haplotype Graph achieved high imputation accuracy using sparse skim-sequencing input, which translated to substantial genomic prediction accuracy in cross-validation testing. The Practical Haplotype Graph showed improved imputation accuracy compared to the standard imputation tool Beagle, especially in predicting rare alleles.

2021 ◽  
Author(s):  
Evan M Long ◽  
Peter J. Bradbury ◽  
Cinta Romay ◽  
Edward Buckler ◽  
Kelly R Robbins

Genomic applications such as genomic selection and genome-wide association have become increasingly common since the advent of genome sequencing. The cost of sequencing has decreased in the past two decades; however, genotyping costs are still prohibitive to gathering large datasets for these genomic applications, especially in non-model species where resources are less abundant. Genotype imputation makes it possible to infer whole-genome information from limited input data, making large sampling for genomic applications more feasible. Imputation becomes increasingly difficult in heterozygous species where haplotypes must be phased. The Practical Haplotype Graph (PHG) is a recently developed tool that can accurately impute genotypes using a reference panel of haplotypes. We showcase the ability of the Practical Haplotype Graph to impute genomic information in the highly heterozygous crop cassava (Manihot esculenta). Accurately phased haplotypes were sampled from runs of homozygosity across a diverse panel of individuals to populate the PHG, which proved more accurate than relying on computational phasing methods. The Practical Haplotype Graph achieved high imputation accuracy using sparse skim-sequencing input, which translated to substantial genomic prediction accuracy in cross-validation testing. The Practical Haplotype Graph showed improved imputation accuracy compared to the standard imputation tool Beagle, especially in predicting rare alleles.


2022 ◽  
Author(s):  
Lars Wienbrandt ◽  
David Ellinghaus

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software tool with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite their larger (not publicly available) reference panels. It has many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to the technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.


2011 ◽  
Vol 2011 ◽  
pp. 1-10 ◽  
Author(s):  
Victor Llaca ◽  
Matthew A. Campbell ◽  
Stéphane Deschamps

Zea mays (maize) has historically been used as a model species for genetics, development, physiology and more recently, genome structure. The maize genome is complex with striking intraspecific variation in gene order, repetitive DNA content, and allelic content exceeding the levels observed between primate species. Maize genome complexity is primarily driven by polyploidization and explosive amplification of LTR retrotransposons, with the counteracting effect of unequal and illegitimate crossover. Transposable elements have been shown to capture genic content, create chimeras, and amplify those sequences via transposition. New sequencing platforms and hybridization-based strategies have appeared over the past decade which are being applied to maize and providing the first genome-wide comprehensive view of structural variation and will provide the basis for investigating the interplay between repeats and genes as well as the amount of species level diversity within maize.


2018 ◽  
Author(s):  
Candelaria Vergara ◽  
Margaret M. Parker ◽  
Liliana Franco ◽  
Michael H. Cho ◽  
Ana V. Valencia-Duarte ◽  
...  

Genotype imputation is used to estimate unobserved genotypes from genome-wide marker data, to increase genome coverage and power for genome-wide association studies. Imputation has been most successful for European ancestry populations in which very large reference panels are available. Smaller subsets of African descent populations are available in 1000 Genomes (1000G), the Consortium on Asthma among African-Ancestry Populations in the Americas (CAAPA) and the Haplotype Reference Consortium (HRC). We aimed to compare the performance of these reference panels when imputing variation in 3,747 African Americans (AA) from 2 cohorts (HCV and COPDGene) genotyped using the Illumina Omni family of microarrays. The haplotypes of 2,504 individuals (from 1000G), 883 (from CAAPA) and 32,611 (from HRC) were used as reference. We compared the performance of these panels based on number of variants, imputation quality, imputation accuracy and coverage. In both cohorts, 1000G imputed 1.5–1.6x more variants compared to CAAPA and 1.2x more variants than HRC. Similar findings were observed for variants with higher imputation quality (R2>0.5) and for rare, low frequency, and common variants. When merging the results of the three panels, the total number of imputed variants was 62M-63M, with 20M overlapping variants imputed by all three panels and a range of 5 to 15M unique variants imputed exclusively with one of the three panels. For overlapping variants, imputation quality was highest for HRC, followed by 1000G, then CAAPA, and improved as the minor allele frequency increased. The 1000G, HRC and CAAPA participants of African ancestry provided high performance and accuracy for imputation of African American admixed individuals, increasing the total number of variants with high quality available for subsequent analyses. These three panels are complementary and would benefit from the development of an integrated African reference panel, including data from multiple sources and populations.


2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye

Background: Asia is the largest continent in the world and is home to a large number of populations. However, we still lack an imputation server with an Asian-specific reference panel to estimate genotypes for genome-wide association studies in Asia. Currently, two well-known imputation servers are available, i.e., the Michigan imputation server in the US and the Sanger imputation server in the UK. However, the quality of imputation for Southeast Asian populations is not satisfactory when using their genotype imputation services and reference panels. Objective: In this paper, we develop the ModStore imputation server with a specially designed reference panel to offer genotype imputation as a service, aiming to increase the power of genome-wide association studies in Singapore in the context of National Precision Medicine. Method: We present the implementation and customization of the ModStore imputation server on high performance computing infrastructure. Meanwhile, we construct a reference panel based on whole-genome sequencing of Singaporeans, referred to as the SG10K reference panel, to improve the imputation accuracy for Southeast Asian populations. Results: Experimental results show that by using the SG10K reference panel, over 79% improvement of mean Rsq can be achieved for the imputation of data sets from three Singapore ethnic populations, i.e., Malay, Chinese, and Indian, under MAF<0.005 compared to the 1000 Genomes reference panel. Conclusion: With ModStore imputation server, genotype imputation can be performed more accurately for data derived from array-based pharmacogenomics and pre-existing Southeast Asia's population-scale genetic.


2021 ◽  
Author(s):  
Zhi Ming Xu ◽  
Sina Rüeger ◽  
Michaela Zwyer ◽  
Daniela Brites ◽  
Hellen Hiza ◽  
...  

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genome of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on SNPs, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on SNPs to the base H3Africa array.


2021 ◽  
Author(s):  
Su Wang ◽  
Miran Kim ◽  
Xiaoqian Jiang ◽  
Arif Ozgun Harmanci

The decreasing cost of DNA sequencing has led to a great increase in our knowledge about genetic variation. While population-scale projects bring important insight into genotype-phenotype relationships, the cost of performing whole-genome sequencing on large samples is still prohibitive. In-silico genotype imputation coupled with genotyping-by-arrays is a cost-effective and accurate alternative for genotyping of common and uncommon variants. Imputation methods compare the genotypes of the typed variants with large population-specific reference panels and estimate the genotypes of untyped variants by making use of the linkage disequilibrium patterns. The most accurate imputation methods are based on the Li-Stephens hidden Markov model (HMM), which treats the sequence of each chromosome as a mosaic of the haplotypes from the reference panel. Here we assess the accuracy of local-HMMs, where each untyped variant is imputed using the typed variants in a small window around itself (as small as 1 centimorgan). Locality-based imputation has recently been used by machine learning-based genotype imputation approaches. We assess how the parameters of the local-HMMs impact the imputation accuracy in a comprehensive set of benchmarks and show that local-HMMs can accurately impute common and uncommon variants and can be relaxed to impute rare variants as well. The source code for the local HMM implementations is publicly available at https://github.com/harmancilab/LoHaMMer.
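The local-HMM idea in this abstract can be illustrated with a toy haploid Li-Stephens model. This is only a sketch under stated assumptions and is not the LoHaMMer implementation: the function name `local_hmm_impute`, the constant per-site switch probability `rho`, the allelic error rate `eps`, and a window measured in number of sites (rather than centimorgans) are all hypothetical simplifications chosen for illustration.

```python
import numpy as np

def local_hmm_impute(ref, obs, target, window, rho=0.01, eps=0.001):
    """Posterior probability of allele 1 at an untyped site (toy Li-Stephens HMM).

    ref    : (K, M) array of 0/1 reference haplotypes (hidden states = rows)
    obs    : length-M array, 0/1 at typed sites and -1 at untyped sites
    target : index of the untyped site to impute
    window : number of sites used on each side of the target (assumption:
             counted in sites, not genetic distance)
    """
    K, M = ref.shape
    lo, hi = max(0, target - window), min(M, target + window + 1)
    n = hi - lo

    def emission(m):
        # Untyped sites carry no information; typed sites allow error eps.
        if obs[m] < 0:
            return np.ones(K)
        return np.where(ref[:, m] == obs[m], 1.0 - eps, eps)

    def step(v):
        # Symmetric transition: stay with prob (1 - rho), otherwise jump
        # to a uniformly chosen reference haplotype.
        return (1.0 - rho) * v + rho * v.sum() / K

    # Forward pass over the local window (normalised at each site).
    fwd = np.empty((n, K))
    fwd[0] = emission(lo) / K
    fwd[0] /= fwd[0].sum()
    for i in range(1, n):
        f = step(fwd[i - 1]) * emission(lo + i)
        fwd[i] = f / f.sum()

    # Backward pass over the same window.
    bwd = np.empty((n, K))
    bwd[-1] = 1.0
    for i in range(n - 2, -1, -1):
        b = step(bwd[i + 1] * emission(lo + i + 1))
        bwd[i] = b / b.sum()

    # State posterior at the target, projected onto the allele it carries.
    post = fwd[target - lo] * bwd[target - lo]
    post /= post.sum()
    return float(post @ ref[:, target])

# Toy panel: three reference haplotypes, five sites, site 2 untyped.
ref = np.array([[0, 0, 0, 0, 0],
                [1, 1, 1, 1, 1],
                [0, 1, 0, 1, 0]])
obs = np.array([1, 1, -1, 1, 1])
p_wide = local_hmm_impute(ref, obs, target=2, window=2)
p_narrow = local_hmm_impute(ref, obs, target=2, window=1)
```

In this toy panel the narrow window sees only sites where two reference haplotypes match the target equally well, so the imputed allele is ambiguous, while the wider window includes flanking sites that separate them. That trade-off between window size and accuracy is the kind of parameter effect the abstract says it benchmarks.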


2022 ◽  
Vol 18 (1) ◽  
pp. e1009628
Author(s):  
Zhi Ming Xu ◽  
Sina Rüeger ◽  
Michaela Zwyer ◽  
Daniela Brites ◽  
Hellen Hiza ◽  
...  

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures, hence the full extent of common genetic variation. Here, we propose to sequence the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, such that the remaining array-genotyped cohort could be more accurately imputed. Using a Tanzania-based cohort as a proof-of-concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after the addition of our designed add-on tags to the base H3Africa array.


2015 ◽  
Vol 77 ◽  
pp. 159-166
Author(s):  
T.O.R. Macdonald ◽  
J.S. Rowarth ◽  
F.G. Scrimgeour

The link between dairy farm systems and cost of environmental compliance is not always clear. A survey of Waikato dairy farmers was conducted to establish the real (non-modelled) cost of compliance with environmental regulation in the region. Quantitative and qualitative data were gathered to improve understanding of compliance costs and implementation issues for a range of Waikato farm systems. The average one-off capital cost of compliance determined through a survey approach was $1.02 per kg milksolids, $1490 per hectare and $403 per cow. Costs experienced by Waikato farmers have exceeded average economic farm surplus for the region in the past 5 years. As regulation increases there are efficiencies to be gained through implementing farm infrastructure and farm management practice to best match farm system intensity. Keywords: Dairy, compliance, farm systems, nitrogen, Waikato


Author(s):  
John D. Horner ◽  
Bartosz J. Płachno ◽  
Ulrike Bauer ◽  
Bruno Di Giusto

The ability to attract prey has long been considered a universal trait of carnivorous plants. We review studies from the past 25 years that have investigated the mechanisms by which carnivorous plants attract prey to their traps. Potential attractants include nectar, visual, olfactory, and acoustic cues. Each of these has been well documented to be effective in various species, but prey attraction is not ubiquitous among carnivorous plants. Directions for future research, especially in native habitats in the field, include: the qualitative and quantitative analysis of visual cues, volatiles, and nectar; temporal changes in attractants; synergistic action of combinations of attractants; the cost of attractants; and responses to putative attractants in electroantennograms and insect behavioral tests.

