genotype imputation Latest Research Papers

A comprehensive evaluation of factors affecting the accuracy of pig genotype imputation using a single or multi-breed reference population

Journal of Integrative Agriculture, 2022, Vol 21 (2), pp. 486-495
DOI: 10.1016/s2095-3119(21)63695-x

Author(s): Kai-li ZHANG, Xia PENG, Sai-xian ZHANG, Hui-wen ZHAN, Jia-hui LU, ...

Keyword(s): Comprehensive Evaluation, Reference Population, Genotype Imputation, Factors Affecting


Using population-specific add-on polymorphisms to improve genotype imputation in underrepresented populations

PLoS Computational Biology, 2022, Vol 18 (1), pp. e1009628
DOI: 10.1371/journal.pcbi.1009628

Author(s): Zhi Ming Xu, Sina Rüeger, Michaela Zwyer, Daniela Brites, Hellen Hiza, ...

Keyword(s): Association Studies, Imputation Accuracy, Genotype Imputation, Small Subset, Study Cohort, Genome Wide Association Studies, Nucleotide Polymorphisms, Single Nucleotide, Genome Wide, Selection Of

Genome-wide association studies rely on the statistical inference of untyped variants, called imputation, to increase the coverage of genotyping arrays. However, the results are often suboptimal in populations underrepresented in existing reference panels and array designs, since the selected single nucleotide polymorphisms (SNPs) may fail to capture population-specific haplotype structures and hence the full extent of common genetic variation. Here, we propose sequencing the full genomes of a small subset of an underrepresented study cohort to inform the selection of population-specific add-on tag SNPs and to generate an internal population-specific imputation reference panel, so that the remaining array-genotyped cohort can be imputed more accurately. Using a Tanzania-based cohort as a proof of concept, we demonstrate the validity of our approach by showing improvements in imputation accuracy after adding our designed add-on tags to the base H3Africa array.
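The abstract does not state how the add-on tags are chosen. A common generic approach is greedy pairwise LD tagging on the sequenced subset, sketched below under stated assumptions: phased haplotypes coded 0/1, a hypothetical r² cutoff of 0.8, base-array SNPs treated as already selected, and candidates restricted to still-untagged SNPs for simplicity. This is an illustration of greedy tagging, not the authors' selection procedure.

```python
from typing import Dict, List, Sequence


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared Pearson correlation (LD r2) between two 0/1 allele vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) * (a - mx) for a in x)
    vy = sum((b - my) * (b - my) for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov * cov / (vx * vy)


def greedy_add_on_tags(haps: Dict[str, List[int]], array_snps: List[str],
                       threshold: float = 0.8) -> List[str]:
    """Greedily add tag SNPs until every polymorphic SNP in the sequenced
    subset has r2 >= threshold with at least one tag (hypothetical helper).

    haps maps SNP id -> alleles (0/1) across the sequenced haplotypes;
    array_snps are ids already present on the base array."""
    poly = {s for s, h in haps.items() if 0 < sum(h) < len(h)}  # skip monomorphic SNPs
    tags = [t for t in array_snps if t in poly]

    def covers(tag: str, snp: str) -> bool:
        return r_squared(haps[tag], haps[snp]) >= threshold

    untagged = {s for s in poly if not any(covers(t, s) for t in tags)}
    added: List[str] = []
    while untagged:
        # Pick the candidate tagging the most still-untagged SNPs (ties: smallest id).
        best = max(sorted(untagged), key=lambda c: sum(covers(c, s) for s in untagged))
        added.append(best)
        untagged -= {s for s in untagged if covers(best, s)}
    return added
```

Each chosen SNP tags at least itself (r² = 1), so the loop always terminates; SNPs already captured by the base array never trigger an add-on.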


Genotype calling and haplotype inference from low coverage sequence data in heterozygous plant genome using HetMap

2022
DOI: 10.21203/rs.3.rs-1220819/v1

Author(s): Hao Gong, Bin Han

Keyword(s): Wild Rice, Hybrid Rice, Sequence Data, Genotype Imputation, Plant Genome, High Coverage, Software Packages, Heterozygous Plant, Low Coverage, Genotype Inference

Many software packages and pipelines have been developed to handle sequence data from model species. However, genotyping complex heterozygous plant genomes requires further improvement over previous methods. Here we present HetMap, a new pipeline (available at https://github.com/Ncgrhg/HetMapv1) for variant calling and missing-genotype imputation from low-coverage sequence data for heterozygous plant genomes. To check its performance on real sequence data, HetMap was applied to both an F1 hybrid rice population of 1,495 samples and a wild rice population of 446 samples. Four hybrid rice accessions and two wild rice accessions sequenced at high coverage, which were also included in the low-coverage sequence data, were used to validate genotype inference accuracy. The validation results showed that HetMap achieved a significant improvement in heterozygous genotype inference accuracy (13.65% for hybrid rice, 26.05% for wild rice) and in total accuracy compared with other similar software packages. Using the new genotypes in a genome-wide association study also improved association power for two wild rice phenotypes. HetMap achieved high genotype inference accuracy at low sequence coverage and small population sizes, in both the natural population and the constructed recombination population. HetMap provides a powerful tool for analyzing sequence data from heterozygous plant genomes, which may help discover new phenotype-associated regions in plant species with complex heterozygous genomes.
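The HetMap abstract reports heterozygous accuracy separately from total accuracy. A minimal sketch of how the two can be computed against high-coverage calls, assuming genotypes coded as alt-allele counts 0/1/2 and `None` for missing calls; the function name is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def genotype_accuracy(truth: Sequence[Optional[int]],
                      inferred: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (total_accuracy, heterozygous_accuracy).

    Genotypes are alt-allele counts 0/1/2; sites where either call is None are
    skipped. Heterozygous accuracy counts only sites whose true genotype is 1."""
    total = correct = het_total = het_correct = 0
    for t, g in zip(truth, inferred):
        if t is None or g is None:
            continue
        total += 1
        correct += t == g
        if t == 1:
            het_total += 1
            het_correct += g == 1
    total_acc = correct / total if total else float("nan")
    het_acc = het_correct / het_total if het_total else float("nan")
    return total_acc, het_acc
```

Reporting the heterozygous figure separately matters because homozygous sites usually dominate, so a high total accuracy can hide poor heterozygote calling.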


EagleImp: Fast and Accurate Genome-wide Phasing and Imputation in a Single Tool

2022
DOI: 10.1101/2022.01.11.475810

Author(s): Lars Wienbrandt, David Ellinghaus

Keyword(s): Memory Management, Imputation Accuracy, Simulated Data, Genotype Imputation, Whole Genome Sequencing Data, Common Variants, Sequencing Data, 1000 Genomes, Genome Wide, Reference Genomes

Background: Reference-based phasing and genotype imputation algorithms have been developed with sublinear theoretical runtime behaviour, but runtimes are still high in practice when large genome-wide reference datasets are used. Methods: We developed EagleImp, a software with algorithmic and technical improvements and new features for accurate and accelerated phasing and imputation in a single tool. Results: We compared the accuracy and runtime of EagleImp with Eagle2, PBWT and prominent imputation servers using whole-genome sequencing data from the 1000 Genomes Project, the Haplotype Reference Consortium and simulated data with more than 1 million reference genomes. EagleImp is 2 to 10 times faster (depending on the single or multiprocessor configuration selected) than Eagle2/PBWT, with the same or better phasing and imputation quality in all tested scenarios. For common variants investigated in typical GWAS, EagleImp provides the same or higher imputation accuracy than the Sanger Imputation Service, the Michigan Imputation Server and the newly developed TOPMed Imputation Server, despite the servers' larger (not publicly available) reference panels. EagleImp also offers many new features, including automated chromosome splitting and memory management at runtime to avoid job aborts, fast reading and writing of large files, and various user-configurable algorithm and output options. Conclusions: Due to these technical optimisations, EagleImp can perform fast and accurate reference-based phasing and imputation for future very large reference panels with more than 1 million genomes. EagleImp is freely available for download from https://github.com/ikmb/eagleimp.
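PBWT, one of the baselines named above, is built on the positional Burrows-Wheeler transform (Durbin, 2014), which at each site sorts haplotypes by their reversed prefixes so that haplotypes sharing long matches end up adjacent. The sketch below builds only the positional prefix arrays (Durbin's construction also tracks divergence arrays, omitted here) for biallelic 0/1 haplotypes. It illustrates the data structure and is not EagleImp's or PBWT's code.

```python
from typing import List


def pbwt_prefix_arrays(haplotypes: List[List[int]]) -> List[List[int]]:
    """Build PBWT positional prefix arrays for biallelic 0/1 haplotypes.

    haplotypes[i][k] is the allele of haplotype i at site k. Returns M+1
    orderings: entry k lists haplotype indices sorted by their reversed
    prefixes over sites 0..k-1 (entry 0 is the input order)."""
    n = len(haplotypes)
    m = len(haplotypes[0]) if n else 0
    order = list(range(n))
    arrays = [order[:]]
    for k in range(m):
        zeros, ones = [], []
        for idx in order:
            # Stable partition by the allele at site k preserves the previous sort.
            (ones if haplotypes[idx][k] else zeros).append(idx)
        order = zeros + ones
        arrays.append(order[:])
    return arrays
```

Each site costs one linear pass over the haplotypes, which is why PBWT-style methods scale to very large reference panels.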


Imputation Performance in Latin American Populations: Improving Rare Variants Representation With the Inclusion of Native American Genomes

Frontiers in Genetics, 2022, Vol 12
DOI: 10.3389/fgene.2021.719791

Author(s): Andrés Jiménez-Kaufmann, Amanda Y. Chong, Adrián Cortés, Consuelo D. Quinto-Cortés, Selene L. Fernandez-Valverde, ...

Keyword(s): Native American, Latin American, Statistical Power, Association Studies, Demographic History, Genotype Imputation, Genomic Research, European Ancestry, Local Source, Genome Wide Association Studies

Current genome-wide association studies (GWAS) rely on genotype imputation to increase statistical power, improve fine-mapping of association signals, and facilitate meta-analyses. Due to the complex demographic history of Latin America and the lack of balanced representation of Native American genomes in current imputation panels, locally relevant disease variants are likely to be missed, limiting the scope and impact of biomedical research in these populations. Better representation of diversity in genomic databases is therefore a scientific imperative. Here, we expand the 1000 Genomes Project reference panel (1KGP) with 134 Native American genomes (1KGP + NAT) to assess imputation performance in Latin American individuals of mixed ancestry. Our panel increased the number of SNPs above the GWAS quality threshold, thus improving statistical power for association studies in the region. It also increased imputation accuracy, particularly for low-frequency variants segregating in Native American ancestry tracts. The improvement is subtle but consistent across countries and proportional to the number of genomes added from local source populations. To project the potential improvement with a larger number of reference genomes, we performed simulations and found that at least 3,000 Native American genomes are needed to match the imputation performance of variants in European ancestry tracts. This reflects the concerning imbalance of diversity in current references and highlights the contribution of our work to reducing it, while complementing efforts to improve global equity in genomic research.
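The abstract does not name the quality metric or cutoff behind "the GWAS quality threshold". A frequently used kind of filter is a dosage-based estimated r², in the spirit of the MaCH/minimac Rsq. The sketch below assumes that kind of score, computed on the genotype scale as the variance of imputed dosages divided by 2p(1-p), with p estimated from the dosages, and a hypothetical cutoff of 0.8.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def estimated_r2(dosages: Sequence[float]) -> float:
    """Dosage-based imputation quality: Var(dosage) / (2p(1-p)).

    dosages are expected alt-allele counts in [0, 2]; p is the alt allele
    frequency estimated from the dosages themselves."""
    d = [float(x) for x in dosages]
    p = sum(d) / (2 * len(d))
    expected_var = 2 * p * (1 - p)
    if expected_var == 0:
        return 0.0  # monomorphic after imputation: no usable information
    return pvariance(d) / expected_var


def well_imputed(variants: Dict[str, List[float]], threshold: float = 0.8) -> List[str]:
    """Return ids of variants meeting a (hypothetical) quality cutoff."""
    return [vid for vid, d in variants.items() if estimated_r2(d) >= threshold]
```

Confident dosages near 0, 1 or 2 keep the full expected variance and score near 1, while dosages shrunk toward the mean (uncertain imputation) score low, so adding informative reference haplotypes raises the number of variants passing the filter.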


Comparison of Genotype Imputation for SNP Array and Low-Coverage Whole-Genome Sequencing Data

Frontiers in Genetics, 2022, Vol 12
DOI: 10.3389/fgene.2021.704118

Author(s): Tianyu Deng, Pengfei Zhang, Dorian Garrick, Huijiang Gao, Lixian Wang, ...

Keyword(s): Whole Genome Sequencing, Genome Sequencing, Demographic History, Imputation Accuracy, Reference Population, Genotype Imputation, Whole Genome, SNP Chip, A Genome, Low Coverage

Genotype imputation is the process of inferring unobserved genotypes in a sample of individuals. It is a key step before a genome-wide association study (GWAS) or genomic prediction, and imputation accuracy directly influences the results of subsequent analyses. In this simulation-based study, we investigate how the accuracy of genotype imputation depends on several factors characterizing SNP chip or low-coverage whole-genome sequencing (LCWGS) data. The factors included the imputation reference population size, the proportion of target markers (SNP density), the genetic relationship (distance) between the target and reference populations, and the imputation method. Genotypes were simulated using coalescent theory, accounting for the demographic history of pigs. A population of simulated founders diverged to produce four separate but related populations of descendants. Genomic data of 20,000 individuals were simulated for a 10-Mb chromosome fragment. Our results showed that the proportion of target markers (SNP density) was the most critical factor affecting imputation accuracy in all imputation scenarios. Compared with Minimac4, Beagle5.1 produced more accurate imputed data in most cases, most notably when imputing from LCWGS data. LCWGS also provided more accurate genotype imputation than SNP chip data. Our findings provide a relatively comprehensive insight into the accuracy of genotype imputation in a realistic population of domestic animals.
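Studies that vary the proportion of target markers typically keep a subset of markers as the "chip" panel, mask the rest, impute them, and score the masked markers against the true genotypes. The abstract does not give the exact protocol or accuracy metric; the sketch below assumes evenly spaced target markers and per-SNP Pearson correlation between true genotypes and imputed dosages, averaged over masked markers. The imputation step itself (Beagle5.1 or Minimac4 in the study) is outside the sketch, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def target_marker_indices(n_markers: int, proportion: float) -> List[int]:
    """Pick an evenly spaced subset of markers to act as the target (chip) panel."""
    n_keep = max(1, round(n_markers * proportion))
    step = n_markers / n_keep
    return sorted({int(i * step) for i in range(n_keep)})


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")


def masked_accuracy(true_geno: List[List[float]], imputed_dosage: List[List[float]],
                    kept: List[int]) -> float:
    """Mean per-SNP correlation over masked markers only.

    true_geno[j][i] and imputed_dosage[j][i] hold marker j for individual i."""
    kept_set = set(kept)
    scores = [pearson(true_geno[j], imputed_dosage[j])
              for j in range(len(true_geno)) if j not in kept_set]
    scores = [s for s in scores if not math.isnan(s)]  # drop monomorphic markers
    return sum(scores) / len(scores) if scores else float("nan")
```

Lowering `proportion` leaves more markers to impute from sparser anchors, which is the lever the study identifies as most critical.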


Leveraging TOPMed Imputation Server and Constructing a Cohort-Specific Imputation Reference Panel to Enhance Genotype Imputation among Cystic Fibrosis Patients

Human Genetics and Genomics Advances, 2022, pp. 100090
DOI: 10.1016/j.xhgg.2022.100090

Author(s): Quan Sun, Weifang Liu, Jonathan D. Rosen, Le Huang, Rhonda G. Pace, ...

Keyword(s): Cystic Fibrosis, Genotype Imputation, Reference Panel


An empirical evaluation of genotype imputation of ancient DNA

2021
DOI: 10.1101/2021.12.22.473913

Author(s): Kristiina Ausmees, Federico Sanchez-Quinto, Mattias Jakobsson, Carl Nettelblad

Keyword(s): Ancient DNA, Empirical Evaluation, Genotype Imputation, Systematic Evaluation, High Coverage, Depth Analysis, Missing Genotypes, And Performance, Downstream Analysis, Reference Bias

Because the ability to sequence ancient DNA to high coverage is often limited by sample quality or cost, imputation of missing genotypes offers a way to increase the power of inference as well as the cost-effectiveness of analysing ancient data. However, the high degree of uncertainty often associated with ancient DNA poses several methodological challenges, and the performance of imputation methods in this context has not been fully explored. To gain further insights, we performed a systematic evaluation of imputation of ancient data using Beagle 4.0 and reference data from phase 3 of the 1000 Genomes Project, investigating the effects of coverage, phased reference and study sample size. Making use of five ancient samples with high-coverage data available, we evaluated imputed data with respect to accuracy, reference bias and genetic affinities as captured by PCA. We obtained genotype concordance levels of over 99% for data with 1x coverage, and similar levels of accuracy and reference bias at coverage as low as 0.75x. Our findings suggest that using imputed data can be a realistic option for various population genetic analyses even for data with coverage below 1x. We also show that a large and varied phased reference set, as well as the inclusion of low- to moderate-coverage ancient samples, can increase imputation performance, particularly for rare alleles. In-depth analysis of imputed data with respect to genetic variants and allele frequencies gave further insight into the nature of errors arising during imputation, and can provide practical guidelines for post-processing and validation prior to downstream analysis.
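At around 1x coverage most sites are covered by zero, one or two reads, so low-coverage imputation is usually driven by genotype likelihoods rather than hard calls. A minimal sketch of per-site likelihoods under a simple binomial read model, assuming a symmetric per-base error rate of 1% (hypothetical) and ignoring post-mortem damage such as deamination, which real ancient-DNA pipelines must handle.

```python
from typing import List


def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> List[float]:
    """P(reads | g) for g = 0, 1, 2 alt alleles under a binomial read model.

    Each read shows the alt allele with probability (g/2)(1-e) + (1-g/2)e.
    The binomial coefficient is dropped because it is identical for all g.
    No ancient-DNA damage model is included."""
    lik = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        lik.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    return lik


def normalized(lik: List[float]) -> List[float]:
    """Rescale likelihoods to sum to 1 (genotype posteriors under a flat prior)."""
    total = sum(lik)
    return [x / total for x in lik]
```

With a single alt read, `genotype_likelihoods(0, 1)` gives roughly 0.01, 0.5 and 0.99: hom-ref is nearly excluded, but heterozygous and hom-alt remain plausible, and it is the phased reference haplotypes that help resolve such sites. A site with no reads returns equal likelihoods and is left entirely to the reference panel.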


Leveraging TOPMed Imputation Server and Constructing a Cohort-Specific Imputation Reference Panel to Enhance Genotype Imputation among Cystic Fibrosis Patients

2021
DOI: 10.1101/2021.12.20.473535

Author(s): Quan Sun, Weifang Liu, Jonathan D Rosen, Le Huang, Rhonda G Pace, ...

Keyword(s): Cystic Fibrosis, Sample Size, Association Studies, Genetic Disorder, Genome Project, Genotype Imputation, Reference Panel, Effective Sample Size, Polygenic Risk Score, Genome Wide Association Studies

Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system and beyond. In our previous genome-wide association studies (GWAS), we genotyped ~8,000 CF samples using a mixture of different genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (~30x) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among CF patients. For mixtures of GWAS array and WGS data, genotype imputation has proven effective in increasing the effective sample size. Therefore, we first imputed the ~8,000 CF samples with GWAS array genotypes using the TOPMed freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for CF patients, boosting genomic coverage from ~0.3 - 4.2 million genotyped markers to ~11 - 43 million well-imputed markers, and significantly improving Polygenic Risk Score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data of CF patients. We demonstrate that despite having ~3% of the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely due to allele and haplotype differences between CF patients and general populations. We anticipate that our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and that the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.


A Joint Use of Pooling And Imputation For Genotyping SNPs

2021
DOI: 10.21203/rs.3.rs-1131930/v1

Author(s): Camille Clouard, Kristiina Ausmees, Carl Nettelblad

Keyword(s): Large Scale, Group Testing, SNP Genotyping, Genotype Imputation, Model Organisms, Limiting Factor, Pooling Design, Pooled Data, Human Data, Genotype Frequencies

Background: Despite continuing technological advances, the cost of large-scale genotyping of a high number of samples can be prohibitive. The purpose of this study is to design a cost-saving strategy for SNP genotyping. We suggest using pooling, a group testing technique, to reduce the number of SNP arrays needed. We believe that this will be of greatest importance for non-model organisms with more limited resources in terms of cost-efficient large-scale chips and high-quality reference genomes, for applications such as wildlife monitoring and plant and animal breeding, but the approach is in essence species-agnostic. The proposed approach consists of grouping and mixing individual DNA samples into pools before testing these pools on bead-chips, such that the number of pools is less than the number of individual samples. We present a statistical estimation algorithm, based on the pooling outcomes, for inferring marker-wise the most likely genotype of every sample in each pool. Finally, we input these estimated genotypes into existing imputation algorithms. We compare the imputation performance on pooled data of the Beagle algorithm and of a local likelihood-aware phasing algorithm closely modeled on MaCH that we implemented. Results: We conduct simulations based on human data from the 1000 Genomes Project, to aid comparison with other imputation studies. Based on the simulated data, we find that pooling affects the genotype frequencies of the directly identifiable markers, without imputation. We also demonstrate how a combinatorial estimation of the genotype probabilities from the pooling design can improve the prediction performance of imputation models. Our algorithm achieves 93% concordance in predicting unassayed markers from pooled data, outperforming the Beagle imputation model, which reaches 80% concordance. We observe that the pooling design gives higher concordance for rare variants than the traditional low-density to high-density imputation commonly used for cost-effective genotyping of large cohorts. Conclusions: We present promising results for combining a pooling scheme for SNP genotyping with computational genotype imputation, as demonstrated in simulations on human data, while using half the number of assays needed for sample-wise genotyping. These results could find applications in any context where genotyping costs limit study size, such as marker-assisted selection in plant breeding.
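The abstract does not specify the pooling layout. A simple layout consistent with "half the number of assays" is a hypothetical 4x4 grid in which each of 16 samples goes into one row pool and one column pool, giving 8 assays. The sketch simulates one biallelic SNP and applies only a deterministic decoding rule: a pool read as homozygous fixes all of its members. This is not the authors' statistical estimation algorithm, which estimates genotype probabilities rather than leaving gaps.

```python
from typing import List, Optional, Tuple

GRID = 4  # hypothetical 4x4 design: 16 samples -> 4 row pools + 4 column pools


def pool_outcome(genotypes: List[int]) -> int:
    """Observed pool genotype for one SNP: 0 if all members are hom-ref,
    2 if all are hom-alt, otherwise 1 (both alleles present in the pool)."""
    if all(g == 0 for g in genotypes):
        return 0
    if all(g == 2 for g in genotypes):
        return 2
    return 1


def simulate_pools(samples: List[int]) -> Tuple[List[int], List[int]]:
    """samples: 16 genotypes (0/1/2) laid out row-major on the grid."""
    rows = [pool_outcome(samples[r * GRID:(r + 1) * GRID]) for r in range(GRID)]
    cols = [pool_outcome(samples[c::GRID]) for c in range(GRID)]
    return rows, cols


def decode(rows: List[int], cols: List[int]) -> List[Optional[int]]:
    """Deterministic decoding: a homozygous row or column pool fixes its members;
    samples whose row and column pools are both mixed stay unresolved (None)."""
    out: List[Optional[int]] = []
    for r in range(GRID):
        for c in range(GRID):
            if rows[r] != 1:
                out.append(rows[r])
            elif cols[c] != 1:
                out.append(cols[c])
            else:
                out.append(None)
    return out
```

For a rare variant carried by a single sample, only that sample's row and column pools are mixed, so 15 of 16 genotypes are recovered exactly from 8 assays. The unresolved entries are the ones a probabilistic estimate from the pooling design, followed by imputation, has to fill in.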
