genotype probabilities
Recently Published Documents


Author(s):  
Emil Jørsboe ◽  
Anders Albrechtsen

Abstract Association studies using genetic data from SNP-chip-based imputation or low-depth sequencing data provide a cost-efficient design for large-scale association studies. We explore methods for performing association studies applicable to such genetic data and investigate how using different priors when estimating genotype probabilities affects the association results. Our proposed method, ANGSD-asso’s latent model, models the unobserved genotype as a latent variable in a generalized linear model framework. The software is implemented in C/C++ and can be run multi-threaded. ANGSD-asso is based on genotype probabilities, which can be estimated using either the sample allele frequency or the individual allele frequencies as a prior. We explore through simulations how genotype probability-based methods compare with using genetic dosages. Our simulations show that in a structured population using the individual allele frequency prior has better power than the sample allele frequency prior. In scenarios where sequencing depth is correlated with phenotype, ANGSD-asso’s latent model has higher statistical power and less bias than using dosages. When additional covariates are added to the linear model, ANGSD-asso’s latent model has higher statistical power and less bias than other methods that accommodate genotype uncertainty, while also being much faster. This is shown with imputed data from UK Biobank and simulations.
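
The role of the prior described in this abstract can be illustrated with a minimal sketch. It assumes genotype likelihoods for a biallelic site are already available and that the prior follows Hardy-Weinberg proportions given an allele frequency; the function names and toy numbers are hypothetical and are not taken from ANGSD-asso.

```python
import numpy as np

def hwe_prior(freq):
    """Hardy-Weinberg prior over genotypes 0, 1, 2 (copies of the counted allele)."""
    freq = np.asarray(freq, dtype=float)
    return np.stack([(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2], axis=-1)

def genotype_posteriors(likelihoods, freq):
    """Multiply genotype likelihoods (n x 3) by the prior and normalise each row."""
    unnorm = np.asarray(likelihoods, dtype=float) * hwe_prior(freq)
    return unnorm / unnorm.sum(axis=-1, keepdims=True)

def dosage(posteriors):
    """Expected allele count under the genotype probabilities."""
    return posteriors @ np.array([0.0, 1.0, 2.0])

# Toy input (made up): two individuals with identical, weakly informative likelihoods.
likelihoods = np.array([[0.2, 0.6, 0.2],
                        [0.2, 0.6, 0.2]])

# Sample allele frequency prior: one frequency shared by everyone.
sample_post = genotype_posteriors(likelihoods, 0.3)

# Individual allele frequency prior: one frequency per individual,
# e.g. reflecting different ancestry in a structured population.
indiv_post = genotype_posteriors(likelihoods, np.array([0.1, 0.5]))
```

With the shared prior both individuals receive the same genotype probabilities, whereas the per-individual prior pulls their dosages apart even though the sequencing evidence is identical.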


2021 ◽  
Vol 12 ◽  
Author(s):  
Katharina Stahl ◽  
Damian Gola ◽  
Inke R. König

Despite the widespread use of genotype imputation tools and the availability of different approaches, recent developments of currently used programs have not been compared comprehensively. We therefore assessed the performance of 35 combinations of phasing and imputation programs, including versions of SHAPEIT, Eagle, Beagle, minimac, PBWT, and IMPUTE, for genetic imputation of completely missing SNPs with an HRC reference panel regarding quality and speed. We used a data set comprising 1,149 fully sequenced individuals from the German population, subsetting the SNPs to approximate the Illumina Infinium-Omni5 array. A total of 553,234 SNPs across two selected chromosomes were utilized for comparison between imputed and sequenced genotypes. We found that all tested programs with the exception of PBWT impute genotypes with very high accuracy (mean error rate < 0.005). PBWT hardly ever imputes the less frequent allele correctly (mean concordance for genotypes including the minor allele < 0.0002). For all programs, imputation accuracy drops for rare alleles with a frequency < 0.05. Even though overall concordance is high, concordance drops with genotype probability, indicating that low genotype probabilities are rare. The mean concordance of SNPs with a genotype probability < 95% drops below 0.9, at which point disregarding imputed genotypes might prove favorable. For fast and accurate imputation, a combination of Eagle2.4.1 using a reference panel for phasing and Beagle5.1 for imputation performs best. Replacing Beagle5.1 with minimac3, minimac4, Beagle4.1, or IMPUTE4 results in a small gain in accuracy at a high cost of speed.
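
The comparison of imputed and sequenced genotypes as a function of genotype probability can be sketched as follows. This is only an illustration under simple assumptions: the best-guess genotype is the one with the highest probability, and the function name and toy values are invented for the example, not taken from the study.

```python
import numpy as np

def concordance(true_gt, probs, min_prob=0.0):
    """Fraction of best-guess genotypes matching the truth among calls
    whose highest genotype probability is at least min_prob.
    Returns (concordance, number of calls kept)."""
    probs = np.asarray(probs, dtype=float)
    true_gt = np.asarray(true_gt)
    best = probs.argmax(axis=1)      # best-guess genotype per call
    confidence = probs.max(axis=1)   # its genotype probability
    keep = confidence >= min_prob
    if not keep.any():
        return float("nan"), 0
    return float((best[keep] == true_gt[keep]).mean()), int(keep.sum())

# Toy data (made up): five calls, the last two imputed with low certainty.
true_gt = [0, 1, 2, 0, 1]
probs = [[0.98, 0.02, 0.00],
         [0.01, 0.97, 0.02],
         [0.00, 0.04, 0.96],
         [0.40, 0.55, 0.05],
         [0.50, 0.45, 0.05]]

all_calls = concordance(true_gt, probs)
confident_calls = concordance(true_gt, probs, min_prob=0.95)
```

Filtering on a probability threshold trades the number of retained genotypes against concordance, which is the trade-off the abstract refers to when it suggests disregarding low-probability genotypes.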


Author(s):  
Yasuhiro Sato ◽  
Kazuya Takeda ◽  
Atsushi J Nagano

Abstract Phenotypes of sessile organisms, such as plants, rely not only on their own genotypes but also on those of neighboring individuals. Previously, we incorporated such neighbor effects into a single-marker regression using the Ising model of ferromagnetism. However, little is known regarding how neighbor effects should be incorporated in quantitative trait locus (QTL) mapping. In this study, we propose a new method for interval QTL mapping of neighbor effects, designated “neighbor QTL,” the algorithm of which includes: (i) obtaining conditional self-genotype probabilities with recombination fraction between flanking markers; (ii) calculating conditional neighbor genotypic identity using the self-genotype probabilities; and (iii) estimating additive and dominance deviations for neighbor effects. Our simulation using F2 and backcross lines showed that the power to detect neighbor effects increased as the effective range decreased. The neighbor QTL was applied to insect herbivory on Col × Kas recombinant inbred lines of Arabidopsis thaliana. Consistent with previous results, the pilot experiment detected a self-QTL effect on the herbivory at the GLABRA1 locus. Regarding neighbor QTL effects on herbivory, we observed a weak QTL on the top of chromosome 4, at which a weak self-bolting QTL was also identified. The neighbor QTL method is available as an R package ( https://cran.r-project.org/package=rNeighborQTL ), providing a novel tool to investigate neighbor effects in QTL studies.
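
Step (i) of the algorithm can be illustrated with a small sketch of conditional genotype probabilities between two flanking markers. It covers only the backcross case with two genotype classes and assumes no crossover interference; an F2 would need three genotype classes. The function names are hypothetical and this is not the rNeighborQTL implementation.

```python
def transition(a, b, rf):
    """Probability of moving from genotype a to genotype b across an
    interval with recombination fraction rf (backcross, two classes)."""
    return 1.0 - rf if a == b else rf

def conditional_genotype_probs(left, right, r_left, r_right):
    """P(genotype at a putative QTL | flanking marker genotypes).

    left, right : marker genotypes coded 0 or 1
    r_left      : recombination fraction between left marker and QTL
    r_right     : recombination fraction between QTL and right marker
    """
    # Recombination fraction between the two markers without interference.
    r_flank = r_left + r_right - 2.0 * r_left * r_right
    denom = transition(left, right, r_flank)
    return [
        transition(left, q, r_left) * transition(q, right, r_right) / denom
        for q in (0, 1)
    ]

same_flanks = conditional_genotype_probs(0, 0, 0.1, 0.1)
recombinant_flanks = conditional_genotype_probs(0, 1, 0.1, 0.1)
```

When both flanking markers agree, the position in between almost certainly carries the same genotype; when they disagree and the position is equidistant, the two genotypes are equally likely.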


Author(s):  
Chaozhi Zheng ◽  
Rodrigo R. Amadeu ◽  
Patricio R. Munoz ◽  
Jeffrey B. Endelman

Abstract In diploid species, many multi-parental populations have been developed to increase genetic diversity and quantitative trait loci (QTL) mapping resolution. In these populations, haplotype reconstruction has been used as a standard practice to increase QTL detection power in comparison with the marker-based association analysis. To realize similar benefits in tetraploid species (and eventually higher ploidy levels), a statistical framework for haplotype reconstruction has been developed and implemented in the software PolyOrigin for connected tetraploid F1 populations with shared parents. Haplotype reconstruction proceeds in two steps: first, parental genotypes are phased based on multi-locus linkage analysis; second, genotype probabilities for the parental alleles are inferred in the progeny. PolyOrigin can utilize genetic marker data from single nucleotide polymorphism (SNP) arrays or from sequence-based genotyping; in the latter case, bi-allelic read counts can be used (and are preferred) as input data to minimize the influence of genotype call errors at low depth. To account for errors in the input map, PolyOrigin includes functionality for filtering markers, inferring inter-marker distances, and refining local marker ordering. Simulation studies were used to investigate the effect of several variables on the accuracy of haplotype reconstruction, including the mating design, the number of parents, population size, and sequencing depth. PolyOrigin was further evaluated using an autotetraploid potato dataset with a 3×3 half-diallel mating design. In conclusion, PolyOrigin opens up exciting new possibilities for haplotype analysis in tetraploid breeding populations.
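
Why read counts can be preferable to hard genotype calls at low depth can be seen from a simple binomial sketch. It assumes a symmetric per-read error rate and a flat prior over allele dosages; this is a commonly used simplification, not the model implemented in PolyOrigin, and the function name and default error rate are hypothetical.

```python
from math import comb

def tetraploid_genotype_probs(ref_reads, alt_reads, error=0.01, ploidy=4):
    """Normalised likelihoods of alternative-allele dosages 0..ploidy
    given bi-allelic read counts, using a binomial read-sampling model."""
    n = ref_reads + alt_reads
    likelihoods = []
    for d in range(ploidy + 1):
        # Probability that a single read shows the alternative allele.
        p_alt = (d / ploidy) * (1 - error) + (1 - d / ploidy) * error
        likelihoods.append(
            comb(n, alt_reads) * p_alt ** alt_reads * (1 - p_alt) ** ref_reads
        )
    total = sum(likelihoods)
    return [lk / total for lk in likelihoods]

low_depth = tetraploid_genotype_probs(3, 1)     # 4 reads in total
high_depth = tetraploid_genotype_probs(30, 10)  # 40 reads, same ratio
```

Both inputs have the same read ratio and favor a dosage of one, but at low depth the probability is spread over neighboring dosages, which is information a single genotype call would discard.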


2020 ◽  
Author(s):  
Yasuhiro Sato ◽  
Kazuya Takeda ◽  
Atsushi J. Nagano

Abstract Phenotypes of sessile organisms, such as plants, rely not only on their own genotype but also on the genotypes of neighboring individuals. Previously, we incorporated such neighbor effects into a single-marker regression using the Ising model of ferromagnetism. However, little is known about how to incorporate neighbor effects in quantitative trait locus (QTL) mapping. In this study, we propose a new method for interval QTL mapping of neighbor effects, named “Neighbor QTL”. The algorithm of neighbor QTL involves the following: (i) obtaining conditional self-genotype probabilities with recombination fraction between flanking markers, (ii) calculating neighbor genotypic identity using the self-genotype probabilities, and (iii) estimating additive and dominance deviation for neighbor effects. Our simulation using F2 and backcross lines showed that the power to detect neighbor effects increased as the effective range became smaller. The neighbor QTL was applied to insect herbivory on Col × Kas recombinant inbred lines of Arabidopsis thaliana. Consistent with previous evidence, the pilot experiment detected a self QTL effect on the herbivory at the GLABRA1 locus. We also observed a weak QTL on chromosome 4 regarding neighbor effects on the herbivory. The neighbor QTL method is available as an R package (https://cran.r-project.org/package=rNeighborQTL), providing a novel tool to investigate neighbor effects in QTL studies.


2019 ◽  
Author(s):  
Emil Jørsboe ◽  
Anders Albrechtsen

Abstract Introduction Association studies using genetic data from SNP-chip based imputation or low depth sequencing data provide a cost efficient design for large scale studies. However, these approaches provide genetic data with uncertainty of the observed genotypes. Here we explore association methods that can be applied to data where the genotype is not directly observed. We investigate how using different priors when estimating genotype probabilities affects the association results in different scenarios such as studies with population structure and varying depth sequencing data. We also suggest a method (ANGSD-asso) that is computationally feasible for analysing large scale low depth sequencing data sets, such as can be generated by non-invasive prenatal testing (NIPT) with low-pass sequencing. Methods ANGSD-asso’s EM model works by modelling the unobserved genotype as a latent variable in a generalised linear model framework. The software is implemented in C/C++ and can be run multi-threaded, enabling the analysis of big data sets. ANGSD-asso is based on genotype probabilities, which can be estimated in various ways, such as using the sample allele frequency as a prior, using the individual allele frequencies as a prior or using haplotype frequencies from haplotype imputation. Using simulations of sequencing data we explore how genotype probability based methods compare to using genetic dosages in large association studies with genotype uncertainty. Results & Discussion Our simulations show that in a structured population using the individual allele frequency prior has better power than the sample allele frequency prior. If there is a correlation between genotype uncertainty and phenotype, then the individual allele frequency prior also helps control the false positive rate. In the absence of population structure the sample allele frequency prior and the individual allele frequency prior perform similarly.
In scenarios where sequencing depth is correlated with phenotype, ANGSD-asso’s EM model has better statistical power and less bias compared to using dosages. Lastly, when adding additional covariates to the linear model, ANGSD-asso’s EM model has more statistical power and provides less biased effect sizes than other methods that accommodate genotype uncertainty, while also being much faster. This makes it possible to properly account for genotype uncertainty in large scale association studies.


2019 ◽  
Vol 35 (21) ◽  
pp. 4321-4326
Author(s):  
Mark Abney ◽  
Aisha ElSherbiny

Abstract Motivation Genotype imputation, though generally accurate, often results in many genotypes being poorly imputed, particularly in studies where the individuals are not well represented by standard reference panels. When individuals in the study share regions of the genome identical by descent (IBD), it is possible to use this information in combination with a study-specific reference panel (SSRP) to improve the imputation results. Kinpute uses IBD information—due to recent, familial relatedness or distant, unknown ancestors—in conjunction with the output from linkage disequilibrium (LD) based imputation methods to compute more accurate genotype probabilities. Kinpute uses a novel method for IBD imputation, which works even in the absence of a pedigree, and results in substantially improved imputation quality. Results Given initial estimates of average IBD between subjects in the study sample, Kinpute uses a novel algorithm to select an optimal set of individuals to sequence and use as an SSRP. Kinpute is designed to use as input both this SSRP and the genotype probabilities output from other LD-based imputation software, and uses a new method to combine the LD imputed genotype probabilities with IBD configurations to substantially improve imputation. We tested Kinpute on a human population isolate where 98 individuals have been sequenced. In half of this sample, whose sequence data was masked, we used Impute2 to perform LD-based imputation and Kinpute was used to obtain higher accuracy genotype probabilities. Measures of imputation accuracy improved significantly, particularly for those genotypes that Impute2 imputed with low certainty. Availability and implementation Kinpute is an open-source and freely available C++ software package that can be downloaded from. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Mark Abney ◽  
Aisha El Sherbiny

Abstract Motivation Genotype imputation, though generally accurate, often results in many genotypes being poorly imputed, particularly in studies where the individuals are not well represented by standard reference panels. When individuals in the study share regions of the genome identical by descent (IBD), it is possible to use this information in combination with a study specific reference panel (SSRP) to improve the imputation results. Kinpute uses IBD information (due to either recent, familial relatedness or distant, unknown ancestors) in conjunction with the output from linkage disequilibrium (LD) based imputation methods to compute more accurate genotype probabilities. Kinpute uses a novel method for IBD imputation, which works even in the absence of a pedigree, and results in substantially improved imputation quality. Results Given initial estimates of average IBD between subjects in the study sample, Kinpute uses a novel algorithm to select an optimal set of individuals to sequence and use as an SSRP. Kinpute is designed to use as input both this SSRP and the genotype probabilities output from other LD based imputation software, and uses a new method to combine the LD imputed genotype probabilities with IBD configurations to substantially improve imputation. We tested Kinpute on a human population isolate where 98 individuals have been sequenced. In half of this sample, whose sequence data was masked, we used Impute2 to perform LD based imputation and Kinpute was used to obtain higher accuracy genotype probabilities. Measures of imputation accuracy improved significantly, particularly for those genotypes that Impute2 imputed with low certainty. Availability Kinpute is an open-source and freely available C++ software package that can be downloaded from https://github.com/markabney/Kinpute/releases.


2017 ◽  
Author(s):  
Frank Technow ◽  
Justin Gerke

Abstract The increased usage of whole-genome selection (WGS) and other molecular evaluation methods in plant breeding relies on the ability to genotype a very large number of untested individuals in each breeding cycle. Many plant breeding programs evaluate large biparental populations of homozygous individuals derived from homozygous parent inbred lines. This structure lends itself to parent-progeny imputation, which transfers the genotype scores of the parents to progeny individuals that are genotyped for a much smaller number of loci. Here we introduce a parent-progeny imputation method that infers individual genotypes from index-free pooled samples of DNA of multiple individuals using a Hidden Markov Model (HMM). We demonstrated the method for pools of simulated maize double haploids (DH) from biparental populations, genotyped using a genotyping by sequencing (GBS) approach for 3,000 loci at 0.125x to 4x coverage. We observed high concordance between true and imputed marker scores and the HMM produced well-calibrated genotype probabilities that correctly reflected the uncertainty of the imputed scores. Genomic estimated breeding values (GEBV) calculated from the imputed scores closely matched GEBV calculated from the true marker scores. The within-population correlation between these sets of GEBV approached 0.95 at 1x and 4x coverage when pooling two or four individuals, respectively. Our approach can reduce the genotyping cost per individual by a factor up to the number of pooled individuals in GBS applications without the need for extra sequencing coverage, thereby enabling cost-effective large scale genotyping for applications such as WGS in plant breeding.
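
How an HMM turns sparse read evidence into genotype probabilities along a chromosome can be sketched with a forward-backward pass. The example is deliberately reduced to a single, unpooled doubled haploid with two hidden states (allele inherited from parent A or parent B); the pooled model described in the abstract has a larger state space. The function name, emission values and recombination fractions are assumptions made for illustration.

```python
import numpy as np

def forward_backward(emissions, rec_fracs):
    """Posterior probability of each parental state at each locus.

    emissions : (n_loci, 2) likelihood of the read data given the state
    rec_fracs : (n_loci - 1,) recombination fraction between adjacent loci
    """
    emissions = np.asarray(emissions, dtype=float)
    n = len(emissions)
    fwd = np.zeros((n, 2))
    bwd = np.ones((n, 2))

    # Forward pass, starting from equal parental contributions.
    fwd[0] = 0.5 * emissions[0]
    fwd[0] /= fwd[0].sum()
    for t in range(1, n):
        r = rec_fracs[t - 1]
        trans = np.array([[1 - r, r], [r, 1 - r]])
        fwd[t] = (fwd[t - 1] @ trans) * emissions[t]
        fwd[t] /= fwd[t].sum()   # rescale to avoid underflow

    # Backward pass.
    for t in range(n - 2, -1, -1):
        r = rec_fracs[t]
        trans = np.array([[1 - r, r], [r, 1 - r]])
        bwd[t] = trans @ (emissions[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()

    post = fwd * bwd
    return post / post.sum(axis=1, keepdims=True)

# Toy input (made up): the middle locus has no informative reads.
emissions = [[0.9, 0.1],
             [0.5, 0.5],
             [0.9, 0.1]]
posteriors = forward_backward(emissions, rec_fracs=[0.05, 0.05])
```

The uninformative middle locus still receives a confident probability because its tightly linked neighbors both point to the same parent, which is how such a model can impute loci with little or no coverage.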

