Investigating a complex genotype-phenotype map for the development of methods to predict genetic values based on genome-wide marker data – a simulation study for the livestock perspective

Archives Animal Breeding ◽

10.7482/0003-9438-56-037 ◽

2013 ◽

Vol 56 (1) ◽

pp. 380-398

Author(s):

N. Melzer ◽

D. Wittenburg ◽

D. Repsilber

Keyword(s):

Experimental Data ◽

Genomic Selection ◽

Prediction Method ◽

Simulated Data ◽

Conventional Approach ◽

Data Simulation ◽

Single Nucleotide ◽

Genome Wide ◽

Alternative Approach ◽

Additive Genetic Effects

Abstract. Phenotypic variation can partly be explained by genetic variation, such as variation in single nucleotide polymorphism (SNP) genotypes. Genomic selection methods seek to predict genetic values (breeding values) based on SNP genotypes. To develop and optimize these methods, simulated data are often used, which follow a rather simple genotype-phenotype map. Is the conventional approach for data simulation in this field an appropriate basis to optimize such methods in view of experimental data? Here, we present an alternative approach, striving to simulate more realistic data based on a genotype-phenotype map which includes a simulated metabolome level. This level was used to simulate genetic values, implicitly including additive and non-additive genetic effects, whereas in a conventional approach additive and dominance effects were explicitly simulated and assembled to genetic values. For both simulation approaches, different scenarios regarding numbers of quantitative trait loci (QTLs) and SNPs were analysed using fastBayesB as the prediction method. We observed that our alternative map showed a smaller prediction precision (by at least 3.75 %) compared to the conventional approach in all investigated scenarios. The observed degree of linearity is at least 94.12 % of the conventional approach or less. Additionally, we present results for different simulated data and experimental data to allow a comparison on a purely conceptual level. In conclusion, simulating a more complex genotype-phenotype map that includes a molecular level allows the processing of variation from the genetic to the phenotype level to be studied in more detail and may prepare the ground for modern methods of genomic selection.
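
The conventional map mentioned in the abstract, in which additive and dominance effects are simulated explicitly and assembled into genetic values, can be illustrated with a minimal sketch. The genotype coding (0, 1 or 2 copies of an allele), the function name and the effect sizes below are assumptions made for illustration; they are not taken from the paper.

```python
def genetic_value(genotypes, additive, dominance):
    """Sum explicit additive and dominance effects over QTLs.

    genotypes: allele counts per QTL (0, 1 or 2); this coding is an assumption.
    additive, dominance: hypothetical per-QTL effects a_j and d_j.
    """
    total = 0.0
    for g, a, d in zip(genotypes, additive, dominance):
        total += (g - 1) * a  # contributes -a, 0 or +a for genotypes 0, 1, 2
        if g == 1:
            total += d        # dominance deviation applies to heterozygotes only
    return total


# Three QTLs with made-up effect sizes.
example = genetic_value([0, 1, 2], [1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(example)
```

In this sketch the genetic value is strictly a sum over loci, which is the kind of simple structure the abstract contrasts with a map that passes through a simulated metabolome level.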

Genetic Instrumental Variable (GIV) regression: Explaining socioeconomic and health outcomes in non-experimental data

10.1101/134197 ◽

2017 ◽

Cited By ~ 2

Author(s):

Thomas A. DiPrete ◽

Casper A.P. Burik ◽

Philipp D. Koellinger

Keyword(s):

Experimental Data ◽

Fixed Effects ◽

Genome Wide Association Study ◽

Body Height ◽

Outcome Variable ◽

Endogeneity Bias ◽

Multiple Indicators ◽

Genome Wide ◽

Alternative Approach ◽

Polygenic Scores

Identifying causal effects in non-experimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables (i.e. Mendelian Randomization - MR). However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in non-experimental data that would also undermine the ability of MR to correct for endogeneity bias from non-genetic sources. Here, we propose an alternative approach, GIV regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGS) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into non-overlapping subsamples, we obtain multiple indicators of the outcome PGS that can be used as instruments for each other, and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA.
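
The idea of using two polygenic scores from non-overlapping GWAS subsamples as instruments for each other can be sketched on simulated data. This is only a toy illustration of the measurement-error part of the argument, under assumed unit variances and independent noise; it omits the exposure variable and is not the authors' GIV regression estimator. All names and parameter values are hypothetical.

```python
import random


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def simulate(n=20000, beta=1.0, seed=1):
    """Simulate an outcome and two noisy indicators of the same genetic score."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n)]          # true score (unobserved)
    pgs1 = [x + rng.gauss(0, 1) for x in g]          # PGS from subsample 1
    pgs2 = [x + rng.gauss(0, 1) for x in g]          # PGS from subsample 2
    y = [beta * x + rng.gauss(0, 1) for x in g]      # outcome
    return pgs1, pgs2, y


pgs1, pgs2, y = simulate()
ols = cov(pgs1, y) / cov(pgs1, pgs1)  # attenuated by noise in pgs1
iv = cov(pgs2, y) / cov(pgs2, pgs1)   # pgs2 used as instrument for pgs1
print(round(ols, 2), round(iv, 2))
```

Because the noise in the two scores is independent in this setup, the instrumented estimate is not attenuated the way the naive regression on a single noisy score is.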

EpiGEN: an epistasis simulation pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa245 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4957-4959

Author(s):

David B Blumenthal ◽

Lorenzo Viola ◽

Markus List ◽

Jan Baumbach ◽

Paolo Tieri ◽

...

Keyword(s):

Arbitrary Order ◽

Association Studies ◽

Simulated Data ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Supplementary Data ◽

Single Nucleotide ◽

Genome Wide

Abstract. Summary: Simulated data are crucial for evaluating epistasis detection tools in genome-wide association studies. Existing simulators are limited, as they do not account for linkage disequilibrium (LD), support limited interaction models of single nucleotide polymorphisms (SNPs) and only dichotomous phenotypes or depend on proprietary software. In contrast, EpiGEN supports SNP interactions of arbitrary order, produces realistic LD patterns and generates both categorical and quantitative phenotypes. Availability and implementation: EpiGEN is implemented in Python 3 and is freely available at https://github.com/baumbachlab/epigen. Supplementary information: Supplementary data are available at Bioinformatics online.

Quantile regression in genomic selection for oligogenic traits in autogamous plants: A simulation study

PLoS ONE ◽

10.1371/journal.pone.0243666 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0243666

Author(s):

Gabriela França Oliveira ◽

Ana Carolina Campana Nascimento ◽

Moysés Nascimento ◽

Isabela de Castro Sant'Anna ◽

Juan Vicente Romero ◽

...

Keyword(s):

Quantile Regression ◽

Genomic Selection ◽

Simulated Data ◽

Fourth Generation ◽

Plant Populations ◽

Favorable Alleles ◽

Genome Wide ◽

Selection Intensities ◽

Genetic Value ◽

Autogamous Plant

This study assessed the efficiency of Genomic selection (GS) or genome‐wide selection (GWS), based on Regularized Quantile Regression (RQR), in the selection of genotypes to breed autogamous plant populations with oligogenic traits. To this end, simulated data of an F2 population were used, with traits with different heritability levels (0.10, 0.20 and 0.40), controlled by four genes. The generations were advanced (up to F6) at two selection intensities (10% and 20%). The genomic genetic value was computed by RQR for different quantiles (0.10, 0.50 and 0.90), and by the traditional GWS methods, specifically RR-BLUP and BLASSO. A second objective was to find the statistical methodology that allows the fastest fixation of favorable alleles. In general, the results of the RQR model were better than or equal to those of traditional GWS methodologies, achieving the fixation of favorable alleles in most of the evaluated scenarios. At a heritability level of 0.40 and a selection intensity of 10%, RQR (0.50) was the only methodology that fixed the alleles quickly, i.e., in the fourth generation. Thus, it was concluded that the application of RQR in plant breeding, to simulated autogamous plant populations with oligogenic traits, could reduce time and consequently costs, due to the reduction of selfing generations to fix alleles in the evaluated scenarios.
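
Quantile regression fits a chosen quantile by minimising the check (pinball) loss instead of squared error. The sketch below shows that loss and, as the simplest possible case, the constant that minimises it for a small sample. It is a generic illustration with made-up data; the regularisation penalty used in RQR and the marker covariates are left out, and the function names are hypothetical.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Return the observed value that minimises the total pinball loss."""
    return min(values, key=lambda c: sum(pinball_loss(v - c, tau) for v in values))


data = [1.0, 2.0, 3.0, 4.0, 50.0]        # made-up values with one outlier
median_fit = best_constant(data, 0.5)    # tau = 0.50 targets the median
upper_fit = best_constant(data, 0.9)     # tau = 0.90 targets the upper tail
print(median_fit, upper_fit)
```

Changing the quantile level moves the fit towards a different part of the distribution, which is why results are reported separately for quantiles such as 0.10, 0.50 and 0.90.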

Bayesian models with dominance effects for genomic evaluation of quantitative traits

Genetics Research ◽

10.1017/s0016672312000018 ◽

2012 ◽

Vol 94 (1) ◽

pp. 21-37 ◽

Cited By ~ 50

Author(s):

ROBIN WELLMANN ◽

JÖRN BENNEWITZ

Keyword(s):

Genomic Selection ◽

Bayesian Methods ◽

Mate Selection ◽

Quantitative Traits ◽

Main Research ◽

Genome Wide ◽

Standard Tool ◽

Additive Genetic Effects ◽

Subsequent Selection ◽

Selection Of

Summary. Genomic selection refers to the use of dense, genome-wide markers for the prediction of breeding values (BV) and subsequent selection of breeding individuals. It has become a standard tool in livestock and plant breeding for accelerating genetic gain. The core of genomic selection is the prediction of a large number of marker effects from a limited number of observations. Various Bayesian methods that successfully cope with this challenge are known. Until now, the main research emphasis has been on additive genetic effects. Dominance coefficients of quantitative trait loci (QTLs), however, can also be large, even if dominance variance and inbreeding depression are relatively small. Considering dominance might contribute to the accuracy of genomic selection and serve as a guide for choosing mating pairs with good combining abilities. A general hierarchical Bayesian model for genomic selection that can realistically account for dominance is introduced. Several submodels are proposed and compared with respect to their ability to predict genomic BV, dominance deviations and genotypic values (GV) by stochastic simulation. These submodels differ in the way the dependency between additive and dominance effects is modelled. Depending on the marker panel, the inclusion of dominance effects increased the accuracy of GV by about 17% and the accuracy of genomic BV by 2% in the offspring. Furthermore, it slowed down the decrease of the accuracies in subsequent generations. It was possible to obtain accurate estimates of GV, which enables mate selection programmes.

Simulated Data for Genomic Selection and Genome-Wide Association Studies Using a Combination of Coalescent and Gene Drop Methods

G3 Genes|Genome|Genetics ◽

10.1534/g3.111.001297 ◽

2012 ◽

Vol 2 (4) ◽

pp. 425-427 ◽

Cited By ~ 41

Author(s):

John M. Hickey ◽

Gregor Gorjanc

Keyword(s):

Genomic Selection ◽

Association Studies ◽

Simulated Data ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

Genome-Wide Association Study (GWAS) for Bunch Components of The Interspecific Population of Elaeis oleifera and Elaeis guineensis

Jurnal Penelitian Kelapa Sawit ◽

10.22302/iopri.jur.jpks.v29i2.144 ◽

2021 ◽

Vol 29 (2) ◽

pp. 97-114

Author(s):

Heri Adriwan Siregar ◽

Edy Suprianto ◽

Sujadi Sujadi ◽

Hernawan Y Rahmadi ◽

Mohamad Arif ◽

...

Keyword(s):

Oil Palm ◽

Genome Wide Association Study ◽

Elaeis Guineensis ◽

Genotyping By Sequencing ◽

Breeding Program ◽

Phenotypic Data ◽

Single Nucleotide ◽

Elaeis Oleifera ◽

Genome Wide ◽

Alternative Approach

The oil palm breeding program for the species Elaeis guineensis and the backcross Elaeis oleifera is running slowly because oil palm is a perennial plant. Therefore, an alternative approach is needed that can accelerate the oil palm breeding program. A genome-wide SNP (single nucleotide polymorphism) approach was used to study the association between 18 bunch-component phenotypes in oil palm germplasm of E. oleifera from Suriname and Brazil Coari, some interspecific hybrids and some elite progeny of E. guineensis. The genotyping by sequencing (GBS) analysis produced a total of 459 million reads, or approximately 798 thousand reads per sample, and 3,252 SNPs were eligible for 456 genotypes. Using various association models, eleven normalized phenotypic data sets showed significant associations with 29 SNPs. Based on the annotations, 17 SNPs were related to genes with certain biological functions. Three SNPs were found to be in the exon of a gene, namely SNP4416, SNP349 and SNP3865, while the other 15 SNPs were at the intragenic to a gene. Four SNPs are common to phenotypes C16:0 and C18:1 as well as to C20:0 and C20:1. This research shows the potential of SNPs that can be used as an alternative approach to E. oleifera backcross breeding, although further research is needed for validation purposes.

Locally Epistatic Models for Genome-wide Prediction and Association by Importance Sampling

10.1101/046177 ◽

2016 ◽

Author(s):

Deniz Akdemir ◽

Jean-Luc Jannink

Keyword(s):

Complex Traits ◽

Simulated Data ◽

Genetic Effects ◽

Data Sets ◽

Phenotypic Variance ◽

Modeling Methodology ◽

Genome Wide ◽

Marker Annotations ◽

Additive Genetic Effects ◽

Simulated Data Sets

Abstract. In statistical genetics, an important task involves building predictive models for genotype-phenotype relationships and thus attributing a proportion of the total phenotypic variance to the variation in genotypes. Numerous models have been proposed to incorporate additive genetic effects into models for prediction or association. However, there is a scarcity of models that can adequately account for gene-by-gene or other forms of genetic interactions. In addition, there is an increased interest in using marker annotations in genome-wide prediction and association. In this paper, we discuss a hybrid modeling methodology which combines the parametric mixed modeling approach and non-parametric rule ensembles. This approach gives us a flexible class of models that can capture additive and locally epistatic genetic effects and gene x background interactions, and allows us to incorporate one or more annotations into the genomic selection or association models. We use benchmark data sets covering a range of organisms and traits, in addition to simulated data sets, to illustrate the strengths of this approach. The improvements in model accuracy and association results suggest that a part of the “missing heritability” in complex traits can be captured by modeling local epistasis.

Inference of multiple-wave population admixture by modeling decay of linkage disequilibrium with polynomial functions

10.1101/082644 ◽

2016 ◽

Author(s):

Ying Zhou ◽

Kai Yuan ◽

Yaoliang Yu ◽

Xumin Ni ◽

Pengtao Xie ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Simulated Data ◽

Population Admixture ◽

Multiple Wave ◽

Single Nucleotide ◽

Genome Wide ◽

Complex Population ◽

Important Challenge ◽

Source Populations ◽

Admixture Linkage Disequilibrium

Abstract. To infer the histories of population admixture, one important challenge with methods based on the admixture linkage disequilibrium (ALD) is to remove the effect of source LD (SLD), which is directly inherited from source populations. In previous methods, only the decay curve of weighted LD between pairs of sites whose genetic distance was larger than a certain starting distance was fitted by single or multiple exponential functions, for the inference of recent single- or multiple-wave admixture. However, the effect of SLD has not been well defined and no tool has been developed to estimate the effect of SLD on weighted LD decay. In this study, we defined the SLD in the formularized weighted LD statistic under the two-way admixture model, and proposed the polynomial spectrum (p-spectrum) to study the weighted SLD and weighted LD. We also found that reference populations could be used to reduce the SLD in the weighted LD statistic. We further developed a method, iMAAPs, to infer Multiple-wave Admixture by fitting ALD using Polynomial spectrum. We evaluated the performance of iMAAPs under various admixture models in simulated data and applied iMAAPs to the analysis of genome-wide single nucleotide polymorphism data from the Human Genome Diversity Project (HGDP) and the HapMap Project. We showed that iMAAPs is a considerable improvement over other current methods and further facilitates the inference of the histories of complex population admixtures.
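
The single-exponential fit that the abstract attributes to previous methods can be sketched as follows. The sketch assumes a simplified, noise-free decay of the form A * exp(-t * d), with d the genetic distance in Morgans and t the number of generations since admixture; this functional form, the function name and all numbers are illustrative assumptions, and the code does not implement iMAAPs or its polynomial spectrum.

```python
import math


def fit_single_exponential(distances, ld_values):
    """Least-squares fit of log(LD) = log(A) - t * d; returns (A, t).

    Requires strictly positive LD values because of the log transform.
    """
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    slope = sxy / sxx
    intercept = mean_l - slope * mean_d
    return math.exp(intercept), -slope


# Synthetic decay curve: amplitude 0.5, admixture 10 generations ago (made up).
distances = [0.005 * k for k in range(1, 41)]
curve = [0.5 * math.exp(-10.0 * d) for d in distances]
amplitude, generations = fit_single_exponential(distances, curve)
print(round(amplitude, 3), round(generations, 3))
```

With several admixture waves the curve becomes a sum of exponentials, and a single log-linear fit of this kind no longer recovers the individual times, which is part of what motivates more flexible approaches.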

Nonparametric Disequilibrium Mapping of Functional Sites Using Haplotypes of Multiple Tightly Linked Single-Nucleotide Polymorphism Markers

Genetics ◽

10.1093/genetics/164.3.1175 ◽

2003 ◽

Vol 164 (3) ◽

pp. 1175-1187

Author(s):

Rong Cheng ◽

Jennie Z Ma ◽

Fred A Wright ◽

Shili Lin ◽

Xin Gao ◽

...

Keyword(s):

Linkage Disequilibrium ◽

Simulated Data ◽

Haplotype Frequency ◽

Nucleotide Polymorphisms ◽

Data Set ◽

Single Nucleotide ◽

Functional Sites ◽

Genome Wide ◽

Snp Map ◽

Risk Of Disease

Abstract. As the speed and efficiency of genotyping single-nucleotide polymorphisms (SNPs) increase, using the SNP map, it becomes possible to evaluate the extent to which a common haplotype contributes to the risk of disease. In this study we propose a new procedure for mapping functional sites or regions of a candidate gene of interest using multiple linked SNPs. Based on a case-parent trio family design, we use expectation-maximization (EM) algorithm-derived haplotype frequency estimates of multiple tightly linked SNPs from both unambiguous and ambiguous families to construct a contingency statistic S for linkage disequilibrium (LD) analysis. In the procedure, a moving-window scan for functional SNP sites or regions can cover an unlimited number of loci except for the limitation of computer storage. Within a window, all possible widths of haplotypes are utilized to find the maximum statistic S* for each site (or locus). Furthermore, this method can be applied to regional or genome-wide scanning for determining linkage disequilibrium using SNPs. The sensitivity of the proposed procedure was examined on the simulated data set from the Genetic Analysis Workshop (GAW) 12. Compared with the conventional and generalized TDT methods, our procedure is more flexible and powerful.
