scholarly journals Inferring the Nature of Missing Heritability in Human Traits Using Data from the GWAS Catalog

Genetics ◽  
2019 ◽  
Vol 212 (3) ◽  
pp. 891-904 ◽  
Author(s):  
Eugenio López-Cortegano ◽  
Armando Caballero

Thousands of genes responsible for many diseases and other common traits in humans have been detected by Genome Wide Association Studies (GWAS) in the last decade. However, candidate causal variants found so far usually explain only a small fraction of the heritability estimated by family data. The most common explanation for this observation is that the missing heritability corresponds to variants, either rare or common, with very small effect, which pass undetected due to a lack of statistical power. We carried out a meta-analysis using data from the NHGRI-EBI GWAS Catalog in order to explore the observed distribution of locus effects for a set of 42 complex traits and to quantify their contribution to narrow-sense heritability. With the data at hand, we were able to predict the expected distribution of locus effects for 16 traits and diseases, their expected contribution to heritability, and the missing number of loci yet to be discovered to fully explain the familial heritability estimates. Our results indicate that, for 6 out of the 16 traits, the additive contribution of a great number of loci is unable to explain the familial (broad-sense) heritability, suggesting that the gap between GWAS and familial estimates of heritability may not ever be closed for these traits. In contrast, for the other 10 traits, the additive contribution of hundreds or thousands of loci yet to be found could potentially explain the familial heritability estimates, if this were the case. Computer simulations are used to illustrate the possible contribution from nonadditive genetic effects to the gap between GWAS and familial estimates of heritability.

2015 ◽  
Vol 14s2 ◽  
pp. CIN.S17305 ◽  
Author(s):  
Yaping Wang ◽  
Donghui Li ◽  
Peng Wei

Genome-wide association studies (GWASs) have identified thousands of single nucleotide polymorphisms (SNPs) robustly associated with hundreds of complex human diseases including cancers. However, the large number of G WAS-identified genetic loci only explains a small proportion of the disease heritability. This “missing heritability” problem has been partly attributed to the yet-to-be-identified gene-gene (G × G) and gene-environment (G × E) interactions. In spite of the important roles of G × G and G × E interactions in understanding disease mechanisms and filling in the missing heritability, straightforward GWAS scanning for such interactions has very limited statistical power, leading to few successes. Here we propose a two-step statistical approach to test G × G/G × E interactions: the first step is to perform principal component analysis (PCA) on the multiple SNPs within a gene region, and the second step is to perform Tukey's one degree-of-freedom (1-df) test on the leading PCs. We derive a score test that is computationally fast and numerically stable for the proposed Tukey's 1-df interaction test. Using extensive simulations we show that the proposed approach, which combines the two parsimonious models, namely, the PCA and Tukey's 1-df form of interaction, outperforms other state-of-the-art methods. We also demonstrate the utility and efficiency gains of the proposed method with applications to testing G × G interactions for Crohn's disease using the Wellcome Trust Case Control Consortium (WTCCC) GWAS data and testing G × E interaction using data from a case-control study of pancreatic cancer.


2020 ◽  
Author(s):  
Jiawen Chen ◽  
Jing You ◽  
Zijie Zhao ◽  
Zheng Ni ◽  
Kunling Huang ◽  
...  

AbstractPolygenic risk scores (PRS) derived from summary statistics of genome-wide association studies (GWAS) have enjoyed great popularity in human genetics research. Applied to population cohorts, PRS can effectively stratify individuals by risk group and has promising applications in early diagnosis and clinical intervention. However, our understanding of within-family polygenic risk is incomplete, in part because the small samples per family significantly limits power. Here, to address this challenge, we introduce ORIGAMI, a computational framework that uses parental genotype data to simulate offspring genomes. ORIGAMI uses state-of-the-art genetic maps to simulate realistic recombination events on phased parental genomes and allows quantifying the prospective PRS variability within each family. We quantify and showcase the substantially reduced yet highly heterogeneous PRS variation within families for numerous complex traits. Further, we incorporate within-family PRS variability to improve polygenic transmission disequilibrium test (pTDT). Through simulations, we demonstrate that modeling within-family risk substantially improves the statistical power of pTDT. Applied to 7,805 trios of autism spectrum disorder (ASD) probands and healthy parents, we successfully replicated previously reported over-transmission of ASD, educational attainment, and schizophrenia risk, and identified multiple novel traits with significant transmission disequilibrium. These results provided novel etiologic insights into the shared genetic basis of various complex traits and ASD.


2015 ◽  
Author(s):  
Guo-Bo Chen ◽  
Sang Hong Lee ◽  
Matthew R Robinson ◽  
Maciej Trzaskowski ◽  
Zhi-Xiang Zhu ◽  
...  

Genome-wide association studies (GWASs) have been successful in discovering replicable SNP-trait associations for many quantitative traits and common diseases in humans. Typically the effect sizes of SNP alleles are very small and this has led to large genome-wide association meta-analyses (GWAMA) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study we propose a new set of metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We proposed a pair of methods in examining the concordance between demographic information and summary statistics. In method I, we use the population genetics Fststatistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. In method II, we conduct principal component analysis based on reported allele frequencies, and is able to recover the ancestral information for each cohort. In addition, we propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. Finally, to quantify unknown sample overlap across all pairs of cohorts we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.


2019 ◽  
Author(s):  
Cong Guo ◽  
Karsten B. Sieber ◽  
Jorge Esparza-Gordillo ◽  
Mark R. Hurle ◽  
Kijoung Song ◽  
...  

AbstractIdentifying the effector genes from genome-wide association studies (GWAS) is a crucial step towards understanding the biological mechanisms underlying complex traits and diseases. Colocalization of expression and protein quantitative trait loci (eQTL and pQTL, hereafter collectively called “xQTL”) can be effective for mapping associations to genes in many loci. However, existing colocalization methods require full single-variant summary statistics which are often not readily available for many published GWAS or xQTL studies. Here, we present PICCOLO, a method that uses minimum SNP p-values within a locus to determine if pairs of genetic associations are colocalized. This method greatly expands the number of GWAS and xQTL datasets that can be tested for colocalization. We applied PICCOLO to 10,759 genome-wide significant associations across the NHGRI-EBI GWAS Catalog with xQTLs from 28 studies. We identified at least one colocalized gene-xQTL in at least one tissue for 30% of associations, and we pursued multiple lines of evidence to demonstrate that these mappings are biologically meaningful. PICCOLO genes are significantly enriched for biologically relevant tissues, and 4.3-fold enriched for targets of approved drugs.


2017 ◽  
Author(s):  
Oriol Canela-Xandri ◽  
Konrad Rawlik ◽  
Albert Tenesa

ABSTRACTGenome-wide association studies have revealed many loci contributing to the variation of complex traits, yet the majority of loci that contribute to the heritability of complex traits remain elusive. Large study populations with sufficient statistical power are required to detect the small effect sizes of the yet unidentified genetic variants. However, the analysis of huge cohorts, like UK Biobank, is complicated by incidental structure present when collecting such large cohorts. For instance, UK Biobank comprises 107,162 third degree or closer related participants. Traditionally, GWAS have removed related individuals because they comprised an insignificant proportion of the overall sample size, however, removing related individuals in UK Biobank would entail a substantial loss of power. Furthermore, modelling such structure using linear mixed models is computationally expensive, which requires a computational infrastructure that may not be accessible to all researchers. Here we present an atlas of genetic associations for 118 non-binary and 599 binary traits of 408,455 related and unrelated UK Biobank participants of White-British descent. Results are compiled in a publicly accessible database that allows querying genome-wide association summary results for 623,944 genotyped and HapMap2 imputed SNPs, as well downloading whole GWAS summary statistics for over 30 million imputed SNPs from the Haplotype Reference Consortium panel. Our atlas of associations (GeneATLAS,http://geneatlas.roslin.ed.ac.uk) will help researchers to query UK Biobank results in an easy way without the need to incur in high computational costs.


2016 ◽  
Author(s):  
Farhad Hormozdiari ◽  
Anthony Zhu ◽  
Gleb Kichaev ◽  
Ayellet V. Segrè ◽  
Chelsea J.-T. Ju ◽  
...  

AbstractRecent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWAS and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4152 loci with strong evidence of AH. The proportion of all loci with identified AH is 4-23% in eQTLs, 35% in GWAS of High-Density Lipoprotein (HDL), and 23% in schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R2=0.85, P = 2.2e-16), indicating that statistical power prevents identification of AH in other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.


2017 ◽  
Author(s):  
Qiongshi Lu ◽  
Boyang Li ◽  
Derek Ou ◽  
Margret Erlendsdottir ◽  
Ryan L. Powles ◽  
...  

AbstractDespite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits’ genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses we demonstrate that our method provides accurate covariance estimates, thus enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (Ntotal ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer’s disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD’s correlation with cognitive traits and hints at an autoimmune component for ALS.


2018 ◽  
Author(s):  
Lorin Crawford ◽  
Xiang Zhou

AbstractEpistasis, commonly defined as the interaction between genetic loci, is an important contributor to the genetic architecture underlying many complex traits and common diseases. Most existing epistatic mapping methods in genome-wide association studies explicitly search over all pairwise or higher-order interactions. However, due to the potentially large search space and the resulting multiple testing burden, these conventional approaches often suffer from heavy computational cost and low statistical power. A recently proposed attractive alternative for mapping epistasis focuses instead on detecting marginal epistasis, which is defined as the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact — thus, potentially alleviating much of the statistical and computational burden associated with conventional epistatic mapping procedures. However, previous marginal epistatic mapping methods are based on quantitative trait models. As we will show here, these lack statistical power in case-control studies. Here, we develop a liability threshold mixed model that extends marginal epistatic mapping to case-control studies. Our method properly accounts for case-control ascertainment and the binary nature of case-control data. We refer to this method as the liability threshold marginal epistasis test (LT-MAPIT). With simulations, we illustrate the benefits of LT-MAPIT in terms of providing effective type I error control, and being more powerful than both existing marginal epistatic mapping methods and conventional explicit search-based approaches in case-control data. We finally apply LT-MAPIT to identify both marginal and pairwise epistasis in seven complex diseases from the Wellcome Trust Case Control Consortium (WTCCC) 1 study.


2021 ◽  
Author(s):  
Meng Lin ◽  
Danny S. Park ◽  
Noah A. Zaitlen ◽  
Brenna M. Henn ◽  
Christopher R. Gignoux

AbstractGenome-wide association studies (GWAS) are primarily conducted in single-ancestry settings. The low transferability of results has limited our understanding of human genetic architecture across a range of complex traits. In contrast to homogeneous populations, admixed populations provide an opportunity to capture genetic architecture contributed from multiple source populations and thus improve statistical power. Here, we provide a mechanistic simulation framework to investigate the statistical power and transferability of GWAS under directional polygenic selection or varying divergence. We focus on a two-way admixed population and show that GWAS in admixed populations can be enriched for power in discovery by up to 2-fold compared to the ancestral populations under similar sample size. Moreover, higher accuracy of cross-population polygenic score estimates is also observed if variants and weights are trained in the admixed group rather than in the ancestral groups. Common variant associations are also more likely to replicate if first discovered in the admixed group and then transferred to an ancestral population, than the other way around (across 50 iterations with 1,000 causal SNPs, training on 10,000 individuals, testing on 1,000 in each population, p=3.78e-6, 6.19e-101, ~0 for FST = 0.2, 0.5, 0.8, respectively). While some of these FST values may appear extreme, we demonstrate that they are found across the entire phenome in the GWAS catalog. This framework demonstrates that investigation of admixed populations harbors significant advantages over GWAS in single-ancestry cohorts for uncovering the genetic architecture of traits and will improve downstream applications such as personalized medicine across diverse populations.


2017 ◽  
Author(s):  
Genevieve L Wojcik ◽  
Mariaelisa Graff ◽  
Katherine K Nishimura ◽  
Ran Tao ◽  
Jeffrey Haessler ◽  
...  

Summary/AbstractGenome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development, and clinical guidelines. However, the dominance of European-ancestry populations in GWAS creates a biased view of the role of human variation in disease, and hinders the equitable translation of genetic associations into clinical and public health applications. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioral phenotypes in 49,839 non-European individuals. Using strategies designed for analysis of multi-ethnic and admixed populations, we confirm 574 GWAS catalog variants across these traits, and find 38 secondary signals in known loci and 27 novel loci. Our data shows strong evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts, and insights into clinical implications. We strongly advocate for continued, large genome-wide efforts in diverse populations to reduce health disparities.


Sign in / Sign up

Export Citation Format

Share Document