PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics

10.1101/148627 ◽

2017 ◽

Cited By ~ 4

Author(s):

Jie Zheng ◽

Tom G. Richardson ◽

Louise A. C. Millard ◽

Gibran Hemani ◽

Christopher Raistrick ◽

...

Keyword(s):

Complex Traits ◽

Multiple Testing ◽

Large Scale ◽

Genome Wide Association Study ◽

Phenotypic Correlation ◽

Summary Statistics ◽

Multiple Testing Correction ◽

Individual Level ◽

Phenotypic Correlations ◽

Complex Human Traits

Abstract. Background: Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. State-of-the-art methods, metaCCA and LD score regression, provide an alternative approach to estimate phenotypic correlation using genome-wide association study (GWAS) summary statistics. Results: Here, we present an integrated R toolkit, PhenoSpD, to 1) apply metaCCA (or LD score regression) to estimate phenotypic correlations using GWAS summary statistics; and 2) utilize the estimated phenotypic correlations to inform correction of multiple testing for complex human traits using the spectral decomposition of matrices (SpD). The simulations suggest it is possible to estimate phenotypic correlation using samples with only a partial overlap, but as overlap decreases correlations will attenuate towards zero and multiple testing correction will be more stringent than in perfectly overlapping samples. In a case study, PhenoSpD using GWAS results suggested 324.4 independent tests among 452 metabolites, which is close to the 296 independent tests estimated using true phenotypic correlation. We further applied PhenoSpD to estimate 7,503 pairwise phenotypic correlations among 123 metabolites using GWAS summary statistics from Kettunen et al., and PhenoSpD suggested 44.9 independent tests for these metabolites. Conclusion: PhenoSpD integrates existing methods and provides a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics, which is particularly valuable for post-GWAS analysis of complex molecular traits. Availability: R code and documentation for PhenoSpD V1.0.0 are available online (https://github.com/MRCIEU/PhenoSpD).
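The spectral decomposition step mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the PhenoSpD code (which is written in R). It assumes two commonly cited eigenvalue-based formulas for the effective number of independent tests, those of Nyholt (2004) and Li and Ji (2005). Which variant PhenoSpD actually applies should be checked against its documentation. The function names are hypothetical.

```python
import numpy as np


def effective_tests_nyholt(corr):
    """Nyholt (2004): M_eff = 1 + (M - 1) * (1 - Var(eigenvalues) / M).

    `corr` is an M x M phenotypic correlation matrix (assumed symmetric).
    """
    corr = np.asarray(corr, dtype=float)
    m = corr.shape[0]
    eigvals = np.linalg.eigvalsh(corr)
    # Sample variance (ddof=1), so that perfectly correlated traits give M_eff = 1.
    return 1.0 + (m - 1.0) * (1.0 - np.var(eigvals, ddof=1) / m)


def effective_tests_li_ji(corr):
    """Li and Ji (2005): sum over eigenvalues of I(l >= 1) + (l - floor(l))."""
    corr = np.asarray(corr, dtype=float)
    # Rounding guards against floating-point noise such as 3.9999999999 or 1e-16
    # (an implementation choice of this sketch, not part of the published formula).
    eigvals = np.round(np.abs(np.linalg.eigvalsh(corr)), 8)
    return float(np.sum((eigvals >= 1.0) + (eigvals - np.floor(eigvals))))


if __name__ == "__main__":
    # Toy correlation matrix for three traits (made-up values).
    toy = np.array([[1.0, 0.8, 0.1],
                    [0.8, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])
    print(effective_tests_nyholt(toy), effective_tests_li_ji(toy))
```

The resulting number is typically used in place of the raw trait count in a Bonferroni-style threshold. With the 324.4 independent tests reported above, a threshold at alpha = 0.05 would be 0.05 / 324.4, or about 1.5 × 10⁻⁴.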

Improved estimation of phenotypic correlations using summary association statistics

10.1101/2020.12.10.419325 ◽

2020 ◽

Author(s):

Xia Shen ◽

Ting Li ◽

Zheng Ning

Keyword(s):

Complex Traits ◽

State Of The Art ◽

Phenotypic Correlation ◽

Summary Statistics ◽

Z Score ◽

Phenotypic Correlations ◽

Simple Strategy ◽

Null Effect ◽

Correlation Estimation ◽

Genome Wide

Estimating the phenotypic correlations between complex traits and diseases based on their genome-wide association summary statistics has been a useful technique in genetic epidemiology and statistical genetics inference. Two state-of-the-art strategies, Z-score correlation across null-effect SNPs and the LD score regression intercept, have been widely applied to estimate phenotypic correlations. Here, we propose an improved Z-score correlation strategy based on SNPs with low minor allele frequencies (MAFs), and show how this simple strategy can correct the bias generated by the current methods. Compared to LDSC, the low-MAF estimator improves phenotypic correlation estimation and is thus beneficial for methods and applications using phenotypic correlations inferred from summary association statistics.
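As a rough illustration of the Z-score correlation idea described above, the following Python sketch correlates the Z-scores of two traits across SNPs that look null for both traits, with an optional low-MAF filter. It is a simplified reading of the abstract, not the authors' estimator: the function name, the `null_threshold` of 1.96 and the `max_maf` cut-off are all assumptions chosen for illustration.

```python
import numpy as np


def zscore_phenotypic_correlation(z1, z2, null_threshold=1.96,
                                  maf=None, max_maf=None):
    """Pearson correlation of per-SNP Z-scores from two GWAS.

    z1, z2         : Z-scores for the same SNPs in the same order.
    null_threshold : keep SNPs with |Z| below this value in both traits
                     (hypothetical default; np.inf disables the filter).
    maf, max_maf   : optional minor allele frequencies and an upper bound,
                     to restrict the estimate to low-MAF SNPs.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    keep = (np.abs(z1) < null_threshold) & (np.abs(z2) < null_threshold)
    if maf is not None and max_maf is not None:
        keep &= np.asarray(maf, dtype=float) <= max_maf
    if keep.sum() < 3:
        raise ValueError("too few SNPs left after filtering")
    return float(np.corrcoef(z1[keep], z2[keep])[0, 1])
```

Note that truncating on |Z| itself distorts the correlation, so the threshold here should be treated as a placeholder rather than a recommended setting.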

Capturing SNP Association across the NK Receptor and HLA Gene Regions in Multiple Sclerosis by Targeted Penalised Regression Models

Genes ◽

10.3390/genes13010087 ◽

2021 ◽

Vol 13 (1) ◽

pp. 87

Author(s):

Sean M. Burnard ◽

Rodney A. Lea ◽

Miles Benton ◽

David Eccles ◽

Daniel W. Kennedy ◽

...

Keyword(s):

Multiple Sclerosis ◽

Complex Traits ◽

Multiple Testing ◽

Large Scale ◽

Disease Risk ◽

Association Studies ◽

Meta Analysis ◽

Elastic Net ◽

Genome Wide Association Studies ◽

Multiple Testing Correction

Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever increasing sample sizes are required, while ignoring potentially valuable information that is readily available in existing datasets. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% of iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10⁻⁸) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.

Approximate conditional phenotype analysis based on genome wide association summary statistics

Scientific Reports ◽

10.1038/s41598-021-82000-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Peitao Wu ◽

Biqi Wang ◽

Steven A. Lubitz ◽

Emelia J. Benjamin ◽

James B. Meigs ◽

...

Keyword(s):

Large Scale ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Summary Statistics ◽

Phenotypic Data ◽

Individual Level ◽

Genome Wide ◽

Level Data ◽

A Genome ◽

Phenotype Analysis

Abstract. Because single genetic variants may have pleiotropic effects, one trait can be a confounder in a genome-wide association study (GWAS) that aims to identify loci associated with another trait. A typical approach to address this issue is to perform an additional analysis adjusting for the confounder. However, obtaining conditional results can be time-consuming. We propose an approximate conditional phenotype analysis based on GWAS summary statistics, the covariance between outcome and confounder, and the variant minor allele frequency (MAF). GWAS summary statistics and MAF are taken from GWAS meta-analysis results while the trait covariance may be estimated by two strategies: (i) estimates from a subset of the phenotypic data; or (ii) estimates from published studies. We compare our two strategies with estimates using individual level data from the full GWAS sample (gold standard). A simulation study for both binary and continuous traits demonstrates that our approximate approach is accurate. We apply our method to the Framingham Heart Study (FHS) GWAS and to large-scale cardiometabolic GWAS results. We observed a high consistency of genetic effect size estimates between our method and individual level data analysis. Our approach leads to an efficient way to perform approximate conditional analysis using large-scale GWAS summary statistics.

PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics

GigaScience ◽

10.1093/gigascience/giy090 ◽

2018 ◽

Vol 7 (8) ◽

Cited By ~ 12

Author(s):

Jie Zheng ◽

Tom G Richardson ◽

Louise A C Millard ◽

Gibran Hemani ◽

Benjamin L Elsworth ◽

...

Keyword(s):

Multiple Testing ◽

Phenotypic Correlation ◽

Summary Statistics ◽

Multiple Testing Correction ◽

Correlation Estimation

A powerful and efficient two-stage method for detecting gene-to-gene interactions in GWAS

Biostatistics ◽

10.1093/biostatistics/kxw060 ◽

2017 ◽

Vol 18 (3) ◽

pp. 477-494 ◽

Cited By ~ 5

Author(s):

Jakub Pecanka ◽

Marianne A. Jonker ◽

Zoltan Bochdanovits ◽

Aad W. Van Der Vaart ◽

Keyword(s):

Complex Traits ◽

Multiple Testing ◽

Statistical Power ◽

Genome Wide Association Study ◽

Score Test ◽

Interaction Model ◽

Type I ◽

Two Stage ◽

Genome Wide ◽

Strong Control

Summary: For over a decade functional gene-to-gene interaction (epistasis) has been suspected to be a determinant in the “missing heritability” of complex traits. However, searching for epistasis on the genome-wide scale has been challenging due to the prohibitively large number of tests, which results in a serious loss of statistical power as well as computational challenges. In this article, we propose a two-stage method applicable to existing case-control data sets, which aims to lessen both of these problems by pre-assessing whether a candidate pair of genetic loci is involved in epistasis before it is actually tested for interaction with respect to a complex phenotype. The pre-assessment is based on a two-locus genotype independence test performed in the sample of cases. Only the pairs of loci that exhibit non-equilibrium frequencies are analyzed via a logistic regression score test, thereby reducing the multiple testing burden. Since only the computationally simple independence tests are performed for all pairs of loci while the more demanding score tests are restricted to the most promising pairs, genome-wide association study (GWAS) for epistasis becomes feasible. By design our method provides strong control of the type I error. Its favourable power properties, especially under the practically relevant misspecification of the interaction model, are illustrated. Ready-to-use software is available. Using the method we analyzed Parkinson’s disease in four cohorts and identified possible interactions within several SNP pairs in multiple cohorts.
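The pre-assessment stage described above can be sketched in Python as follows. This covers only the stage-one filter and is a loose illustration under stated assumptions: it uses a plain Pearson chi-square test on the 3 × 3 two-locus genotype table in cases, and a placeholder cut-off of 9.488 (the 5% critical value of a chi-square distribution with 4 degrees of freedom). The abstract does not give the paper's exact statistic or threshold, and both function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def genotype_independence_chi2(g1, g2):
    """Pearson chi-square statistic for two loci with genotypes coded 0/1/2.

    Returns (statistic, degrees_of_freedom). Empty genotype classes are dropped.
    """
    table = np.zeros((3, 3))
    for a, b in zip(g1, g2):
        table[int(a), int(b)] += 1
    table = table[table.sum(axis=1) > 0, :]
    table = table[:, table.sum(axis=0) > 0]
    n = table.sum()
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    stat = float(((table - expected) ** 2 / expected).sum())
    df = (table.shape[0] - 1) * (table.shape[1] - 1)
    return stat, df


def screen_pairs(case_genotypes, threshold=9.488):
    """Stage one: keep locus pairs whose genotypes look dependent in cases.

    case_genotypes : array of shape (n_cases, n_loci), genotypes coded 0/1/2.
    threshold      : placeholder cut-off; it assumes 4 degrees of freedom.
    """
    g = np.asarray(case_genotypes)
    selected = []
    for i, j in combinations(range(g.shape[1]), 2):
        stat, _ = genotype_independence_chi2(g[:, i], g[:, j])
        if stat > threshold:
            selected.append((i, j))
    return selected
```

Only the pairs returned by `screen_pairs` would then be passed to the more expensive interaction test, which is what reduces the multiple testing burden.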

Within and across populations complex traits and diseases prediction using summary statistics from large-scale genomewide association studies

10.14264/11574da ◽

2021 ◽

Author(s):

Ying Wang

Keyword(s):

Complex Traits ◽

Large Scale ◽

Association Studies ◽

Summary Statistics ◽

Genomewide Association

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 ◽

2018 ◽

Cited By ~ 6

Author(s):

Doug Speed ◽

David J Balding

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Confounding Bias ◽

Conserved Regions ◽

Genome Wide ◽

Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.

A compendium of uniformly processed human gene expression and splicing quantitative trait loci

Nature Genetics ◽

10.1038/s41588-021-00924-w ◽

2021 ◽

Vol 53 (9) ◽

pp. 1290-1299

Author(s):

Nurlan Kerimov ◽

James D. Hayhurst ◽

Kateryna Peikova ◽

Jonathan R. Manning ◽

Peter Walter ◽

...

Keyword(s):

Gene Expression ◽

Quantitative Trait ◽

Target Genes ◽

Genome Wide Association Study ◽

Cell Types ◽

Summary Statistics ◽

Genome Wide ◽

Cell Type Specific ◽

Trait Locus ◽

Complex Human Traits

Abstract. Many gene expression quantitative trait locus (eQTL) studies have published their summary statistics, which can be used to gain insight into complex human traits by downstream analyses, such as fine mapping and co-localization. However, technical differences between these datasets are a barrier to their widespread use. Consequently, target genes for most genome-wide association study (GWAS) signals have still not been identified. In the present study, we present the eQTL Catalogue (https://www.ebi.ac.uk/eqtl), a resource of quality-controlled, uniformly re-computed gene expression and splicing QTLs from 21 studies. We find that, for matching cell types and tissues, the eQTL effect sizes are highly reproducible between studies. Although most QTLs were shared between most bulk tissues, we identified a greater diversity of cell-type-specific QTLs from purified cell types, a subset of which also manifested as new disease co-localizations. Our summary statistics are freely available to enable the systematic interpretation of human GWAS associations across many cell types and tissues.

A data harmonization pipeline to leverage external controls and boost power in GWAS

10.1101/2020.11.30.405415 ◽

2020 ◽

Author(s):

Danfeng Chen ◽

Katherine Tashman ◽

Duncan S. Palmer ◽

Benjamin Neale ◽

Kathryn Roeder ◽

...

Keyword(s):

Genome Wide Association Study ◽

Control Sample ◽

Summary Statistics ◽

Batch Effects ◽

Multiple Sources ◽

Individual Level ◽

Data Harmonization ◽

Genome Wide ◽

Before And After ◽

Spurious Results

Abstract. The use of external controls in a genome-wide association study (GWAS) can significantly increase the size and diversity of the control sample, enabling high-resolution ancestry matching and enhancing the power to detect association signals. However, the aggregation of controls from multiple sources is challenging due to batch effects, difficulty in identifying genotyping errors, and the use of different genotyping platforms. These obstacles have impeded the use of external controls in GWAS and can lead to spurious results if not carefully addressed. We propose a unified data harmonization pipeline that includes an iterative approach to quality control (QC) and imputation, implemented before and after merging cohorts and arrays. We apply this harmonization pipeline to aggregate 27,517 European control samples from 16 collections within dbGaP. We leverage these harmonized controls to conduct a GWAS of Crohn’s disease. We demonstrate a boost in power over using the cohort samples alone, and that our procedure results in summary statistics free of any significant batch effects. This harmonization pipeline for aggregating genotype data from multiple sources can also serve other applications where individual level genotypes, rather than summary statistics, are required.

A Powerful Procedure for Pathway-based Meta-Analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations

10.1101/041244 ◽

2016 ◽

Author(s):

Han Zhang ◽

William Wheeler ◽

Paula L Hyland ◽

Yifan Yang ◽

Jianxin Shi ◽

...

Keyword(s):

Pathway Analysis ◽

Complex Traits ◽

Meta Analysis ◽

Genetic Data ◽

European Ancestry ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Individual Level ◽

Testing Procedures ◽

European Populations

Abstract. Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS-established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed 7 out of the 43 pathways identified in European populations remained significant in eastern Asians at the false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.

Author Summary: As GWAS continue to grow in sample size, it is evident that these studies need to be utilized more effectively for detecting individual susceptibility variants, and more importantly to provide insight into global genetic architecture of complex traits. Towards this goal, identifying association with respect to a collection of variants in biological pathways can be particularly insightful for understanding how networks of genes might be affecting pathophysiology of diseases. Here we present a new pathway analysis procedure that can be conducted using summary-level association statistics, which have become the main vehicle for performing meta-analysis of individual genetic variants across studies in large consortia. Through simulation studies we showed the proposed method was more powerful than the existing state-of-the-art method. We carried out a comprehensive pathway analysis of 4,713 candidate pathways on their association with T2D using two large studies with European ancestry and identified 43 T2D-associated pathways. Further examinations of those 43 pathways in 8 Asian studies showed that some pathways were trans-ethnically associated with T2D. This analysis clearly highlights novel T2D-associated pathways beyond what has been known from single-variant association analysis reported from the largest GWAS to date.
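The two multiple-testing criteria mentioned in the abstract above (Bonferroni-corrected p-values below 0.05, and a false discovery rate of 0.1) can be sketched in plain Python. This is not part of sARTP. It assumes the standard Bonferroni adjustment and the Benjamini-Hochberg step-up procedure for FDR control, since the abstract does not say which FDR method was used, and the function names are hypothetical.

```python
def bonferroni_adjust(p_values, n_tests=None):
    """Bonferroni-adjusted p-values: min(1, p * number_of_tests)."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values, fdr=0.1):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (same order as the input) marking which
    hypotheses are rejected at the given false discovery rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

For the 4,713 candidate pathways in the abstract, a Bonferroni-corrected p-value below 0.05 corresponds to a raw p-value below 0.05 / 4,713, or about 1.06 × 10⁻⁵.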
