An approach to gene-based testing accounting for dependence of tests among nearby genes

2021 ◽  
Author(s):  
Ronald J Yurko ◽  
Kathryn Roeder ◽  
Bernie Devlin ◽  
Max G'Sell

In genome-wide association studies (GWAS), it has become commonplace to test millions of SNPs for phenotypic association. Gene-based testing can improve power to detect weak signal by reducing multiple testing and pooling signal strength. While such tests account for linkage disequilibrium (LD) structure of SNP alleles within each gene, current approaches do not capture LD of SNPs falling in different nearby genes, which can induce correlation of gene-based test statistics. We introduce an algorithm to account for this correlation. When a gene's test statistic is independent of others, it is assessed separately; when test statistics for nearby genes are strongly correlated, their SNPs are agglomerated and tested as a locus. To provide insight into SNPs and genes driving association within loci, we develop an interactive visualization tool to explore localized signal. We demonstrate our approach in the context of weakly powered GWAS for autism spectrum disorder, which is contrasted to more highly powered GWAS for schizophrenia and educational attainment. To increase power for these analyses, especially those for autism, we use adaptive p-value thresholding (AdaPT), guided by high-dimensional metadata modeled with gradient boosted trees, highlighting when and how it can be most useful. Notably, our workflow is based on summary statistics.
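The agglomeration idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes genes are already ordered by chromosomal position, that the correlation between the test statistics of each adjacent pair is known, and that a hypothetical cut-off of 0.5 separates "independent" from "strongly correlated". The function name `agglomerate_genes` and its parameters are illustrative only.

```python
def agglomerate_genes(genes, adjacent_corr, threshold=0.5):
    """Group position-ordered genes into loci (illustrative sketch).

    genes: gene names ordered by chromosomal position.
    adjacent_corr: adjacent_corr[i] is the assumed-known correlation between
        the test statistics of genes[i] and genes[i + 1].
    threshold: hypothetical cut-off; neighbours at or above it are merged.
    """
    if len(adjacent_corr) != max(len(genes) - 1, 0):
        raise ValueError("need exactly one correlation per adjacent gene pair")
    loci = []
    for i, gene in enumerate(genes):
        if i > 0 and abs(adjacent_corr[i - 1]) >= threshold:
            # Strongly correlated with the previous gene: extend that locus.
            loci[-1].append(gene)
        else:
            # Independent of the previous gene: start a new locus.
            loci.append([gene])
    return loci


# Made-up gene names and correlations, for illustration only.
loci = agglomerate_genes(["G1", "G2", "G3", "G4"], [0.8, 0.1, 0.9])
print(loci)
```

A gene that ends up alone in its locus would be tested separately, while a multi-gene locus would have its SNPs pooled into a single test.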

2019 ◽  
Vol 116 (4) ◽  
pp. 1195-1200 ◽  
Author(s):  
Daniel J. Wilson

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
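As a rough illustration of the statistic named in this abstract, the sketch below computes a weighted harmonic mean of p-values, HMP = Σw / Σ(w/p). The function name `harmonic_mean_p` and the default of equal weights are assumptions made for the example. The sketch returns only the raw statistic; the calibration step needed to turn it into an exact combined p-value is not shown.

```python
def harmonic_mean_p(pvalues, weights=None):
    """Weighted harmonic mean of p-values (raw statistic only)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if weights is None:
        # Assumption for this sketch: equal weights that sum to one.
        weights = [1.0 / len(pvalues)] * len(pvalues)
    if len(weights) != len(pvalues):
        raise ValueError("weights and pvalues must have the same length")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    return sum(weights) / sum(w / p for w, p in zip(weights, pvalues))


# Made-up p-values, for illustration only.
print(harmonic_mean_p([0.01, 0.02, 0.04]))
```

Because the harmonic mean is dominated by its smallest inputs, a group containing a few small p-values yields a small HMP even when most members of the group are unremarkable.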


2019 ◽  
Author(s):  
Jianan Zhan ◽  
Dan E. Arking ◽  
Joel S. Bader

Biological experiments often involve hypothesis testing at the scale of thousands to millions of tests. Alleviating the multiple testing burden has been a goal of many methods designed to boost test power by focusing tests on the alternative hypotheses most likely to be true. Very often, these methods either explicitly or implicitly make use of prior probabilities that bias significance for favored sets thought to be enriched for significant findings. Nevertheless, most genomics experiments, and in particular genome-wide association studies (GWAS), still use traditional univariate tests rather than more sophisticated approaches. Here we use GWAS to demonstrate why unbiased tests remain in favor. We calculate test power assuming perfect knowledge of a prior distribution and then derive the population size increase required to provide the same boost without a prior. We show that population size is exponentially more important than the prior, providing a rigorous explanation for the observed avoidance of prior-based methods.

Author summary

Biological experiments often test thousands to millions of hypotheses. Gene-based tests for human RNA-Seq data, for example, involve approximately 20,000; genome-wide association studies (GWAS) involve about 1 million effective tests. The conventional approach is to perform individual tests and then apply a Bonferroni correction to account for multiple testing. This approach implies a single-test p-value of 2.5 × 10⁻⁶ for RNA-Seq experiments, and a p-value of 5 × 10⁻⁸ for GWAS, to control the false-positive rate at a conventional value of 0.05. Many methods have been proposed to alleviate the multiple-testing burden by incorporating a prior probability that boosts the significance for a subset of candidate genes or variants. At the extreme limit, only the candidate set is tested, corresponding to a decreased multiple testing burden. Despite decades of methods development, prior-based tests have not been generally used.

Here we compare the power increase possible with a prior with the increase possible with a much simpler strategy of increasing the study size. We show that increasing the population size is exponentially more valuable than increasing the strength of the prior, even when the true prior is known exactly. These results provide a rigorous explanation for the continued use of simple, robust methods rather than more sophisticated approaches.
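The two significance thresholds quoted in the author summary follow from dividing the overall false-positive rate by the number of tests. A minimal sketch of that arithmetic, using a hypothetical helper named `bonferroni_threshold`:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Test counts taken from the author summary above.
rna_seq_threshold = bonferroni_threshold(0.05, 20_000)    # about 2.5e-6
gwas_threshold = bonferroni_threshold(0.05, 1_000_000)    # about 5e-8
print(rna_seq_threshold, gwas_threshold)
```

Testing only a candidate subset shrinks `n_tests` and therefore relaxes the per-test threshold, which is the "extreme limit" of a prior that the summary describes.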


2014 ◽  
Author(s):  
Brendan K. Bulik-Sullivan ◽  
Po-Ru Loh ◽  
Hilary Finucane ◽  
Stephan Ripke ◽  
Jian Yang ◽  
...  

Both polygenicity [1,2] (i.e. many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification [3], can yield inflated distributions of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from bias and true signal from polygenicity. We have developed an approach that quantifies the contributions of each by examining the relationship between test statistics and linkage disequilibrium (LD). We term this approach LD Score regression. LD Score regression provides an upper bound on the contribution of confounding bias to the observed inflation in test statistics and can be used to estimate a more powerful correction factor than genomic control [4–14]. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size.
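The core idea, relating per-SNP test statistics to LD, can be sketched as a straight-line fit of chi-square statistics against LD Scores. The sketch below is an assumption-laden simplification: it uses unweighted ordinary least squares on synthetic, noise-free data, whereas a real analysis would use a weighted regression and genome-wide inputs. The helper `fit_line` and all numbers are illustrative. Under the usual reading, the slope reflects polygenic signal and an intercept above 1 reflects confounding.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Synthetic data: chi-square statistics that grow linearly with LD Score.
ld_scores = [10.0, 50.0, 100.0, 200.0]
chi2_stats = [1.05 + 0.02 * score for score in ld_scores]

intercept, slope = fit_line(ld_scores, chi2_stats)
print(intercept, slope)
```

In this toy input the fit recovers the intercept and slope used to generate the data; with real summary statistics both would carry sampling error.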


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Riccardo Farinella ◽  
Ilaria Erbi ◽  
Alice Bedini ◽  
Sara Donato ◽  
Manuel Gentiluomo ◽  
...  

The first thousand days of life from conception have a significant impact on health status, with short- and long-term effects. Among several anthropometric and maternal lifestyle parameters, birth weight plays a crucial role in the growth and neurological development of infants. Recent genome-wide association studies (GWAS) have demonstrated a robust foetal and maternal genetic background of birth weight; however, only a small proportion of the genetic heritability has been identified so far. Considering the large number of phenotypes in which taste receptor genes are involved, we focused on identifying the possible effect of genetic variants in these genes on birthweight. In the human genome there are two taste receptor families: the bitter receptors (TAS2Rs) and the sweet and umami receptors (TAS1Rs). In particular, sweet perception is due to a heterodimeric receptor encoded by the TAS1R2 and TAS1R3 genes, while the umami taste receptor is encoded by the TAS1R1 and TAS1R3 genes. We observed that carriers of the T allele of the TAS1R1-rs4908932 SNP showed an increase in birthweight compared to GG homozygotes (Coeff: 87.40 (35.13–139.68); p-value = 0.001). The association remained significant after correction for multiple testing. TAS1R1-rs4908932 is a potentially functional SNP and is in linkage disequilibrium with another polymorphism that has been associated with BMI in adults, showing the importance of this variant from the early stages of conception through adult life.


2017 ◽  
Author(s):  
Daniel J. Wilson

Analysis of ‘big data’ frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the family-wise error rate (FWER) is considered the strongest protection against false positives, but makes it difficult to reach the multiple testing-corrected significance threshold. Here I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP easily combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human-pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all combinations of hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini-Hochberg procedure to detect significant hypotheses, even though the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets because it enhances the potential for scientific discovery.


Diabetes ◽  
2021 ◽  
Vol 70 (Supplement 1) ◽  
pp. 26-OR
Author(s):  
K. ALAINE BROADAWAY ◽  
XIANYONG YIN ◽  
ALICE WILLIAMSON ◽  
EMMA WILSON ◽  
MAGIC INVESTIGATORS

2021 ◽  
Vol 99 (Supplement_3) ◽  
pp. 243-244
Author(s):  
Brittany N Diehl ◽  
Andres A Pech-Cervantes ◽  
Thomas H Terrill ◽  
Ibukun M Ogunade ◽  
Owen Rae ◽  
...  

Florida Native sheep is an indigenous breed from Florida and expresses superior parasite resistance. Previous candidate and genome-wide association studies with Florida Native sheep have identified single nucleotide polymorphisms with additive and non-additive effects associated with parasite resistance. However, the role of other potential DNA variants, such as copy number variants (CNVs), in controlling this complex trait has not been evaluated. The objective of the present study was to investigate the importance of CNVs in resistance to natural Haemonchus contortus infections in Florida Native sheep. A total of 200 sheep were evaluated in the present study. Phenotypic records included fecal egg count (FEC, eggs/gram), FAMACHA score, and packed cell volume (PCV, %). Sheep were genotyped using the GGP Ovine 50K SNP chip. Copy number analysis using the univariate method was used to identify CNVs. A total of 170 animals with CNVs and phenotypic data were used for the association testing. Association tests were carried out using single linear regression and Principal Component Analysis (PCA) correction to identify CNVs associated with FEC, FAMACHA, and PCV. To confirm our results, a second round of association testing using the correlation-trend test with PCA correction was performed. CNVs were considered significant when their adjusted p-value was < 0.05 after FDR correction. A deletion CNV in chromosome 21 was associated with FEC. This DNA variant was located in intron 2 of the RAB3IL gene and overlapped a QTL associated with changes in eosinophil number. Our study demonstrated for the first time that CNVs could potentially be involved in parasite resistance in this heritage sheep breed.
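The abstract does not say which FDR procedure was used. As an illustration only, the sketch below assumes the common Benjamini–Hochberg adjustment and shows how adjusted p-values are compared against the 0.05 cut-off. The function name `benjamini_hochberg` and the input p-values are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
significant = [p_adj < 0.05 for p_adj in adjusted]
print(adjusted, significant)
```

Here only the first test stays below 0.05 after adjustment, even though three of the four raw p-values are below 0.05.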


2018 ◽  
Author(s):  
David M. Howard ◽  
Mark J. Adams ◽  
Toni-Kim Clarke ◽  
Jonathan D. Hafferty ◽  
Jude Gibson ◽  
...  

Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene-sets associated with depression, including both genes and gene-pathways associated with synaptic structure and neurotransmission. Further evidence of the importance of prefrontal brain regions in depression was provided by an enrichment analysis. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches.


Author(s):  
Jack W. O’Sullivan ◽  
John P. A. Ioannidis

With the establishment of large biobanks, discovery of single nucleotide polymorphisms (SNPs) that are associated with various phenotypes has been accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier, “discovery” GWAS and a later, replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0% and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype trait (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with area under the Receiver Operator Curve = 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.


Cosmetics ◽  
2020 ◽  
Vol 7 (2) ◽  
pp. 49
Author(s):  
Miranda A. Farage ◽  
Yunxuan Jiang ◽  
Jay P. Tiesman ◽  
Pierre Fontanillas ◽  
Rosemarie Osborne

Individuals suffering from sensitive skin often have other skin conditions and/or diseases, such as fair skin, freckles, rosacea, or atopic dermatitis. Genome-wide association studies (GWAS) have been performed for some of these conditions, but not for sensitive skin. In this study, a total of 23,426 unrelated participants of European ancestry from the 23andMe database were evaluated for self-declared sensitive skin, other skin conditions, and diseases using an online questionnaire format. Responders were separated into two groups: those who declared they had sensitive skin (n = 8971) and those who declared their skin was not sensitive (controls, n = 14,455). A GWAS of sensitive skin individuals identified three genome-wide significant loci (p-value < 5 × 10⁻⁸) and seven suggestive loci (p-value < 1 × 10⁻⁶). Of the three most significant loci, all have been associated with pigmentation and two have been associated with acne.

