Secure large-scale genome-wide association studies using homomorphic encryption

Marcelo Blatt; Alexander Gusev; Yuriy Polyakov; Shafi Goldwasser

doi:10.1073/pnas.1918257117

Proceedings of the National Academy of Sciences ◽

2020 ◽

Vol 117 (21) ◽

pp. 11608-11613 ◽

Cited By ~ 1

Keyword(s):

Large Scale ◽

Homomorphic Encryption ◽

Association Studies ◽

Genome Wide Association ◽

Single Server ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

User Interactions ◽

Individual Level ◽

Genome Wide

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
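The key property behind this framework is that a server can compute on ciphertexts without decrypting them. As a hedged illustration only, the sketch below uses a toy Paillier cryptosystem, which is additively homomorphic, to sum 0/1/2 genotype codes while they stay encrypted. Paillier is a swapped-in technique, not the scheme or the set of advances described in the paper, and the tiny hard-coded primes and the helper names (`encrypt`, `decrypt`, `add_encrypted`, `encrypted_allele_count`) are hypothetical choices made for readability. This offers no security.

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic, so multiplying
# ciphertexts yields an encryption of the sum of the plaintexts.
# The tiny primes below are for illustration only and offer no security.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                          # common simplified generator choice
LAM = math.lcm(P - 1, Q - 1)       # Carmichael function of N
MU = pow(LAM, -1, N)               # valid because G = N + 1 makes L(G^LAM mod N^2) = LAM mod N


def encrypt(m: int, rng: random.Random) -> int:
    """Encrypt an integer 0 <= m < N with fresh randomness r coprime to N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Recover m as L(c^LAM mod N^2) * MU mod N, where L(x) = (x - 1) // N."""
    return ((pow(c, LAM, N_SQ) - 1) // N * MU) % N


def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts decrypts to m1 + m2 (mod N)."""
    return (c1 * c2) % N_SQ


def encrypted_allele_count(genotypes, seed=0):
    """Sum 0/1/2 genotype codes while encrypted; only the final total is decrypted."""
    rng = random.Random(seed)
    ciphertexts = [encrypt(g, rng) for g in genotypes]
    total = encrypt(0, rng)
    for c in ciphertexts:
        total = add_encrypted(total, c)
    return decrypt(total)


if __name__ == "__main__":
    print(encrypted_allele_count([0, 1, 2, 1, 0, 2, 2]))
```

Only addition is supported here; a full GWAS needs multiplications as well, which is why approximate-arithmetic schemes are used in practice.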

SCEBE: an efficient and scalable algorithm for genome-wide association studies on longitudinal outcomes with mixed-effects modeling

Briefings in Bioinformatics ◽

10.1093/bib/bbaa130 ◽

2020 ◽

Author(s):

Min Yuan ◽

Xu Steven Xu ◽

Yaning Yang ◽

Yinsheng Zhou ◽

Yi Li ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Mixed Effects ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Scalable Algorithm ◽

Longitudinal Outcomes ◽

Mixed Effects Modeling ◽

Genome Wide

Genome-wide association studies (GWAS) using longitudinal phenotypes collected over time are appealing because they improve statistical power. However, the computational burden has been a challenge because of the complex algorithms needed to model longitudinal data. Approximation methods based on empirical Bayesian estimates (EBEs) from mixed-effects modeling have been developed to expedite the analysis. However, our analysis demonstrated that bias in both the association test and effect-size estimation remains an issue for the existing EBE-based methods. We propose a very fast and unbiased method, simultaneous correction for EBE (SCEBE), that corrects the bias in the naive EBE approach and provides unbiased P-values and effect-size estimates. Through application to Alzheimer’s Disease Neuroimaging Initiative data with 6 414 695 single nucleotide polymorphisms, we demonstrated that SCEBE can efficiently perform large-scale GWAS with longitudinal outcomes, improving computational efficiency nearly 10 000-fold and shortening the computation time from months to minutes. The SCEBE package and the example datasets are available at https://github.com/Myuan2019/SCEBE.
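Why a naive EBE approach is biased can be seen in a deliberately simple setting. The sketch below assumes a random-intercept model with known variance components and equal visits per subject (all hypothetical simplifications): the EBE is the subject mean shrunk toward the grand mean by a factor k = τ²/(τ² + σ²/n), so regressing EBEs on genotype attenuates the SNP slope by exactly k. This only illustrates the bias; it is not the SCEBE correction, which is described in the paper and not reproduced here.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (with intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_naive_ebe(n_subjects=500, n_visits=4, beta=0.5,
                       tau2=1.0, sigma2=4.0, seed=1):
    """Toy model y_ij = beta*g_i + b_i + e_ij, b_i ~ N(0, tau2), e_ij ~ N(0, sigma2).

    Returns (slope of subject means on genotype, slope of EBEs on genotype, k)."""
    rng = random.Random(seed)
    genotypes, subject_means = [], []
    for _ in range(n_subjects):
        g = rng.choice([0, 1, 2])
        b = rng.gauss(0.0, tau2 ** 0.5)
        visits = [beta * g + b + rng.gauss(0.0, sigma2 ** 0.5)
                  for _ in range(n_visits)]
        genotypes.append(g)
        subject_means.append(sum(visits) / n_visits)
    grand_mean = sum(subject_means) / n_subjects
    # EBE of the random intercept from a null model (no SNP term): the subject
    # mean shrunk toward the grand mean by the factor k.
    k = tau2 / (tau2 + sigma2 / n_visits)
    ebes = [k * (m - grand_mean) for m in subject_means]
    return ols_slope(genotypes, subject_means), ols_slope(genotypes, ebes), k


if __name__ == "__main__":
    slope_means, slope_ebe, k = simulate_naive_ebe()
    print(slope_means, slope_ebe, k)
```

With equal visits the attenuation is a single constant; with unequal visits or random slopes it varies by subject, which is why a simple rescaling is not a general fix.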

Bayesian large-scale multiple regression with summary statistics from genome-wide association studies

10.1101/042457 ◽

2016 ◽

Cited By ~ 5

Author(s):

Xiang Zhu ◽

Matthew Stephens

Keyword(s):

Multiple Regression ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Individual Level ◽

Genome Wide ◽

Level Data ◽

Wide Range

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which also can be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations, RSS performs similarly to analyses using the individual data, both for estimating heritability and detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.
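To make the link between univariate results and multiple regression coefficients concrete, the sketch below computes the moments of the RSS likelihood as we understand it from the published RSS work, β̂ | β ~ N(Ŝ R̂ Ŝ⁻¹ β, Ŝ R̂ Ŝ), where Ŝ is the diagonal matrix of univariate standard errors and R̂ the SNP correlation (LD) matrix. The abstract does not state this formula, so treat it as an assumption to check against the paper; the function name `rss_moments` is hypothetical.

```python
def rss_moments(beta, se, ld):
    """Mean and covariance of univariate (marginal) effect estimates given joint
    multiple-regression effects `beta`, under beta_hat ~ N(S R S^-1 beta, S R S)
    with S = diag(se) and R = ld (assumed form of the RSS likelihood)."""
    p = len(beta)
    # mean_j = sum_k se_j * R_jk * beta_k / se_k
    mean = [sum(se[j] * ld[j][k] * beta[k] / se[k] for k in range(p))
            for j in range(p)]
    # cov_jk = se_j * R_jk * se_k
    cov = [[se[j] * ld[j][k] * se[k] for k in range(p)] for j in range(p)]
    return mean, cov


if __name__ == "__main__":
    # One causal SNP (effect 0.1) in LD (r = 0.8) with a non-causal SNP.
    mean, cov = rss_moments([0.1, 0.0], [0.02, 0.02], [[1.0, 0.8], [0.8, 1.0]])
    print(mean, cov)
```

In the example, the non-causal SNP still has an expected marginal effect of about 0.08, which is the LD-induced signal the multiple-regression model is meant to disentangle.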

Large Scale Association Analysis for Drug Addiction: Results from SNP to Gene

The Scientific World JOURNAL ◽

10.1100/2012/939584 ◽

2012 ◽

Vol 2012 ◽

pp. 1-6 ◽

Cited By ~ 5

Author(s):

Xiaobo Guo ◽

Zhifa Liu ◽

Xueqin Wang ◽

Heping Zhang

Keyword(s):

Association Analysis ◽

Animal Studies ◽

Large Scale ◽

Association Studies ◽

Genetic Association Studies ◽

Complex Diseases ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide

Many genetic association studies have used single nucleotide polymorphism (SNP) data to identify genetic variants for complex diseases. Although SNP-based associations are most common in genome-wide association studies (GWAS), gene-based association analysis has received increasing attention for understanding the genetic etiology of complex diseases. While both methods have been used to analyze the same data, few genome-wide association studies compare the results or examine the connection between them. We performed a comprehensive analysis of the data from the Study of Addiction: Genetics and Environment (SAGE) and compared the results from the SNP-based and gene-based analyses. Our results suggest that the gene-based method complements the individual SNP-based analysis and that the two are conceptually closely related. In terms of gene findings, our results validate many genes that were previously reported either from analyses of the same dataset or from animal studies of substance dependence.
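One simple way to move from SNP-level to gene-level evidence is to take the smallest SNP p-value in a gene and correct it for the number of SNPs tested there. The sketch below uses a Šidák-corrected minimum p-value; this is an illustrative stand-in, not necessarily the gene-based method used in the SAGE analysis, and it assumes SNPs within a gene are independent (real LD makes it conservative). The function name is hypothetical.

```python
def sidak_min_p(snp_p_values):
    """Gene-level p-value: smallest SNP p-value, Sidak-corrected for the number
    of SNPs mapped to the gene. Assumes independent SNPs (ignores LD)."""
    m = len(snp_p_values)
    if m == 0:
        raise ValueError("gene has no SNPs")
    p_min = min(snp_p_values)
    return 1.0 - (1.0 - p_min) ** m


if __name__ == "__main__":
    # Hypothetical SNP p-values grouped by gene.
    genes = {"GENE_A": [0.01, 0.5, 0.2], "GENE_B": [0.04]}
    for name, pvals in genes.items():
        print(name, sidak_min_p(pvals))
```

The correction shows why the two analyses are related yet can disagree: a gene with one strong SNP among many weak ones is penalized for its size, while a single-SNP gene is not.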

GWASpro: a high-performance genome-wide association analysis server

Bioinformatics ◽

10.1093/bioinformatics/bty989 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2512-2514 ◽

Cited By ~ 4

Author(s):

Bongsong Kim ◽

Xinbin Dai ◽

Wenchao Zhang ◽

Zhaohong Zhuang ◽

Darlene L Sanchez ◽

...

Keyword(s):

High Performance ◽

Large Scale ◽

Linear Mixed Model ◽

Association Studies ◽

Learning Curves ◽

Experimental Designs ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide

Summary: We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators.

Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO.

Supplementary information: Supplementary data are available at Bioinformatics online.
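A design matrix is how factors such as location or treatment enter a linear (mixed) model as fixed effects. The sketch below builds one with an intercept, reference-coded dummy columns for each categorical factor, and a final SNP dosage column. It is a hypothetical illustration of the general idea, not GWASpro's interface or internals; the record keys (`location`, `treatment`, `snp`) are invented, and the random effects of a mixed model (e.g., kinship) are not shown.

```python
def design_matrix(records, factors, snp_key="snp"):
    """Fixed-effects design matrix: intercept, one dummy column per non-reference
    level of each categorical factor (first sorted level is the reference),
    then the SNP dosage. Returns (column_names, rows)."""
    levels = {f: sorted({r[f] for r in records}) for f in factors}
    columns = ["intercept"]
    for f in factors:
        columns += [f"{f}={lvl}" for lvl in levels[f][1:]]
    columns.append(snp_key)

    rows = []
    for r in records:
        row = [1.0]
        for f in factors:
            row += [1.0 if r[f] == lvl else 0.0 for lvl in levels[f][1:]]
        row.append(float(r[snp_key]))
        rows.append(row)
    return columns, rows


if __name__ == "__main__":
    records = [
        {"location": "A", "treatment": "ctl", "snp": 0},
        {"location": "B", "treatment": "drought", "snp": 2},
        {"location": "A", "treatment": "drought", "snp": 1},
    ]
    cols, X = design_matrix(records, ["location", "treatment"])
    print(cols)
    for row in X:
        print(row)
```

Dropping one level per factor keeps the columns linearly independent of the intercept, so the fixed effects remain estimable.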

The Mediation Effects of Aluminum in Plasma and Dipeptidyl Peptidase Like Protein 6 (DPP6) Polymorphism on Renal Function via Genome-Wide Typing Association

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph181910484 ◽

2021 ◽

Vol 18 (19) ◽

pp. 10484

Author(s):

Ting-Hao Chen ◽

Chen-Cheng Yang ◽

Kuei-Hau Luo ◽

Chia-Yen Dai ◽

Yao-Chung Chuang ◽

...

Keyword(s):

Renal Function ◽

Mediation Analysis ◽

Association Studies ◽

Dipeptidyl Peptidase ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Mediation Effects ◽

Genome Wide ◽

Estimated Glomerular Filtration Rates

Aluminum (Al) toxicity is related to renal failure and the failure of other systems. Although some genome-wide association studies (GWAS) have been conducted in Australia and England, to our knowledge none have been conducted in Han Chinese. This research therefore used whole-genome genotypes from the Taiwan Biobank to explore the association between plasma Al concentrations and renal function. Participants, who underwent questionnaire interviews, biomarker measurements, and genotyping, were drawn from the Taiwan Biobank database. We then measured their plasma Al concentrations with ICP-MS in the laboratory at Kaohsiung Medical University. We used these data to perform genome-wide association (GWA) tests, searching for candidate genes and relating plasma Al concentrations to renal function. Furthermore, we examined the path relationships among single nucleotide polymorphisms (SNPs), Al concentrations, and estimated glomerular filtration rates (eGFR) through mediation analysis with 3000 bootstrap replicates. Following the principles of GWAS, we focused on three SNPs within the dipeptidyl peptidase-like protein 6 (DPP6) gene on chromosome 7: rs10224371, rs2316242, and rs10268004. The mediation analysis showed that all of the selected SNPs indirectly affected eGFR through the mediation of Al concentrations. Our analysis revealed an association among DPP6 SNPs, plasma Al concentrations, and eGFR. However, further longitudinal studies and mechanistic research are needed. Nevertheless, to our knowledge, this is the first study to explore how DPP6 SNPs and plasma Al jointly affect eGFR.
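Mediation analysis with bootstrap resampling, as described above, can be sketched with the product-of-coefficients estimator: a is the effect of the exposure (SNP dosage) on the mediator (e.g., plasma Al), b is the effect of the mediator on the outcome (e.g., eGFR) adjusting for the exposure, and the indirect effect is a·b. The sketch below is a generic illustration under that assumption, with a percentile bootstrap whose default of 3000 replicates follows the abstract; it omits covariates and is not the study's actual model. The simulated data and all function names are hypothetical.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _slope(x, y):
    """OLS slope of y on x with an intercept."""
    mx, my = _mean(x), _mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def _residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = _slope(x, y)
    mx, my = _mean(x), _mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a*b.

    a: slope of mediator m on exposure x.
    b: coefficient of m in y ~ x + m, via Frisch-Waugh-Lovell
       (slope of x-residualized y on x-residualized m)."""
    a = _slope(x, m)
    b = _slope(_residuals(x, m), _residuals(x, y))
    return a * b


def bootstrap_ci(x, m, y, n_boot=3000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # skip degenerate resamples (e.g., no variation in x)
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


def simulate(n=200, seed=42):
    """Hypothetical data: true indirect effect is 1.5 * 0.8 = 1.2."""
    rng = random.Random(seed)
    x = [rng.choice([0, 1, 2]) for _ in range(n)]          # SNP dosage
    m = [1.5 * xi + rng.gauss(0, 1) for xi in x]           # mediator
    y = [0.8 * mi + 0.2 * xi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, m, y


if __name__ == "__main__":
    x, m, y = simulate()
    print(indirect_effect(x, m, y), bootstrap_ci(x, m, y, n_boot=500))
```

The bootstrap is used because the sampling distribution of a product of two coefficients is generally skewed, so a normal-theory interval can be misleading.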

Genome-wide association studies of yield-related traits in high-latitude japonica rice

BMC Genomic Data ◽

10.1186/s12863-021-00995-y ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Guomin Zhang ◽

Rongsheng Wang ◽

Juntao Ma ◽

Hongru Gao ◽

Lingwei Deng ◽

...

Keyword(s):

Linear Model ◽

High Latitude ◽

Association Studies ◽

Genome Wide Association ◽

Japonica Rice ◽

Genome Wide Association Studies ◽

Heilongjiang Province ◽

Nucleotide Polymorphisms ◽

Coding Region ◽

Genome Wide

Background: Heilongjiang Province is a high-quality japonica rice cultivation area in China. One in ten bowls of Chinese rice is produced here. Increasing yield is one of the main aims of rice production in this area. However, yield is a complex quantitative trait composed of many factors. The purpose of this study was to determine how many genetic loci are associated with yield-related traits. Genome-wide association studies (GWAS) were performed on 450 accessions collected from northeast Asia, including Russia, Korea, Japan and Heilongjiang Province of China. These accessions consist of elite varieties and landraces introduced into Heilongjiang Province decade ago.

Results: After resequencing of the 450 accessions, 189,019 single nucleotide polymorphisms (SNPs) were used for association studies by two different models, a general linear model (GLM) and a mixed linear model (MLM), examining four traits: days to heading (DH), plant height (PH), panicle weight (PW) and tiller number (TI). Over 25 SNPs were found to be associated with each trait. Among them, 22 SNPs were selected to identify candidate genes, and 2, 8, 1 and 11 SNPs were found to be located in 3′ UTR region, intron region, coding region and intergenic region, respectively.

Conclusions: All SNPs detected in this research may become candidates for further fine mapping and may be used in the molecular breeding of high-latitude rice.
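The GLM named above is, at its core, a per-SNP regression of the trait on genotype dosage. The sketch below computes the effect estimate, its standard error, and the t statistic for one SNP. It is a generic illustration, not the study's pipeline: it omits population-structure covariates, and the MLM additionally includes a random effect with a kinship covariance, which is not shown. The function name and the toy data are hypothetical.

```python
import math


def glm_snp_test(genotypes, phenotype):
    """Single-marker general linear model: phenotype ~ intercept + dosage.

    Returns (beta, standard error, t statistic) for the SNP effect."""
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(phenotype) / n
    sxx = sum((g - mx) ** 2 for g in genotypes)
    sxy = sum((g - mx) * (y - my) for g, y in zip(genotypes, phenotype))
    beta = sxy / sxx
    intercept = my - beta * mx
    rss = sum((y - intercept - beta * g) ** 2
              for g, y in zip(genotypes, phenotype))
    sigma2 = rss / (n - 2)          # residual variance, 2 fitted parameters
    se = math.sqrt(sigma2 / sxx)
    return beta, se, beta / se


if __name__ == "__main__":
    dosage = [0, 1, 2, 0, 1, 2]
    trait = [1.0, 3.1, 4.9, 1.2, 2.9, 5.1]   # hypothetical trait values
    print(glm_snp_test(dosage, trait))
```

Across a genome-wide scan this test is repeated for each SNP (189,019 here), and the resulting p-values are compared with a multiple-testing threshold.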

Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Nucleic Acids Research ◽

10.1093/nar/gkz854 ◽

2019 ◽

Vol 48 (D1) ◽

pp. D659-D667 ◽

Cited By ~ 2

Author(s):

Wenqian Yang ◽

Yanbo Yang ◽

Cecheng Zhao ◽

Kun Yang ◽

Dongyang Wang ◽

...

Keyword(s):

Large Scale ◽

Association Studies ◽

Genotype Imputation ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

High Quality ◽

Single Nucleotide ◽

Genome Wide ◽

Whole Genome Resequencing ◽

Missing Genotypes

Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is the process of estimating missing genotypes from the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and can therefore be widely used in large-scale genome-wide association studies (GWASs) based on relatively inexpensive, low-density SNP arrays. However, most nonhuman animals lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of nonhuman animals from various studies and databases. A computational pipeline was developed to process different types of raw data to construct reference panels. Finally, 13 high-quality reference panels including ∼400 million SNPs from 2265 samples were constructed. In Animal-ImputeDB, an easy-to-use online tool integrating two popular imputation tools was designed for genotype imputation. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.
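The idea of "estimating missing genotypes from the haplotypes in a reference panel" can be shown with the crudest possible rule: copy missing alleles from the reference haplotype that best matches the target at its observed sites. This nearest-haplotype rule is a hypothetical simplification; production imputation tools typically model the target as a mosaic of reference haplotypes with a hidden Markov model, and nothing here reflects the specific tools wrapped by Animal-ImputeDB.

```python
def impute_haplotype(target, reference_panel):
    """Fill missing alleles (None) in `target` from the reference haplotype with
    the fewest mismatches at the observed sites (ties: first in the panel)."""
    observed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref):
        return sum(ref[i] != target[i] for i in observed)

    best = min(reference_panel, key=mismatches)
    return [best[i] if allele is None else allele
            for i, allele in enumerate(target)]


if __name__ == "__main__":
    panel = [
        [0, 0, 1, 1, 0],
        [1, 1, 0, 0, 1],
        [0, 1, 1, 0, 0],
    ]
    # Low-density array: sites 1 and 4 were not genotyped.
    print(impute_haplotype([0, None, 1, 1, None], panel))
```

This also shows why panel quality matters: if no reference haplotype resembles the target, the filled-in alleles are essentially guesses.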

A critical evaluation of results from genome-wide association studies of micronutrient status and their utility in the practice of precision nutrition

British Journal Of Nutrition ◽

10.1017/s0007114519001119 ◽

2019 ◽

Vol 122 (2) ◽

pp. 121-130 ◽

Cited By ~ 2

Author(s):

Marie-Joe Dib ◽

Ruan Elliott ◽

Kourosh R. Ahmadi

Keyword(s):

Large Scale ◽

Association Studies ◽

Critical Evaluation ◽

Water Soluble ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Micronutrient Deficiencies ◽

Micronutrient Status ◽

Genome Wide ◽

Fat Soluble Vitamins

Rapid advances in ‘omics’ technologies have paved the way to an era in which more ‘precise’ approaches – ‘precision’ nutrition – which leverage data on genetic variability alongside the traditional indices, have been put forth as the state-of-the-art solution to redress the effects of malnutrition across the life course. We contend that this inference is premature and that it is imperative to first review and critique the existing evidence from large-scale epidemiological findings. We set out to provide a critical evaluation of findings from genome-wide association studies (GWAS) in the roadmap to precision nutrition, focusing on GWAS of micronutrient disposition. We found that a large number of loci associated with biomarkers of micronutrient status have been identified. Mean estimates of heritability of micronutrient status ranged between 20 and 35 % for minerals, 56–59 % for water-soluble and 30–70 % for fat-soluble vitamins. With some exceptions, the majority of the identified genetic variants explained little of the overall variance in status for each micronutrient, ranging between 1·3 and 8 % (minerals), <0·1–12 % (water-soluble) and 1·7–2·3 % (fat-soluble) vitamins. However, GWAS have provided some novel insight into mechanisms that underpin variability in micronutrient status. Our findings highlight obvious gaps that need to be addressed if the full scope of precision nutrition is ever to be realised, including research aimed at (i) dissecting the genetic basis of micronutrient deficiencies or ‘response’ to intake/supplementation; (ii) identifying trans-ethnic and ethnic-specific effects; and (iii) identifying gene–nutrient interactions for the purpose of unravelling molecular ‘behaviour’ in a range of environmental contexts.
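The "variance explained" figures above can be related to a variant's effect size and allele frequency. For an additive biallelic SNP under Hardy-Weinberg equilibrium, dosage variance is 2p(1−p), so the SNP accounts for 2p(1−p)β² of the phenotypic variance. The sketch below applies that standard formula; it is not taken from the review, and the function name and example numbers are hypothetical.

```python
def variance_explained(allele_freq, beta, phenotype_var):
    """Fraction of phenotypic variance explained by one additive biallelic SNP.

    Under Hardy-Weinberg equilibrium, dosage variance is 2p(1-p), so the SNP
    contributes 2p(1-p) * beta^2 to the phenotypic variance."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta ** 2 / phenotype_var


if __name__ == "__main__":
    # A common variant (p = 0.5) with a 0.2 SD effect on a standardized biomarker.
    print(variance_explained(0.5, 0.2, 1.0))
```

Even a comparatively large per-allele effect of 0.2 standard deviations explains only about 2 % of the variance, which helps explain why most identified variants account for little of the overall variance in micronutrient status.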

Genome-wide association studies of cardiac electrical phenotypes

Cardiovascular Research ◽

10.1093/cvr/cvaa144 ◽

2020 ◽

Vol 116 (9) ◽

pp. 1620-1634

Author(s):

Charlotte Glinge ◽

Najim Lahrouchi ◽

Reza Jabbari ◽

Jacob Tfelt-Hansen ◽

Connie R Bezzina

Keyword(s):

Genetic Basis ◽

Association Studies ◽

Individual Variability ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Challenges And Opportunities ◽

Electrocardiographic Parameters

The genetic basis of cardiac electrical phenotypes has in the last 25 years been the subject of intense investigation. While in the first years, such efforts were dominated by the study of familial arrhythmia syndromes, in recent years, large consortia of investigators have successfully pursued genome-wide association studies (GWAS) for the identification of single-nucleotide polymorphisms that govern inter-individual variability in electrocardiographic parameters in the general population. We here provide a review of GWAS conducted on cardiac electrical phenotypes in the last 14 years and discuss the implications of these discoveries for our understanding of the genetic basis of disease susceptibility and variability in disease severity. Furthermore, we review functional follow-up studies that have been conducted on GWAS loci associated with cardiac electrical phenotypes and highlight the challenges and opportunities offered by such studies.

An independent validation study of three single nucleotide polymorphisms at the sex hormone-binding globulin locus for testosterone levels identified by genome-wide association studies

Human Reproduction Open ◽

10.1093/hropen/hox002 ◽

2017 ◽

Vol 2017 (1) ◽

Author(s):

Youichi Sato ◽

Atsushi Tajima ◽

Motoki Katsurayama ◽

Shiari Nozawa ◽

Miki Yoshiike ◽

...

Keyword(s):

Validation Study ◽

Association Studies ◽

Genome Wide Association ◽

Sex Hormone Binding Globulin ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Hormone Binding ◽

Single Nucleotide ◽

Independent Validation ◽

Genome Wide
