Heritability Estimation and Differential Analysis with Generalized Linear Mixed Models in Genomic Sequencing Studies

Abstract Motivation Genomic sequencing studies, including RNA sequencing and bisulfite sequencing studies, are becoming increasingly common and increasingly large. Large genomic sequencing studies open doors for accurate molecular trait heritability estimation and powerful differential analysis. Heritability estimation and differential analysis in sequencing studies require the development of statistical methods that can properly account for the count nature of the sequencing data and that are computationally efficient for large data sets. Results Here, we develop such a method, PQLseq (Penalized Quasi-Likelihood for sequencing count data), to enable effective and efficient heritability estimation and differential analysis using the generalized linear mixed model framework. With extensive simulations and comparisons to previous methods, we show that PQLseq is the only method currently available that can produce unbiased heritability estimates for sequencing count data. In addition, we show that PQLseq is well suited for differential analysis in large sequencing studies, providing calibrated type I error control and more power compared to the standard linear mixed model methods. Finally, we apply PQLseq to perform gene expression heritability estimation and differential expression analysis in a large RNA sequencing study in the Hutterites. Availability and implementation PQLseq is implemented as an R package with source code freely available at www.xzlab.org/software.html and https://cran.r-project.org/web/packages/PQLseq/index.html. Contact XZ ([email protected]) Supplementary information Supplementary data are available online.

Heritability estimation and differential analysis of count data with generalized linear mixed models in genomic sequencing studies

Bioinformatics ◽

10.1093/bioinformatics/bty644 ◽

2018 ◽

Vol 35 (3) ◽

pp. 487-496 ◽

Cited By ~ 16

Author(s):

Shiquan Sun ◽

Jiaqiang Zhu ◽

Sahar Mozaffari ◽

Carole Ober ◽

Mengjie Chen ◽

...

Keyword(s):

Count Data ◽

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Genomic Sequencing ◽

Differential Analysis ◽

Heritability Estimation ◽

Sequencing Studies

Analyzing discontinuities in longitudinal count data: A multilevel generalized linear mixed model.

Psychological Methods ◽

10.1037/met0000347 ◽

2020 ◽

Author(s):

James L. Peugh ◽

Sarah J. Beal ◽

Meghan E. McGrady ◽

Michael D. Toland ◽

Constance Mara

Keyword(s):

Count Data ◽

Mixed Model ◽

Linear Mixed Model ◽

Generalized Linear Mixed Model ◽

Longitudinal Count Data

A Bayesian linear mixed model for prediction of complex traits

Bioinformatics ◽

10.1093/bioinformatics/btaa1023 ◽

2020 ◽

Author(s):

Yang Hai ◽

Yalu Wen

Keyword(s):

Complex Traits ◽

Mixed Model ◽

Linear Mixed Model ◽

Rare Variants ◽

Disease Risk ◽

R Package ◽

Underlying Disease ◽

Supplementary Information ◽

True Effect Size ◽

Bayes Algorithm

Abstract Motivation Accurate disease risk prediction is essential for precision medicine. Existing models either assume that diseases are caused by groups of predictors with small-to-moderate effects or a few isolated predictors with large effects. Their performance can be sensitive to the underlying disease mechanisms, which are usually unknown in advance. Results We developed a Bayesian linear mixed model (BLMM), where genetic effects were modelled using a hybrid of the sparsity regression and linear mixed model with multiple random effects. The parameters in BLMM were inferred through a computationally efficient variational Bayes algorithm. The proposed method can resemble the shape of the true effect size distributions, captures the predictive effects from both common and rare variants, and is robust against various disease models. Through extensive simulations and the application to a whole-genome sequencing dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative, we have demonstrated that BLMM has better prediction performance than existing methods and can detect variables and/or genetic regions that are predictive. Availability The R-package is available at https://github.com/yhai943/BLMM Supplementary information Supplementary data are available at Bioinformatics online.

Linear mixed model for heritability estimation that explicitly addresses environmental variation

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1510497113 ◽

2016 ◽

Vol 113 (27) ◽

pp. 7377-7382 ◽

Cited By ~ 42

Author(s):

David Heckerman ◽

Deepti Gurdasani ◽

Carl Kadie ◽

Cristina Pomilla ◽

Tommy Carstensen ◽

...

Keyword(s):

Mixed Model ◽

Linear Mixed Model ◽

Random Effect ◽

Simulated Data ◽

Spatial Location ◽

Liver Function Tests ◽

Narrow Sense Heritability ◽

Heritability Estimation ◽

Genome Wide Data ◽

Estimate Heritability

The linear mixed model (LMM) is now routinely used to estimate heritability. Unfortunately, as we demonstrate, LMM estimates of heritability can be inflated when using a standard model. To help reduce this inflation, we used a more general LMM with two random effects—one based on genomic variants and one based on easily measured spatial location as a proxy for environmental effects. We investigated this approach with simulated data and with data from a Uganda cohort of 4,778 individuals for 34 phenotypes including anthropometric indices, blood factors, glycemic control, blood pressure, lipid tests, and liver function tests. For the genomic random effect, we used identity-by-descent estimates from accurately phased genome-wide data. For the environmental random effect, we constructed a covariance matrix based on a Gaussian radial basis function. Across the simulated and Ugandan data, narrow-sense heritability estimates were lower using the more general model. Thus, our approach addresses, in part, the issue of “missing heritability” in the sense that much of the heritability previously thought to be missing was fictional. Software is available at https://github.com/MicrosoftGenomics/FaST-LMM.
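
The environmental random effect in this abstract relies on a covariance matrix built from a Gaussian radial basis function of spatial location. A minimal sketch of that construction follows, assuming Euclidean distance between 2-D coordinates and a hypothetical `length_scale` parameter; the abstract specifies neither the distance metric nor how the scale was chosen, and the function name and coordinates are illustrative only.

```python
import math


def gaussian_rbf_covariance(coords, length_scale):
    """Return the n x n matrix K with K[i][j] = exp(-d_ij^2 / (2 * length_scale^2)).

    coords: list of (x, y) locations, one per individual (illustrative input).
    length_scale: hypothetical smoothing parameter, not taken from the paper.
    """
    if length_scale <= 0:
        raise ValueError("length_scale must be positive")
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between individuals i and j.
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            K[i][j] = math.exp(-d2 / (2.0 * length_scale ** 2))
    return K


# Three made-up locations: two close together, one far away.
K = gaussian_rbf_covariance([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)], length_scale=1.0)
```

Under this kernel, individuals at nearby locations receive a covariance close to 1 and distant individuals a covariance close to 0, which is what lets spatial location act as a proxy for shared environment.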

Estimating Heritability of Glycaemic Response to Metformin using Nationwide Electronic Health Records and Population-Sized Pedigree

10.21203/rs.3.rs-122793/v1 ◽

2021 ◽

Author(s):

Iris Kalka ◽

Amir Gavrieli ◽

Smadar Shilo ◽

Hagai Rossman ◽

Nitzan Shalom Artzi ◽

...

Keyword(s):

Electronic Health Records ◽

Mixed Model ◽

Linear Mixed Model ◽

Metformin Treatment ◽

Glycaemic Response ◽

Health Records ◽

Heritability Estimation ◽

Estimate Heritability ◽

Electronic Health ◽

Pre Treatment

Abstract Variability of response to medication is a well known phenomenon, determined by both environmental and genetic factors. Understanding the heritable component of the response to medication is of great interest but challenging for several reasons, including small study cohorts and computational limitations. Here, we studied the heritability of variation in the glycaemic response to metformin, the first-line therapeutic agent for type 2 diabetes (T2D), by leveraging 17 years of electronic health records (EHR) data from Israel’s largest healthcare service provider, consisting of over five million patients of diverse ethnicities and socio-economic background. Our cohort consisted of 74,871 T2D patients treated with metformin, with an accumulated number of 1,358,776 HbA1c measurements and 323,260 metformin prescriptions. We estimated the explained variance of glycated hemoglobin (HbA1c%) reduction due to heritability by constructing a six-generation population-size pedigree from national registries and linking it to medical health records. Using a linear mixed model-based framework, a common-practice method for heritability estimation, we calculated a heritability measure of h2 = 15.9% (95% CI, 1.2% − 30.5%) for absolute reduction of HbA1c% after metformin treatment in males and h2 = 20.9% (95% CI, 7.5% − 34.3%) in females. Results remained unchanged after adjusting for pre-treatment HbA1c% in females, and for both genders in proportional reduction of HbA1c%. To the best of our knowledge, our work is the first to estimate heritability of drug response using EHR data. We demonstrated that while response to metformin treatment has a heritable component, most of the variation is likely due to other factors, further motivating non-genetic analyses aimed at unraveling metformin’s mechanism of action.

Estimating SNP heritability in presence of population substructure in biobank-scale datasets

10.1101/2020.08.05.236901 ◽

2020 ◽

Author(s):

Zhaotong Lin ◽

Souvik Seal ◽

Saonli Basu

Keyword(s):

Complex Traits ◽

Population Stratification ◽

Mixed Model ◽

Linear Mixed Model ◽

Population Substructure ◽

Relationship Matrix ◽

Phenotypic Variance ◽

Genetic Contribution ◽

Heritability Estimation ◽

The Impact

Abstract SNP heritability of a trait is measured by the proportion of total variance explained by the additive effects of genome-wide single nucleotide polymorphisms (SNPs). Linear mixed models are routinely used to estimate SNP heritability for many complex traits. The basic concept behind this approach is to model genetic contribution as a random effect, where the variance of this genetic contribution is attributed to the heritability of the trait. This linear mixed model approach requires estimation of ‘relatedness’ among individuals in the sample, which is usually captured by estimating a genetic relationship matrix (GRM). Heritability is estimated by the restricted maximum likelihood (REML) or method of moments (MOM) approaches, and this estimation relies heavily on the GRM computed from the genetic data on individuals. Presence of population substructure in the data could significantly impact the GRM estimation and may introduce bias in heritability estimation. The common practice of accounting for such population substructure is to adjust for the top few principal components of the GRM as covariates in the linear mixed model. Here we propose an alternative way of estimating heritability in multi-ethnic studies. Our proposed approach is a MOM estimator derived from the Haseman-Elston regression and gives an asymptotically unbiased estimate of heritability in presence of population stratification. It introduces adjustments for the population stratification in a second-order estimating equation and allows the total phenotypic variance to vary by ethnicity. We study the performance of different MOM and REML approaches in presence of population stratification through extensive simulation studies. We estimate the heritability of height, weight and other anthropometric traits in the UK Biobank cohort to investigate the impact of subtle population substructure on SNP heritability estimation.
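
Haseman-Elston regression, from which the proposed estimator is derived, can be illustrated in its basic unadjusted form: for a standardized phenotype, the cross-products of phenotype values for each pair of individuals are regressed on the corresponding off-diagonal GRM entries, and the slope serves as the heritability estimate. The sketch below is that textbook version only, fitted without an intercept; it does not include the stratification adjustments the authors propose, and the function name and toy inputs are illustrative assumptions.

```python
import math


def he_regression(y, grm):
    """Basic Haseman-Elston slope: regress z_i * z_j on grm[i][j] over pairs i < j.

    y: list of phenotype values.
    grm: n x n genetic relationship matrix as nested lists (assumed symmetric).
    Returns the no-intercept least-squares slope, used as the h^2 estimate.
    """
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    if var == 0:
        raise ValueError("phenotype has zero variance")
    # Standardize the phenotype to mean 0 and variance 1.
    z = [(v - mean) / math.sqrt(var) for v in y]
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            num += grm[i][j] * z[i] * z[j]
            den += grm[i][j] ** 2
    if den == 0:
        raise ValueError("all off-diagonal GRM entries are zero")
    return num / den


# Toy inputs that only exercise the arithmetic; four individuals are far too
# few for a meaningful estimate, so the value can fall outside [0, 1].
y = [1.0, 1.0, -1.0, -1.0]
grm = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
h2_hat = he_regression(y, grm)
```

Because the estimator is a moment-based slope rather than a constrained likelihood maximum, it is not bounded to the unit interval in small samples.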

Multi-kernel linear mixed model with adaptive lasso for prediction analysis on high-dimensional multi-omics data

Bioinformatics ◽

10.1093/bioinformatics/btz822 ◽

2019 ◽

Vol 36 (6) ◽

pp. 1785-1794

Author(s):

Jun Li ◽

Qing Lu ◽

Yalu Wen

Keyword(s):

Risk Prediction ◽

Mixed Model ◽

Linear Mixed Model ◽

R Package ◽

Kernel Functions ◽

Adaptive Lasso ◽

Supplementary Information ◽

High Dimensional ◽

Omics Data ◽

Modeling Framework

Abstract Motivation The use of human genome discoveries and other established factors to build an accurate risk prediction model is an essential step toward precision medicine. While multi-layer high-dimensional omics data provide unprecedented data resources for prediction studies, their corresponding analytical methods are much less developed. Results We present a multi-kernel penalized linear mixed model with adaptive lasso (MKpLMM), a predictive modeling framework that extends the standard linear mixed models widely used in genomic risk prediction, for multi-omics data analysis. MKpLMM can capture not only the predictive effects from each layer of omics data but also their interactions via using multiple kernel functions. It adopts a data-driven approach to select predictive regions as well as predictive layers of omics data, and achieves robust selection performance. Through extensive simulation studies, the analyses of PET-imaging outcomes from the Alzheimer’s Disease Neuroimaging Initiative study, and the analyses of 64 drug responses, we demonstrate that MKpLMM consistently outperforms competing methods in phenotype prediction. Availability and implementation The R-package is available at https://github.com/YaluWen/OmicPred. Supplementary information Supplementary data are available at Bioinformatics online.

Efficient multivariate analysis algorithms for longitudinal genome-wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btz304 ◽

2019 ◽

Vol 35 (23) ◽

pp. 4879-4885 ◽

Cited By ~ 4

Author(s):

Chao Ning ◽

Dan Wang ◽

Lei Zhou ◽

Julong Wei ◽

Yuanxin Liu ◽

...

Keyword(s):

Longitudinal Data ◽

Software Package ◽

Mixed Model ◽

Linear Mixed Model ◽

Association Studies ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Computational Speed

Abstract Motivation Current dynamic phenotyping systems introduce time as an extra dimension to genome-wide association studies (GWAS), which helps to explore the mechanism of dynamical genetic control for complex longitudinal traits. However, existing methods for longitudinal GWAS either ignore the covariance among observations of different time points or encounter computational efficiency issues. Results We herein developed efficient genome-wide multivariate association algorithms for longitudinal data. In contrast to existing univariate linear mixed model analyses, the proposed method has improved statistical power for association detection and improved computational speed. In addition, the new method can analyze unbalanced longitudinal data with thousands of individuals and more than ten thousand records within a few hours. The corresponding time for balanced longitudinal data is just a few minutes. Availability and implementation A software package named GMA that implements the efficient algorithm (https://github.com/chaoning/GMA) is freely available for interested users in relevant fields. Supplementary information Supplementary data are available at Bioinformatics online.

Privacy-preserving construction of generalized linear mixed model for biomedical computation

Bioinformatics ◽

10.1093/bioinformatics/btaa478 ◽

2020 ◽

Vol 36 (Supplement_1) ◽

pp. i128-i135

Author(s):

Rui Zhu ◽

Chao Jiang ◽

Xiaofeng Wang ◽

Shuang Wang ◽

Hao Zheng ◽

...

Keyword(s):

Em Algorithm ◽

Random Effects ◽

Mixed Model ◽

Linear Mixed Model ◽

Generalized Linear Mixed Model ◽

Privacy Preserving ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Linear Predictor ◽

Biomedical Computation

Abstract Motivation The generalized linear mixed model (GLMM) is an extension of the generalized linear model (GLM) in which the linear predictor takes random effects into account. Given its power of precisely modeling the mixed effects from multiple sources of random variations, the method has been widely used in biomedical computation, for instance in the genome-wide association studies (GWASs) that aim to detect genetic variance significantly associated with phenotypes such as human diseases. Collaborative GWAS on large cohorts of patients across multiple institutions is often impeded by the privacy concerns of sharing personal genomic and other health data. To address such concerns, we present in this paper a privacy-preserving Expectation–Maximization (EM) algorithm to build GLMM collaboratively when input data are distributed to multiple participating parties and cannot be transferred to a central server. We assume that the data are horizontally partitioned among participating parties: i.e. each party holds a subset of records (including observational values of fixed effect variables and their corresponding outcome), and for all records, the outcome is regulated by the same set of known fixed effects and random effects. Results Our collaborative EM algorithm is mathematically equivalent to the original EM algorithm commonly used in GLMM construction. The algorithm also runs efficiently when tested on simulated and real human genomic data, and thus can be practically used for privacy-preserving GLMM construction. We implemented the algorithm for collaborative GLMM (cGLMM) construction in R. The data communication was implemented using the rsocket package. Availability and implementation The software is released in open source at https://github.com/huthvincent/cGLMM. Supplementary information Supplementary data are available at Bioinformatics online.
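
For reference, the model class this abstract describes can be written in its standard textbook form. The symbols are generic notation chosen here, not taken from the paper: $g$ is the link function, $\mathbf{x}_i$ and $\mathbf{z}_i$ are the fixed-effect and random-effect covariates for record $i$, $\boldsymbol{\beta}$ the fixed effects, and $\mathbf{u}$ the random effects with covariance $\boldsymbol{\Sigma}$.

```latex
g\bigl(\mathbb{E}[y_i \mid \mathbf{u}]\bigr)
  = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \mathbf{z}_i^{\top}\mathbf{u},
\qquad
\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
```

Setting $\mathbf{u} = \mathbf{0}$ recovers the ordinary GLM, and an identity link with Gaussian outcomes recovers the linear mixed model.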

LuxUS: DNA Methylation Analysis Using Generalized Linear Mixed Model with Spatial Correlation

10.1101/536722 ◽

2019 ◽

Cited By ~ 2

Author(s):

Viivi Halla-aho ◽

Harri Lähdesmäki

Keyword(s):

Dna Methylation ◽

Spatial Correlation ◽

Mixed Model ◽

Bisulfite Sequencing ◽

Linear Mixed Model ◽

Generalized Linear Mixed Model ◽

Statistical Testing ◽

Supplementary Information ◽

Sequencing Data ◽

Bisulfite Sequencing Data

Abstract Motivation DNA methylation is an important epigenetic modification, which has multiple functions. DNA methylation and its connections to diseases have been extensively studied in recent years. It is known that DNA methylation levels of neighboring cytosines are correlated and that differential DNA methylation typically occurs across regions rather than at the level of individual cytosines. Results We have developed a generalized linear mixed model, LuxUS, that makes use of the correlation between neighboring cytosines to facilitate analysis of differential methylation. LuxUS implements a likelihood model for bisulfite sequencing data that accounts for experimental variation in underlying biochemistry. LuxUS can model both binary and continuous covariates, and the mixed model formulation enables including replicate and cytosine random effects. Spatial correlation is included in the model through a cytosine random effect correlation structure. We show with simulation experiments that by utilizing the spatial correlation we gain more power in the statistical testing of differential DNA methylation. Results with a real bisulfite sequencing data set show that LuxUS is able to detect biologically significant differentially methylated cytosines. Availability The tool is available at https://github.com/hallav/LuxUS. Supplementary information Supplementary data are available at bioRxiv.
