Spatial rank-based multifactor dimensionality reduction to detect gene–gene interactions for multivariate phenotypes

Abstract

Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies, which aim to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions and has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling’s T² statistic to evaluate interaction models, but Hotelling’s T² is well known to be highly sensitive to heavily skewed distributions and outliers.

Results: We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is designed mainly for analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR uses fuzzy k-means clustering to classify multi-locus genotypes into two groups. It then calculates a spatial rank-sum statistic as the evaluation measure and selects the interaction model with the largest statistic as the best model. The novelty lies in adopting nonparametric statistics as the evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Extensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Applied to a real dataset from a Korean genome-wide association study, MR-MDR successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR.

Conclusions: Extensive simulation studies comparing MR-MDR with several current methods showed that MR-MDR performed outstandingly for skewed distributions and had comparable power for symmetric distributions. Therefore, we conclude that MR-MDR is a useful multivariate nonparametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size.
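The evaluation measure described above rests on spatial signs and ranks. As a rough illustration, the Python sketch below computes spatial signs S(x) = x/‖x‖, spatial ranks R_i = (1/n) Σ_j S(y_i − y_j), and a several-sample spatial rank statistic of the form Q² = Σ_g n_g R̄_gᵀ B⁻¹ R̄_g with B = (1/n) Σ_i R_i R_iᵀ, a form used in the multivariate spatial-rank literature. This is an assumption: the exact statistic and scaling used by MR-MDR, and its fuzzy k-means grouping step, are not given here. The function names are hypothetical, and the `groups` labels stand in for the two genotype groups MR-MDR would produce.

```python
import numpy as np


def spatial_sign(x, eps=1e-12):
    """Row-wise spatial sign: S(x) = x / ||x|| if x != 0, else the zero vector."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    out = np.zeros_like(x)
    nonzero = norms[:, 0] > eps
    out[nonzero] = x[nonzero] / norms[nonzero]
    return out


def spatial_ranks(Y):
    """Spatial rank of each row y_i of Y: R_i = (1/n) * sum_j S(y_i - y_j)."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    return np.array([spatial_sign(Y[i] - Y).sum(axis=0) / n for i in range(n)])


def spatial_rank_stat(Y, groups):
    """Several-sample spatial rank statistic Q^2 = sum_g n_g * Rbar_g' B^{-1} Rbar_g,
    with B = (1/n) * sum_i R_i R_i' (pseudo-inverse used for numerical safety)."""
    R = spatial_ranks(Y)
    groups = np.asarray(groups)
    B_inv = np.linalg.pinv(R.T @ R / R.shape[0])
    q2 = 0.0
    for g in np.unique(groups):
        R_g = R[groups == g]
        r_bar = R_g.mean(axis=0)
        q2 += len(R_g) * r_bar @ B_inv @ r_bar
    return float(q2)


# Toy usage: three independent phenotypes, two groups of 20 subjects each.
rng = np.random.default_rng(0)
Y = rng.standard_normal((40, 3))
groups = np.repeat([0, 1], 20)
Y_shifted = Y.copy()
Y_shifted[groups == 1] += 3.0  # location shift in the second group
print(spatial_rank_stat(Y, groups), spatial_rank_stat(Y_shifted, groups))
```

Because every spatial sign has unit length, a single extreme phenotype vector contributes at most a unit vector to each rank. This bounded influence is the source of the robustness to outliers that motivates MR-MDR, in contrast to Hotelling's T², which is built on raw means and covariances.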

Multifactor Dimensionality Reduction as a Filter-Based Approach for Genome Wide Association Studies

Frontiers in Genetics ◽

10.3389/fgene.2011.00080 ◽

2011 ◽

Vol 2 ◽

Cited By ~ 6

Author(s):

Noffisat O. Oki ◽

Alison A. Motsinger-Reif

Keyword(s):

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide

The search for gene-gene interactions in genome-wide association studies: challenges in abundance of methods, practical considerations, and biological interpretation

Annals of Translational Medicine ◽

10.21037/atm.2018.04.05 ◽

2018 ◽

Vol 6 (8) ◽

pp. 157-157 ◽

Cited By ~ 19

Author(s):

Marylyn D. Ritchie ◽

Kristel Van Steen

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Gene Interactions ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Biological Interpretation

A new efficient method to detect genetic interactions for lung cancer GWAS

BMC Medical Genomics ◽

10.1186/s12920-020-00807-9 ◽

2020 ◽

Vol 13 (1) ◽

Author(s):

Jennifer Luyapan ◽

Xuemei Ji ◽

Siting Li ◽

Xiangjun Xiao ◽

Dakai Zhu ◽

...

Keyword(s):

Lung Cancer ◽

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Disease Onset ◽

Genetic Interactions ◽

Survival Outcomes ◽

Type I ◽

Genome Wide Association Studies ◽

Data Set ◽

Genome Wide

Abstract

Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges. We addressed the computational burden of detecting SNP interactions for survival outcomes such as age of disease onset. To this end, we developed a novel algorithm, Efficient Survival Multifactor Dimensionality Reduction (ES-MDR), which uses martingale residuals as the outcome variable to model survival outcomes and applies the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease onset.

Methods: To demonstrate efficacy, we evaluated this method on two simulated data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions with less computational workload and allowed for adjustment of covariates. We applied ES-MDR to the OncoArray-TRICL Consortium data with 14,935 cases and 12,787 controls for lung cancer (108,254 SNPs), searching over all two-way interactions to identify genetic interactions associated with lung cancer age of onset. We tested the best model in an independent data set from the OncoArray-TRICL data.

Results: Our analysis of the OncoArray-TRICL data identified many one-way and two-way models, with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10⁻¹⁵) as the top marker for predicting age of lung cancer onset.

Conclusions: Our extensive simulations and analysis of a large GWAS demonstrated that our method is an efficient algorithm for identifying genetic interactions to include in models that predict survival outcomes.
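ES-MDR's two ingredients can be sketched in a few lines. The Python code below is a minimal illustration under simplifying assumptions: martingale residuals are computed from a covariate-free null model as M_i = δ_i − Ĥ(t_i), with Ĥ the Nelson–Aalen cumulative hazard (ES-MDR itself allows covariate adjustment, for example through a Cox model, which is omitted here). The QMDR-style step then labels a two-locus genotype cell "high" when its mean residual exceeds the overall mean and scores the SNP pair with a pooled two-sample t-statistic. The function names are hypothetical, and genotypes are assumed to be coded 0/1/2.

```python
import numpy as np


def martingale_residuals(times, events):
    """Null-model martingale residuals M_i = delta_i - H(t_i), where H is the
    Nelson-Aalen cumulative hazard (no covariates, so no Cox fit is needed)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    deaths = np.array([np.sum((times == s) & (events == 1)) for s in event_times], dtype=float)
    at_risk = np.array([np.sum(times >= s) for s in event_times], dtype=float)
    cum_hazard = np.concatenate([[0.0], np.cumsum(deaths / at_risk)])
    # searchsorted counts the distinct event times <= t_i, which indexes cum_hazard
    H = cum_hazard[np.searchsorted(event_times, times, side="right")]
    return events - H


def qmdr_pair_stat(g1, g2, y):
    """QMDR-style score for one SNP pair (genotypes coded 0/1/2): a cell is 'high'
    if its mean of y exceeds the overall mean; return the pooled t-statistic
    comparing high vs low subjects (0.0 if either side has fewer than 2 subjects)."""
    g1, g2, y = np.asarray(g1), np.asarray(g2), np.asarray(y, dtype=float)
    overall = y.mean()
    high = np.zeros(len(y), dtype=bool)
    for a in range(3):
        for b in range(3):
            cell = (g1 == a) & (g2 == b)
            if cell.any() and y[cell].mean() > overall:
                high[cell] = True
    y_hi, y_lo = y[high], y[~high]
    n_hi, n_lo = len(y_hi), len(y_lo)
    if n_hi < 2 or n_lo < 2:
        return 0.0
    pooled_var = ((n_hi - 1) * y_hi.var(ddof=1) + (n_lo - 1) * y_lo.var(ddof=1)) / (n_hi + n_lo - 2)
    return float((y_hi.mean() - y_lo.mean()) / np.sqrt(pooled_var * (1 / n_hi + 1 / n_lo)))


# Toy usage: residuals for six subjects given (time, event indicator).
M = martingale_residuals([5, 3, 8, 2, 7, 4], [1, 0, 1, 1, 0, 1])
print(np.round(M, 3))
```

Censored subjects (δ_i = 0) always receive non-positive residuals, while early events receive residuals close to 1, so larger residuals correspond to earlier-than-expected onset. This is what lets a quantitative-trait method such as QMDR be reused for survival outcomes.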

Rapid testing of gene-gene interactions in genome-wide association studies of binary and quantitative phenotypes

Genetic Epidemiology ◽

10.1002/gepi.20629 ◽

2011 ◽

Vol 35 (8) ◽

pp. 800-808 ◽

Cited By ~ 7

Author(s):

Kanishka Bhattacharya ◽

Mark I. McCarthy ◽

Andrew P. Morris

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Gene Interactions ◽

Genome Wide Association Studies ◽

Rapid Testing ◽

Genome Wide

A fast algorithm for detecting gene–gene interactions in genome-wide association studies

The Annals of Applied Statistics ◽

10.1214/14-aoas771 ◽

2014 ◽

Vol 8 (4) ◽

pp. 2292-2318 ◽

Cited By ~ 16

Author(s):

Jiahan Li ◽

Wei Zhong ◽

Runze Li ◽

Rongling Wu

Keyword(s):

Fast Algorithm ◽

Association Studies ◽

Genome Wide Association ◽

Gene Interactions ◽

Genome Wide Association Studies ◽

Genome Wide

A Comparative Study on Multifactor Dimensionality Reduction Methods for Detecting Gene-Gene Interactions with the Survival Phenotype

BioMed Research International ◽

10.1155/2015/671859 ◽

2015 ◽

Vol 2015 ◽

pp. 1-7 ◽

Cited By ~ 5

Author(s):

Seungyeoun Lee ◽

Yongkang Kim ◽

Min-Seok Kwon ◽

Taesung Park

Keyword(s):

Dimensionality Reduction ◽

Genetic Variants ◽

Multifactor Dimensionality Reduction ◽

Association Studies ◽

Gene Interactions ◽

Genome Wide Association Studies ◽

Missing Heritability ◽

Analytical Strategy ◽

Reduction Methods ◽

Missing Heritability Problem

Genome-wide association studies (GWAS) have extensively analyzed single-SNP effects on a wide variety of common and complex diseases and have found many genetic variants associated with disease. However, a large portion of heritability remains unexplained. This missing heritability problem might be due to analytical strategies that limit analyses to single SNPs. One possible approach to the missing heritability problem is to identify multi-SNP effects or gene-gene interactions. The multifactor dimensionality reduction (MDR) method has been widely used to detect gene-gene interactions; based on constructive induction, it classifies high-dimensional genotype combinations into a one-dimensional variable with two attributes, high risk and low risk, in case-control studies. Many modifications of MDR have been proposed, including extensions to the survival phenotype. In this study, we propose several extensions of MDR for the survival phenotype and compare them with earlier MDR methods through comprehensive simulation studies.
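The constructive-induction step of case-control MDR can be illustrated with a short sketch. The Python code below, which uses hypothetical function names, labels each of the nine two-locus genotype cells (genotypes coded 0/1/2) as high risk when its case:control ratio exceeds a threshold, defaulting to the overall case:control ratio (one common choice, assumed here), and reports the balanced accuracy of the resulting one-dimensional high/low variable. Cross-validation and the survival extensions studied in the paper are not shown.

```python
import numpy as np


def mdr_pair(g1, g2, case, threshold=None):
    """Case-control MDR for one SNP pair (genotypes coded 0/1/2).

    Each of the 9 two-locus cells is labelled high risk when its case:control
    ratio exceeds `threshold` (default: the overall case:control ratio).
    Returns the per-subject high-risk indicator and its balanced accuracy."""
    g1, g2 = np.asarray(g1), np.asarray(g2)
    case = np.asarray(case, dtype=bool)
    if threshold is None:
        threshold = case.sum() / (~case).sum()
    high = np.zeros(len(case), dtype=bool)
    for a in range(3):
        for b in range(3):
            cell = (g1 == a) & (g2 == b)
            n_case = case[cell].sum()
            n_ctrl = cell.sum() - n_case
            if n_case == 0:
                continue  # empty or control-only cells stay low risk
            if n_ctrl == 0 or n_case / n_ctrl > threshold:
                high[cell] = True
    sensitivity = high[case].mean()
    specificity = (~high[~case]).mean()
    return high, float((sensitivity + specificity) / 2)


# Toy usage: 90 subjects, 10 per genotype cell, with an XOR-like risk pattern.
g1 = np.tile(np.repeat(np.arange(3), 3), 10)
g2 = np.tile(np.arange(3), 30)
case = (g1 + g2) % 2 == 1
high, bal_acc = mdr_pair(g1, g2, case)
print(bal_acc)  # 1.0 for this noiseless pattern
```

Balanced accuracy is used rather than plain accuracy so that an unequal number of cases and controls does not reward labelling every cell as the majority class.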

A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies

BMC Bioinformatics ◽

10.1186/1471-2105-12-16 ◽

2011 ◽

Vol 12 (1) ◽

Cited By ~ 26

Author(s):

Raphaël Mourad ◽

Christine Sinoquet ◽

Philippe Leray

Keyword(s):

Linkage Disequilibrium ◽

Dimensionality Reduction ◽

Bayesian Network ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Hierarchical Bayesian ◽

Network Approach ◽

Genome Wide ◽

Data Dimensionality Reduction

Testing Gene-Gene Interactions in Genome Wide Association Studies

Genetic Epidemiology ◽

10.1002/gepi.21786 ◽

2014 ◽

Vol 38 (2) ◽

pp. 123-134 ◽

Cited By ~ 15

Author(s):

Jie Kate Hu ◽

Xianlong Wang ◽

Pei Wang

Keyword(s):

Association Studies ◽

Genome Wide Association ◽

Gene Interactions ◽

Genome Wide Association Studies ◽

Genome Wide

Combining least absolute shrinkage and selection operator (LASSO) and principal-components analysis for detection of gene-gene interactions in genome-wide association studies

BMC Proceedings ◽

10.1186/1753-6561-3-s7-s62 ◽

2009 ◽

Vol 3 (S7) ◽

Cited By ~ 22

Author(s):

Gina M D'Angelo ◽

DC Rao ◽

C Charles Gu

Keyword(s):

Principal Components Analysis ◽

Principal Components ◽

Association Studies ◽

Genome Wide Association ◽

Gene Interactions ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Components Analysis ◽

Selection Operator

Gene-Based Testing of Interactions Using XGBoost in Genome-Wide Association Studies

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.801113 ◽

2021 ◽

Vol 9 ◽

Author(s):

Yingjie Guo ◽

Chenxi Wu ◽

Zhian Yuan ◽

Yansu Wang ◽

Zhen Liang ◽

...

Keyword(s):

Association Studies ◽

Real Data ◽

Gene Interaction ◽

Genome Wide Association ◽

Superior Performance ◽

Gene Interactions ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

The Difference

Among the myriad statistical methods that identify gene–gene interactions in qualitative genome-wide association studies, gene-based methods are not only statistically powerful but also biologically interpretable. However, their detection power is limited by the assumptions they make about the association between traits and single nucleotide polymorphisms. In this article, we therefore propose a gene-based method derived from XGBoost (GGInt-XGBoost). Assuming that the log odds ratio of the disease trait is additive when a pair of genes has no interaction, the difference in error between XGBoost models fitted with and without an additive constraint can indicate a gene–gene interaction. We then use a permutation-based test to assess this difference and obtain a p-value for the significance of the interaction. Experimental results on both simulated and real data showed that our approach outperformed previous methods in detecting gene–gene interactions.
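The abstract gives the shape of the significance test but not what is permuted or how the XGBoost models are fitted. The Python sketch below therefore shows only the generic mechanics of a one-sided permutation p-value, p = (1 + #{T_b ≥ T_obs}) / (B + 1). The XGBoost error difference is replaced by a toy statistic (a difference in group means), and `permutation_pvalue`, `mean_gap`, and `shuffle_labels` are hypothetical names. A real interaction test would need a permutation scheme that preserves both genes' main effects under the null, which simply shuffling the trait does not.

```python
import numpy as np


def permutation_pvalue(data, statistic, permute, n_perm=999, seed=0):
    """One-sided permutation p-value with the usual +1 correction:
    p = (1 + #{b : T_b >= T_obs}) / (n_perm + 1)."""
    rng = np.random.default_rng(seed)
    t_obs = statistic(data)
    t_null = np.array([statistic(permute(data, rng)) for _ in range(n_perm)])
    return float((1 + np.sum(t_null >= t_obs)) / (n_perm + 1))


# Hypothetical toy stand-in for the XGBoost error gap: difference in group means.
def mean_gap(data):
    values, labels = data
    return values[labels == 1].mean() - values[labels == 0].mean()


def shuffle_labels(data, rng):
    values, labels = data
    return values, rng.permutation(labels)


rng = np.random.default_rng(2)
labels = np.repeat([0, 1], 20)
values = rng.standard_normal(40) + 2.0 * labels  # group 1 shifted upward
print(permutation_pvalue((values, labels), mean_gap, shuffle_labels))
```

The +1 in the numerator and denominator counts the observed statistic as one draw from the null distribution, so the p-value can never be exactly zero and its smallest attainable value is 1/(n_perm + 1).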
