Performance of Model-Based Multifactor Dimensionality Reduction Methods for Epistasis Detection by Controlling Population Structure

Abstract Background: In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular one being based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods: To identify causal variants in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results: Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG consistently control the type I error rate at the nominal level better than MBMDR-GC. Moreover, our three proposed methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion: We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.

BioData Mining ◽

10.1186/s13040-021-00247-w ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Fentaw Abegaz ◽

François Van Lishout ◽

Jestinah M. Mahachie John ◽

Kridsadakorn Chiachoompu ◽

Archana Bhardwaj ◽

...

Keyword(s):

Population Structure ◽

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Structured Populations ◽

Population Substructure ◽

Type I ◽

Genome Wide Association Studies ◽

Genetic Population ◽

Model Based ◽

Causal Variants

Performance of Model Based Multifactor Dimensionality Reduction Methods for Epistasis Detection by Controlling Population Structure

10.21203/rs.3.rs-38377/v1 ◽

2020 ◽

Author(s):

Fentaw Abegaz ◽

Francois van Lishout ◽

Jestinah M. Mahachie John ◽

Kridsadakorn Chiachoompu ◽

Archana Bhardwaj ◽

...

Keyword(s):

Population Structure ◽

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Structured Populations ◽

Population Substructure ◽

Type I ◽

Genome Wide Association Studies ◽

Genetic Population ◽

Model Based ◽

Causal Variants

Abstract Background In genome-wide association studies the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular one being based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods In order to identify causal variants in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG and MBMDR-GC. Results Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG consistently control the type I error rate at the nominal level better than MBMDR-GC. Moreover, our three proposed methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.

Epistasis Detection using Model Based Multifactor Dimensionality Reduction in Structured Populations

10.1101/541946 ◽

2019 ◽

Author(s):

Fentaw Abegaz ◽

François Van Lishout ◽

Jestinah M Mahachie John ◽

Kridsadakorn Chiachoompu ◽

Archana Bhardwaj ◽

...

Keyword(s):

Population Structure ◽

Multifactor Dimensionality Reduction ◽

Genetic Similarity ◽

Association Studies ◽

Gene Interaction ◽

Structured Populations ◽

Simulation Studies ◽

Genetic Population ◽

Extensive Simulation ◽

Causal Variants

Abstract In genome-wide association studies, the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding, the most popular one being based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of non-linear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. In order to identify causal variants in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction (MB-MDR) approach for structured populations. We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and non-linear sample genetic similarity.

Author Summary One of the biggest challenges in human genetics is to understand the genetic basis of complex diseases such as cancer, diabetes, heart disease, depression, asthma, inflammatory bowel disease and hypertension, for instance by identifying genes, gene-gene interactions and gene-environment interactions in association studies. Over the years, a more prominent role has been given to gene-gene interaction (epistasis) detection, in view of precision medicine and the hunt for novel drug targets and biomarkers. However, consortium-based epistasis studies, which are increasing in number and are marked by heterogeneous sample collections due to population structure or shared genetic ancestry, are likely to be prone to spurious associations and to low power for detecting associated or causal genes. In this work we introduce various strategies for epistasis studies that correct for confounding due to population structure. Based on extensive simulation studies, we demonstrate the effect of genetic population structure on epistasis detection and investigate remedial measures for confounding based on linear and nonlinear sample genetic similarity.

A new efficient method to detect genetic interactions for lung cancer GWAS

BMC Medical Genomics ◽

10.1186/s12920-020-00807-9 ◽

2020 ◽

Vol 13 (1) ◽

Author(s):

Jennifer Luyapan ◽

Xuemei Ji ◽

Siting Li ◽

Xiangjun Xiao ◽

Dakai Zhu ◽

...

Keyword(s):

Lung Cancer ◽

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Disease Onset ◽

Genetic Interactions ◽

Survival Outcomes ◽

Type I ◽

Genome Wide Association Studies ◽

Data Set ◽

Genome Wide

Abstract Background Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease-onset. Methods To demonstrate efficacy, we evaluated this method on two simulation data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions using less computational workload and allowed for adjustment of covariates. We applied ES-MDR on the OncoArray-TRICL Consortium data with 14,935 cases and 12,787 controls for lung cancer (SNPs = 108,254) to search over all two-way interactions to identify genetic interactions associated with lung cancer age-of-onset. We tested the best model in an independent data set from the OncoArray-TRICL data. Results Our experiment on the OncoArray-TRICL data identified many one-way and two-way models with a single-base deletion in the noncoding region of BRCA1 (HR = 1.24, P = 3.15 × 10⁻¹⁵), as the top marker to predict age of lung cancer onset. Conclusions From the results of our extensive simulations and analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm that identified genetic interactions to include in our models to predict survival outcomes.
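The abstract above names martingale residuals as the outcome that ES-MDR passes to a quantitative MDR step, but does not spell out how they are obtained. The following is a minimal sketch of one common form, assuming a null model with no covariates and the Nelson-Aalen estimate of the cumulative hazard; the published method may instead derive the residuals from a fitted Cox model with covariates. The function name `martingale_residuals` and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def martingale_residuals(time, event):
    """Null-model martingale residuals (assumed form, no covariates):
    event indicator minus the Nelson-Aalen cumulative hazard at each
    subject's own follow-up time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    # Distinct times at which at least one event occurred, ascending.
    event_times = np.unique(time[event == 1])
    # Number of events and number still at risk at each event time.
    d = np.array([np.sum((time == t) & (event == 1)) for t in event_times])
    n = np.array([np.sum(time >= t) for t in event_times])
    increments = d / n
    # Cumulative hazard for each subject: hazard increments up to their time.
    cum_hazard = np.array([increments[event_times <= t].sum() for t in time])
    return event - cum_hazard

# Hypothetical toy data: follow-up times and event flags (1 = onset, 0 = censored).
residuals = martingale_residuals([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(residuals)
```

Under these assumptions, subjects with an early event receive positive residuals and subjects censored late receive negative ones, so the residuals can be treated as a single quantitative trait per subject. With the Nelson-Aalen estimator they sum to zero across the sample.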

A New Efficient Method to Detect Genetic Interactions for Lung Cancer GWAS

10.21203/rs.2.14850/v1 ◽

2019 ◽

Author(s):

Jennifer Luyapan ◽

Xuemei Ji ◽

Xiangjun Xiao ◽

Dakai Zhu ◽

Eric J. Duell ◽

...

Keyword(s):

Lung Cancer ◽

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Disease Onset ◽

Genetic Interactions ◽

Survival Outcomes ◽

Type I ◽

Genome Wide Association Studies ◽

Data Set ◽

Genome Wide

Abstract Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We address the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which uses Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease-onset. To demonstrate efficacy, we evaluated this method on two simulation sets to estimate the type I error and power. Simulations show that ES-MDR identifies interactions using less computational workload and allows for adjustment of covariates. We applied ES-MDR on the OncoArray-TRICL Consortium data with 14,935 cases and 12,787 controls for lung cancer (SNPs = 108,254) to search over all two-way interactions to identify genetic interactions associated with lung cancer age-of-onset. We tested the best model in an independent data set from the OncoArray-TRICL data. Our experiment on the OncoArray-TRICL data identified many one-way and two-way models with a single-base deletion in the noncoding region of BRCA1 (HR = 1.24, P = 3.15 × 10⁻¹⁵), as the top marker to predict age of lung cancer onset. From the results of our extensive simulations and analysis of a large GWAS study, we demonstrate that our method is an efficient algorithm that identifies genetic interactions to include in our models to predict survival outcomes.

A New Efficient Method to Detect Genetic Interactions for Lung Cancer GWAS

10.21203/rs.2.14850/v2 ◽

2020 ◽

Author(s):

Jennifer Luyapan ◽

Xuemei Ji ◽

Siting Li ◽

Xiangjun Xiao ◽

Dakai Zhu ◽

...

Keyword(s):

Lung Cancer ◽

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Disease Onset ◽

Genetic Interactions ◽

Survival Outcomes ◽

Type I ◽

Genome Wide Association Studies ◽

Data Set ◽

Genome Wide

Abstract Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease-onset. Methods: To demonstrate efficacy, we evaluated this method on two simulation data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions using less computational workload and allowed for adjustment of covariates. We applied ES-MDR on the OncoArray-TRICL Consortium data with 14,935 cases and 12,787 controls for lung cancer (SNPs = 108,254) to search over all two-way interactions to identify genetic interactions associated with lung cancer age-of-onset. We tested the best model in an independent data set from the OncoArray-TRICL data. Results: Our experiment on the OncoArray-TRICL data identified many one-way and two-way models with a single-base deletion in the noncoding region of BRCA1 (HR = 1.24, P = 3.15 × 10⁻¹⁵), as the top marker to predict age of lung cancer onset. Conclusions: From the results of our extensive simulations and analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm that identified genetic interactions to include in our models to predict survival outcomes.

A New Efficient Method to Detect Genetic Interactions for Lung Cancer GWAS

10.21203/rs.2.14850/v3 ◽

2020 ◽

Author(s):

Jennifer Luyapan ◽

Xuemei Ji ◽

Siting Li ◽

Xiangjun Xiao ◽

Dakai Zhu ◽

...

Keyword(s):

Lung Cancer ◽

Dimensionality Reduction ◽

Multifactor Dimensionality Reduction ◽

Disease Onset ◽

Genetic Interactions ◽

Survival Outcomes ◽

Type I ◽

Genome Wide Association Studies ◽

Data Set ◽

Genome Wide

Abstract Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease-onset. Methods: To demonstrate efficacy, we evaluated this method on two simulation data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions using less computational workload and allowed for adjustment of covariates. We applied ES-MDR on the OncoArray-TRICL Consortium data with 14,935 cases and 12,787 controls for lung cancer (SNPs = 108,254) to search over all two-way interactions to identify genetic interactions associated with lung cancer age-of-onset. We tested the best model in an independent data set from the OncoArray-TRICL data. Results: Our experiment on the OncoArray-TRICL data identified many one-way and two-way models with a single-base deletion in the noncoding region of BRCA1 (HR = 1.24, P = 3.15 × 10⁻¹⁵), as the top marker to predict age of lung cancer onset. Conclusions: From the results of our extensive simulations and analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm that identified genetic interactions to include in our models to predict survival outcomes.
