Nonlinear ridge regression improves cell-type-specific differential expression analysis

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Fumihiko Takeuchi ◽  
Norihiro Kato

Abstract Background: Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on omics expression are of interest but difficult or costly to measure experimentally. By measuring omics data for the bulk tissue, the cell type composition of a sample can be inferred statistically. Cell-type-specific effects are then estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues: scaling and multicollinearity. Results: First, although cell composition is analyzed on a linear scale, differential methylation/expression is more suitably analyzed on the logit/log scale. To analyze both scales simultaneously, we applied nonlinear regression. Second, we show that the interaction terms are highly collinear, which hinders ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated data, nonlinear ridge regression attained well-balanced sensitivity, specificity and precision. The marginal model attained the lowest precision and the highest sensitivity, and was the only algorithm to detect a weak signal in real data. Conclusion: Nonlinear ridge regression performed cell-type-specific association tests on bulk omics data with well-balanced performance. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas
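The multicollinearity claim can be checked numerically. The following is a minimal NumPy sketch, not the omicwas implementation; the proportions, effect sizes and the penalty λ = 1 are hypothetical. Because the interaction columns w_ik·x_i sum over cell types to x_i, and proportions vary only modestly between samples, each column is nearly proportional to the trait, so their pairwise correlations are close to 1. Adding λI to XᵀX, as ridge regression does, lowers the condition number of the system being solved.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 500, 3
# Hypothetical cell type proportions: each row sums to 1 and varies only modestly.
w = rng.dirichlet([60.0, 25.0, 15.0], size=n)
x = rng.normal(size=n)  # hypothetical standardized trait

# Interaction terms w_ik * x_i; their row sums equal x_i exactly.
interaction = w * x[:, None]
corr = np.corrcoef(interaction, rowvar=False)
off_diag = corr[~np.eye(K, dtype=bool)]
print(f"min pairwise correlation of interaction terms: {off_diag.min():.3f}")

# Hypothetical truth: baseline alpha_k per cell type, trait effect only in cell type 2.
alpha = np.array([0.2, 0.5, 0.8])
beta = np.array([0.0, 0.3, 0.0])
X = np.hstack([w, interaction])
y = X @ np.concatenate([alpha, beta]) + rng.normal(scale=0.05, size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^(-1) X'y; lam = 0 gives OLS."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

XtX = X.T @ X
cond_ols = np.linalg.cond(XtX)
cond_ridge = np.linalg.cond(XtX + 1.0 * np.eye(X.shape[1]))
print(f"condition number: OLS {cond_ols:.3g}, ridge {cond_ridge:.3g}")
print("trait effects, OLS:  ", ridge(X, y, 0.0)[K:].round(3))
print("trait effects, ridge:", ridge(X, y, 1.0)[K:].round(3))
```

This sketch penalizes all six coefficients equally and fixes λ arbitrarily; in practice the penalty would be tuned, and the penalty structure in omicwas may differ.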

2020 ◽  
Author(s):  
Fumihiko Takeuchi ◽  
Norihiro Kato

Abstract Background: Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on omics expression are of interest but difficult or costly to measure experimentally. By measuring omics data for the bulk tissue, the cell type composition of a sample can be inferred statistically. Cell-type-specific effects are then estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues: scaling and multicollinearity. Results: First, although cell composition is analyzed on a linear scale, differential methylation/expression is more suitably analyzed on the logit/log scale. To analyze both scales simultaneously, we applied nonlinear regression. Second, we show that the interaction terms are highly collinear, which hinders ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated data, nonlinear ridge regression attained well-balanced sensitivity, specificity and precision. The marginal model attained the lowest precision and the highest sensitivity, and was the only algorithm to detect a weak signal in real data. Conclusion: Nonlinear ridge regression performed cell-type-specific association tests on bulk omics data with well-balanced performance. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas


2021 ◽  
Author(s):  
Fumihiko Takeuchi ◽  
Norihiro Kato

Abstract Background: Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on omics expression are of interest but difficult or costly to measure experimentally. By measuring omics data for the bulk tissue, the cell type composition of a sample can be inferred statistically. Cell-type-specific effects are then estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues: scaling and multicollinearity. Results: First, although cell composition is analyzed on a linear scale, differential methylation/expression is more suitably analyzed on the logit/log scale. To analyze both scales simultaneously, we applied nonlinear regression. Second, we show that the interaction terms are highly collinear, which hinders ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated data, nonlinear ridge regression attained well-balanced sensitivity, specificity and precision. The marginal model attained the lowest precision and the highest sensitivity, and was the only algorithm to detect a weak signal in real data. Conclusion: Nonlinear ridge regression performed cell-type-specific association tests on bulk omics data with well-balanced performance. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas


2020 ◽  
Author(s):  
Fumihiko Takeuchi ◽  
Norihiro Kato

Abstract Background: Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on omics expression are of interest but difficult or costly to measure experimentally. By measuring omics data for the bulk tissue, the cell type composition of a sample can be inferred statistically. Cell-type-specific effects are then estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues: scaling and multicollinearity. Results: First, although cell composition is analyzed on a linear scale, differential methylation/expression is more suitably analyzed on the logit/log scale. To analyze both scales simultaneously, we developed nonlinear regression. Second, we show that the interaction terms are highly collinear, which hinders ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated and real data, nonlinear regression gave a modest improvement and ridge regularization a substantial one. Conclusion: Nonlinear ridge regression performed cell-type-specific association tests on bulk omics data more robustly than previous methods. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas
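The two-scale issue can be made concrete with an assumed mean function: bulk methylation as a proportion-weighted sum of cell-type-specific methylation levels, each linear in the trait on the logit scale, fitted by least squares with a ridge penalty on the trait effects. The Python sketch below only writes down this assumed model and its penalized loss; the exact parameterization, penalty and optimizer used in omicwas are not reproduced and may differ. It also shows why the scale matters: the same logit-scale effect of 0.5 moves a cell type at 50% methylation far more on the linear scale than one at 2%.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def bulk_mean(w, x, mu, beta):
    """Assumed model: sum_k w_ik * expit(mu_k + beta_k * x_i).
    w: proportions (n, K); x: trait (n,); mu, beta: logit-scale baseline and effect (K,)."""
    return (w * expit(mu[None, :] + beta[None, :] * x[:, None])).sum(axis=1)

def penalized_rss(params, w, x, y, lam):
    """Residual sum of squares plus a ridge penalty lam * sum_k beta_k^2."""
    K = w.shape[1]
    mu, beta = params[:K], params[K:]
    resid = y - bulk_mean(w, x, mu, beta)
    return float(resid @ resid + lam * (beta @ beta))

# Toy example: two cell types with baselines of 50% and 2% methylation
# and the same logit-scale trait effect of 0.5 (all values hypothetical).
w = np.array([[0.7, 0.3], [0.6, 0.4], [0.5, 0.5]])
x = np.array([0.0, 1.0, 2.0])
mu = np.array([0.0, np.log(0.02 / 0.98)])
beta = np.array([0.5, 0.5])
params = np.concatenate([mu, beta])
y = bulk_mean(w, x, mu, beta)

# Linear-scale change per unit of trait in each cell type.
delta = expit(mu + beta) - expit(mu)
print(delta.round(4))
```

Minimizing `penalized_rss` over `params` with a general-purpose optimizer would give a penalized nonlinear least-squares fit; how λ is chosen is left open here.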


2020 ◽  
Author(s):  
Fumihiko Takeuchi ◽  
Norihiro Kato

Abstract Background: Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on omics expression are of interest but difficult or costly to measure experimentally. By measuring omics data for the bulk tissue, the cell type composition of a sample can be inferred statistically. Cell-type-specific effects are then estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues: scaling and multicollinearity. Results: First, although cell composition is analyzed on a linear scale, differential methylation/expression is more suitably analyzed on the logit/log scale. To analyze both scales simultaneously, we applied nonlinear regression. Second, we show that the interaction terms are highly collinear, which hinders ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated data, nonlinear ridge regression attained well-balanced sensitivity, specificity and precision. In real data, nonlinear ridge regression detected signals consistently across the examined cases. Conclusion: Nonlinear ridge regression performed cell-type-specific association tests on bulk omics data more robustly than previous methods. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas


2021 ◽  
Author(s):  
Dylan M Cable ◽  
Evan Murray ◽  
Vignesh Shanmugam ◽  
Simon Zhang ◽  
Michael Z Diao ◽  
...  

Spatial transcriptomics enables spatially resolved gene expression measurements at near single-cell resolution. There is a pressing need for computational tools to detect genes that are differentially expressed across tissue contexts for cell types of interest. However, changes in cell type composition across space, and the fact that measurement units often detect transcripts from more than one cell type, introduce complex statistical challenges. Here, we introduce a statistical method, Robust Cell Type Differential Expression (RCTDE), that estimates cell type-specific patterns of differential gene expression while accounting for the localization of other cell types. By using general log-linear models, we provide a unified framework for defining and identifying gene expression changes for a wide range of relevant contexts: changes due to pathology, anatomical regions, physical proximity to specific cell types, and cellular microenvironment. Furthermore, our approach enables statistical inference across multiple samples and replicates when such data are available. We demonstrate, through simulations and validation experiments on Slide-seq and MERFISH datasets, that our approach accurately identifies cell type-specific differential gene expression and provides valid uncertainty quantification. Lastly, we apply our method to characterize spatially localized tissue changes in the context of disease. In an Alzheimer's mouse model Slide-seq dataset, we identify plaque-dependent patterns of cellular immune activity. We also find a putative interaction between tumor cells and myeloid immune cells in a Slide-seq tumor dataset. We make our RCTDE method publicly available as part of the open-source R package https://github.com/dmcable/spacexr.
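The core idea of a cell-type-specific log-linear model applied to mixed measurement units can be illustrated as follows: each pixel's expected count of a gene is its total UMI count times a weighted sum of cell-type-specific expression rates, and each rate is log-linear in a covariate such as distance to a plaque, under Poisson sampling. This Python sketch is an assumption-laden illustration, not the spacexr code; it omits the remaining components of the published method (how weights are estimated, how platform effects are handled, how coefficients are tested), and all names and values are hypothetical.

```python
import math
import numpy as np

def expected_counts(umi_total, weights, x, a, g):
    """Expected counts of one gene in each pixel (spot/bead).
    umi_total: total UMIs per pixel, shape (n,)
    weights:   cell type proportions per pixel, shape (n, K)
    x:         covariate per pixel, shape (n,), e.g. scaled distance to a plaque
    a, g:      per-cell-type log-linear intercept and slope, shape (K,)"""
    # Cell-type-specific relative expression, log-linear in the covariate: shape (n, K).
    mu = np.exp(a[None, :] + g[None, :] * x[:, None])
    # A pixel mixes transcripts from several cell types in proportion to their weights.
    return umi_total * (weights * mu).sum(axis=1)

def poisson_loglik(counts, rate):
    """Sum of Poisson log-probabilities log P(Y = y | rate)."""
    return float(sum(float(y) * math.log(float(r)) - float(r) - math.lgamma(float(y) + 1.0)
                     for y, r in zip(counts, rate)))

# Hypothetical toy data: three pixels, two cell types in equal proportions.
umi_total = np.array([200.0, 200.0, 200.0])
weights = np.array([[0.5, 0.5]] * 3)
x = np.array([0.0, 1.0, 2.0])
a = np.log(np.array([0.01, 0.002]))
g = np.array([0.0, 0.7])      # only the second cell type responds to x
rate = expected_counts(umi_total, weights, x, a, g)
counts = np.array([3, 2, 4])  # hypothetical observed counts
ll = poisson_loglik(counts, rate)
print(rate.round(3), round(ll, 3))
```

Fitting would maximize the log-likelihood over `a` and `g`; a nonzero slope `g_k` then corresponds to a cell-type-specific change in expression with the covariate.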


Epigenomics ◽  
2020 ◽  
Author(s):  
Yen-Chen A Feng ◽  
Yichen Guo ◽  
Lucile Pain ◽  
G Mark Lathrop ◽  
Catherine Laprise ◽  
...  

Aim: To develop a method for estimating cell-specific effects in epigenomic association studies in the presence of cell type heterogeneity. Materials & methods: We used a Monte Carlo Expectation-Maximization (MCEM) algorithm with a Metropolis–Hastings sampler to reconstruct the ‘missing’ cell-specific methylations and to estimate their associations with phenotypes free of confounding by cell type proportions. Results: Simulations showed reliable performance of the method under various settings, including when the cell type is rare. Application to a real dataset recapitulated the directly measured cell-specific methylation pattern in whole blood. Conclusion: This work provides a framework for identifying important cell groups and accounting for cell type composition, which is useful for studying the role of epigenetic changes in human traits and diseases.
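In an MCEM algorithm, the E-step expectation over the missing data is replaced by an average over Monte Carlo draws, here drawn by Metropolis–Hastings. A generic random-walk Metropolis–Hastings sampler in Python illustrates that sampling step. The Gaussian target below is a hypothetical stand-in for the conditional density of a missing cell-specific methylation value; it is not the authors' model.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_iter, step, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal.
    log_target: log density up to an additive constant."""
    rng = random.Random(seed)
    x, logp = x0, log_target(x0)
    draws = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)
        logp_prop = log_target(prop)
        # Accept with probability min(1, target(prop) / target(x));
        # the symmetric proposal cancels out of the ratio.
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            x, logp = prop, logp_prop
        draws.append(x)
    return draws

# Hypothetical stand-in target: N(2, 1) on the logit-methylation scale.
draws = metropolis_hastings(lambda z: -0.5 * (z - 2.0) ** 2, x0=0.0, n_iter=20000, step=1.0)
kept = draws[2000:]            # discard burn-in
mean = sum(kept) / len(kept)   # Monte Carlo estimate of the E-step expectation
print(round(mean, 2))
```

In an MCEM loop, such averages would feed the M-step parameter update, and the sampler would be rerun with the updated parameters at each iteration.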


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Abigail L. Pfaff ◽  
Vivien J. Bubb ◽  
John P. Quinn ◽  
Sulev Koks

Abstract The development of Parkinson’s disease (PD) involves a complex interaction of genetic and environmental factors. Genome-wide association studies using extensive single nucleotide polymorphism datasets have identified many loci involved in disease. However, much of the heritability of Parkinson’s disease is still to be identified, and the functional elements associated with the risk remain to be determined and understood. To investigate the component of PD that may involve complex genetic variants, we characterised the hominid-specific retrotransposon SINE-VNTR-Alus (SVAs) in the Parkinson’s Progression Markers Initiative cohort using whole genome sequencing. We identified 81 reference SVAs polymorphic for their presence/absence, seven of which were associated with the progression of the disease and with differential gene expression in whole blood RNA sequencing data. This study highlights the importance of addressing SVA variants, and potentially other types of retrotransposons, in PD genetics; furthermore, these SVA elements should be considered regulatory domains that could play a role in disease progression.


2019 ◽  
Author(s):  
Kelly M. Bakulski ◽  
John F. Dou ◽  
Robert C. Thompson ◽  
Christopher Lee ◽  
Lauren Y. Middleton ◽  
...  

Abstract Background: Lead (Pb) exposure is ubiquitous and has permanent developmental effects on childhood intelligence and behavior and on adulthood risk of dementia. The hippocampus is a key brain region involved in learning and memory, and its cellular composition is highly heterogeneous. Pb acts on the hippocampus by altering gene expression, but the cell type-specific responses are unknown. Objective: Examine the effects of perinatal Pb treatment on adult hippocampus gene expression, at the level of individual cells, in mice. Methods: In mice perinatally exposed to control water (n=4) or a human physiologically relevant level of Pb (32 ppm in maternal drinking water; n=4), from two weeks prior to mating through weaning, we tested for gene expression and cellular differences in the hippocampus at 5 months of age. Analysis was performed using single-cell RNA sequencing of 5,258 hippocampal cells with the 10x Genomics Chromium to 1) test for gene expression differences averaged across all cells by treatment; 2) compare cell cluster composition by treatment; and 3) test for gene expression and pathway differences within cell clusters by treatment. Results: Gene expression patterns revealed 12 cell clusters in the hippocampus, mapping to major expected cell types (e.g. microglia, astrocytes, neurons, oligodendrocytes). Perinatal Pb treatment was associated with 12.4% more oligodendrocytes (P=4.4×10⁻²¹) in adult mice. Across all cells, differential gene expression analysis by Pb treatment revealed cluster marker genes. Within cell clusters, differential gene expression with Pb treatment (q<0.05) was observed in endothelial, microglial, pericyte, and astrocyte cells. Pathways up-regulated with Pb treatment were protein folding in microglia (P=3.4×10⁻⁹) and stress response in oligodendrocytes (P=3.2×10⁻⁵). Conclusion: Bulk tissue analysis may be confounded by changes in cell type composition and may obscure effects within vulnerable cell types. This study serves as a biological reference for future single-cell studies of toxicant or neuronal complications, to ultimately characterize the molecular basis by which Pb influences cognition and behavior.
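Comparing cluster composition between treatment groups can be sketched as a two-sided, pooled two-proportion z-test on cell counts. This is an assumed choice for illustration: the abstract does not state which test produced the reported P value, and the counts below are hypothetical. Treating cells as independent also ignores that cells are nested within mice (n=4 per group), which a mixed model or a per-mouse analysis would address.

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for equal proportions with a pooled variance estimate.
    k1 of n1 cells in group 1 (e.g. Pb) and k2 of n2 cells in group 2 (control)
    fall in the cluster of interest. Returns (p1 - p2, z, two-sided p-value)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value 2 * (1 - Phi(|z|)), written via the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, z, p_value

# Hypothetical counts: 600 of 2000 cells vs 400 of 2000 cells in one cluster.
diff, z, p = two_proportion_z_test(600, 2000, 400, 2000)
print(f"difference {diff:.3f}, z {z:.2f}, p {p:.2e}")
```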


2020 ◽  
Author(s):  
Carlos Ruiz-Arenas ◽  
Carles Hernandez-Ferrer ◽  
Marta Vives-Usano ◽  
Sergi Marí ◽  
Inés Quintela ◽  
...  

Abstract Background: The identification of expression quantitative trait methylation (eQTMs), defined as correlations between gene expression and DNA methylation levels, might help the biological interpretation of epigenome-wide association studies (EWAS). We aimed to identify autosomal cis-eQTMs in child blood, using data from 832 children of the Human Early Life Exposome (HELIX) project. Methods: Blood DNA methylation and gene expression were measured with the Illumina 450K and the Affymetrix HTA v2 arrays, respectively. The relationship between methylation levels and expression of nearby genes (transcription start site (TSS) within a window of 1 Mb) was assessed by fitting 13.6 million linear regressions adjusting for sex, age, and cohort. Results: We identified 63,831 autosomal cis-eQTMs, representing 35,228 unique CpGs and 11,071 unique transcript clusters (TCs, genes). Of these cis-eQTMs, 74.3% were located at <250 kb, 60.0% showed an inverse relationship, and 23.9% had at least one genetic variant associated with both the methylation and expression levels. They were enriched for active blood regulatory regions. Adjusting for cellular composition decreased the number of cis-eQTMs to 37.7%, suggesting that some of them were cell type-specific. The overlap of child blood cis-eQTMs with those described in adults was small; cis-eQTMs shared by children and adults tended to be proximal to the TSS, were enriched for genetic variants, and had lower cell type specificity. Only half of the cis-eQTMs could be captured through annotation to the closest gene. Conclusions: This catalogue of blood autosomal cis-eQTMs in children can help the biological interpretation of EWAS findings, and is publicly available at https://helixomics.isglobal.org/.
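The two steps of the cis-eQTM scan (pair each CpG with genes whose TSS falls in the cis window, then regress expression on methylation with covariates) can be sketched in Python with NumPy. This is a minimal illustration, not the HELIX pipeline: reading "a window of 1 Mb" as ±500 kb around the CpG is an assumption, the cohort indicators are omitted, and all identifiers, coordinates and data are hypothetical.

```python
import numpy as np

# Assumption: "a window of 1 Mb" is read as +/-500 kb around the CpG.
HALF_WINDOW = 500_000

def cis_pairs(cpgs, genes, half_window=HALF_WINDOW):
    """Return (cpg_id, gene_id) pairs on the same chromosome whose CpG position
    lies within half_window bp of the gene's TSS.
    cpgs: iterable of (cpg_id, chrom, pos); genes: iterable of (gene_id, chrom, tss)."""
    return [(c_id, g_id)
            for c_id, c_chr, c_pos in cpgs
            for g_id, g_chr, tss in genes
            if c_chr == g_chr and abs(c_pos - tss) <= half_window]

def eqtm_effect(expr, meth, covariates):
    """OLS of expression on methylation adjusted for covariates
    (e.g. sex, age, cohort indicators); returns the methylation coefficient."""
    n = len(expr)
    X = np.column_stack([np.ones(n), meth, covariates])
    coef, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return coef[1]

# Hypothetical coordinates.
cpgs = [("cg1", "chr1", 1_000_000), ("cg2", "chr2", 5_000_000)]
genes = [("geneA", "chr1", 1_400_000), ("geneB", "chr1", 2_000_000),
         ("geneC", "chr2", 5_100_000)]
pairs = cis_pairs(cpgs, genes)

# Hypothetical noiseless data for one CpG-gene pair.
rng = np.random.default_rng(1)
n = 50
meth = rng.uniform(size=n)
sex = rng.integers(0, 2, size=n).astype(float)
age = rng.normal(8.0, 1.0, size=n)
expr = 1.0 - 2.0 * meth + 0.1 * age + 0.3 * sex
slope = eqtm_effect(expr, meth, np.column_stack([sex, age]))
print(pairs, round(float(slope), 3))
```

At the scale of millions of pairs, the nested loop would be replaced by a sorted or interval-based search per chromosome, and the regressions would typically be vectorized.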

