Drug perturbation gene set enrichment analysis (dpGSEA): a new transcriptomic drug screening approach

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mike Fang ◽  
Brian Richardson ◽  
Cheryl M. Cameron ◽  
Jean-Eudes Dazard ◽  
Mark J. Cameron

Abstract Background: In this study, we demonstrate that our modified Gene Set Enrichment Analysis (GSEA) method, drug perturbation GSEA (dpGSEA), can detect phenotypically relevant drug targets through a unique transcriptomic enrichment that emphasizes biological directionality of drug-derived gene sets. Results: We detail our dpGSEA method and show its effectiveness in detecting specific perturbation of drugs in independent public datasets by confirming fluvastatin, paclitaxel, and rosiglitazone perturbation in gastroenteropancreatic neuroendocrine tumor cells. In drug discovery experiments, we found that dpGSEA was able to detect phenotypically relevant drug targets in previously published differentially expressed genes of CD4+ T regulatory cells from immune responders and non-responders to antiviral therapy in HIV-infected individuals, such as those involved with virion replication, cell cycle dysfunction, and mitochondrial dysfunction. dpGSEA is publicly available at https://github.com/sxf296/drug_targeting. Conclusions: dpGSEA is an approach that uniquely enriches on drug-defined gene sets while considering directionality of gene modulation. We recommend dpGSEA as an exploratory tool to screen for possible drug targeting molecules.

2019 ◽  
Vol 8 (10) ◽  
pp. 1580 ◽  
Author(s):  
Kyoung Min Moon ◽  
Kyueng-Whan Min ◽  
Mi-Hye Kim ◽  
Dong-Hoon Kim ◽  
Byoung Kwan Son ◽  
...  

Ninety percent of patients with scrub typhus (SC) with vasculitis-like syndrome recover after mild symptoms; however, 10% can suffer serious complications, such as acute respiratory failure (ARF) and admission to the intensive care unit (ICU). Predictors for the progression of SC have not yet been established, and conventional scoring systems for ICU patients are insufficient to predict severity. We aimed to identify simple and robust indicators to predict aggressive behaviors of SC. We evaluated 91 patients with SC and 81 non-SC patients who were admitted to the ICU, and 32 cases from the public functional genomics data repository for gene expression analysis. We analyzed the relationships between several predictors and clinicopathological characteristics in patients with SC. We performed gene set enrichment analysis (GSEA) to identify SC-specific gene sets. The acid-base imbalance (ABI), measured 24 h before serious complications, was higher in patients with SC than in non-SC patients. A high ABI was associated with an increased incidence of ARF, leading to mechanical ventilation and worse survival. GSEA revealed that SC correlated with gene sets reflecting inflammation/apoptotic response and airway inflammation. ABI can be used to indicate ARF in patients with SC and assist with early detection.


Author(s):  
Konstantina Charmpi ◽  
Bernard Ycart

Abstract Gene Set Enrichment Analysis (GSEA) is a basic tool for genomic data treatment. Its test statistic is based on a cumulated weight function, and its distribution under the null hypothesis is evaluated by Monte-Carlo simulation. Here, it is proposed to subtract from the cumulated weight function its asymptotic expectation, then scale it. Under the null hypothesis, the convergence in distribution of the new test statistic is proved, using the theory of empirical processes. The limiting distribution needs to be computed only once, and can then be used for many different gene sets. This results in large savings in computing time. The test defined in this way has been called the Weighted Kolmogorov Smirnov (WKS) test. Using expression data from the GEO repository, tested against the MSig Database C2, a comparison between the classical GSEA test and the new procedure has been conducted. Our conclusion is that, beyond its mathematical and algorithmic advantages, the WKS test could be more informative in many cases than the classical GSEA test.
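The classical procedure that this abstract starts from can be sketched in a few lines of Python: a statistic built from a cumulated weight function over a ranked gene list, with its null distribution estimated by Monte-Carlo simulation. This is a minimal illustration, not the authors' code. The function names, the random relabelling of set membership used as the simulation scheme, and the add-one p-value correction are all assumptions; the centering and scaling that define the WKS statistic are not reproduced here.

```python
import numpy as np


def enrichment_score(weights, in_set):
    """Signed maximum deviation of the weighted running sum.

    weights: non-negative gene weights, already sorted by rank (best first).
    in_set:  boolean mask, True where the ranked gene belongs to the gene set.
    Assumes the set is non-empty, has positive total weight, and does not
    cover the whole list.
    """
    weights = np.asarray(weights, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    hit = np.where(in_set, weights, 0.0)
    miss = (~in_set).astype(float)
    # Cumulated weight of set members minus cumulated fraction of non-members.
    running = np.cumsum(hit) / hit.sum() - np.cumsum(miss) / miss.sum()
    return running[np.argmax(np.abs(running))]


def monte_carlo_pvalue(weights, in_set, n_sim=1000, seed=0):
    """Two-sided Monte-Carlo p-value obtained by shuffling set membership."""
    rng = np.random.default_rng(seed)
    in_set = np.asarray(in_set, dtype=bool)
    observed = enrichment_score(weights, in_set)
    null = np.array([
        enrichment_score(weights, rng.permutation(in_set))
        for _ in range(n_sim)
    ])
    # Add-one correction keeps the estimate strictly positive (an assumption).
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_sim + 1)
    return observed, p_value
```

The simulation loop has to be repeated for every gene set, which is the cost the abstract says a single precomputed limiting distribution would avoid.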


2019 ◽  
Author(s):  
Rani K. Powers ◽  
Anthony Sun ◽  
James C. Costello

Abstract Summary: GSEA-InContext Explorer is a Shiny app that allows users to perform two methods of gene set enrichment analysis (GSEA). The first, GSEAPreranked, applies the GSEA algorithm in which statistical significance is estimated from a null distribution of enrichment scores generated for randomly permuted gene sets. The second, GSEA-InContext, incorporates a user-defined set of background experiments to define the null distribution and calculate statistical significance. GSEA-InContext Explorer allows the user to build custom background sets from a compendium of over 5,700 curated experiments, run both GSEAPreranked and GSEA-InContext on their own uploaded experiment, and explore the results using an interactive interface. This tool will allow researchers to visualize gene sets that are commonly enriched across experiments and identify gene sets that are uniquely significant in their experiment, thus complementing current methods for interpreting gene set enrichment results. Availability and implementation: The code for GSEA-InContext Explorer is available at: https://github.com/CostelloLab/GSEA-InContext_Explorer and the interactive tool is at: http://gsea-incontext_explorer.ngrok.io


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 2833-2833
Author(s):  
Xiao J. Yan ◽  
Daniel Kalenscher ◽  
Erin Boyle ◽  
Sophia Yancopoulos ◽  
Rajendra N Damle ◽  
...  

Abstract 2833 Introduction: In chronic lymphocytic leukemia (CLL), clonally expanded CD5+ B lymphocytes eventually overwhelm healthy immune cells, hindering normal immune function. To determine mechanisms fueling this expansion, gene expression data were gathered by microarray analysis of cells from CLL patients. Samples were grouped based on Ki-67 expression, an indicator of proliferation. To determine mechanisms correlating with B-cell proliferation and impacting CLL B-cell biology, microarray profiles were compared using Gene Set Enrichment Analysis (GSEA) [Subramanian A, et al. PNAS 2005]. Methods: Samples were analyzed for intracellular expression of Ki-67 by flow cytometry and divided into 2 groups based on Ki-67 expression (cutoff at 5%). RNA was then purified from CD5+CD19+ CLL cells and gene expression microarray assays were performed using Illumina HumanHT12 beadchips. GSEA was carried out using a library of signatures by Dr. Louis Staudt [Shaffer AL, et al. Immunol Rev 2006] containing 305 gene sets encompassing 13,564 genes biased towards hematopoietic signatures. Results: Of 61 cases, 14 were Ki-67high and 47 were Ki-67low. When time-to-first-treatment (TTFT) was compared between the groups, Ki-67high patients had significantly shorter TTFT (2.76 yrs) compared to Ki-67low patients (23.46 yrs; P<0.0001). By GSEA, we determined 255/285 gene sets were upregulated in the Ki-67high group with 50 gene sets significantly enriched at a false discovery rate (FDR) <25%. For the Ki-67low group, 30/285 gene sets were upregulated with only one significant at FDR <25%. IGHV unmutated CLL (U-CLL) was enriched in only one gene set, termed CLLUNMUT-1, while mutated CLL (M-CLL) was only enriched in CLLMUT-1. CD38high and CD38low subsets were similarly enriched in these two gene sets, with 4 additional gene sets in the CD38high group, including MYD88UP-4 and IFN-2. 
Of the 50 significantly enriched gene sets in the Ki-67high group, 17 relate to signaling pathways, 16 to cellular differentiation, 6 to cellular processes, 4 to transcription factor targets, and the remaining 7 relate to cancer. Of these, the percentage of the signaling component is up 13% from its representation in the original Staudt library. The top 5 gene sets enriched in the Ki-67high group are: upregulated U-CLL compared to M-CLL (CLLUNMUT-1), myeloid tissue compared to other tissues (MYELOID-1), T cell cytokine induced proliferation (TCYTUP-8), BCR crosslinking CLL B cells (CLLBCRUP-1) and BDCA4+ dendritic cells compared to other hematopoietic cells (DC-1). The total number of genes enriched in these 50 sets is 769, with 217 genes shared in two or more gene sets. Twenty genes were enriched in the CLL BCR signature, CLLBCRUP-1 [Herishanu Y, et al. Blood 2011]. Of these, WARS, IRF4, MX1, OAS1, and NAMPT are also enriched in the T cell cytokine induced and T cell activation signatures. Only one gene set was enriched in the Ki-67low group, CLLMUT-1, upregulated in M-CLL compared to U-CLL. CD274 (PD-L1) was consistently elevated in the Ki-67low group in all the patients, irrespective of IGHV mutation status. Discussion: The observed GSEA profiles in Ki-67high patients correlated with gene signatures biased towards BCR signaling, signal transduction, and hematopoietic cancer, consistent with the Ki-67high group containing more (recently) proliferating cells influenced at least in part by BCR signaling. The profiles also suggest that additional cells (T lymphocytes and dendritic cells) may be involved. It is notable these gene sets were not observed for CLL patients subgrouped by IGHV mutation status or by CD38, and that these other subsets did not show as pronounced a distinction by GSEA profiling. Disclosures: No relevant conflicts of interest to declare.


2019 ◽  
Author(s):  
Ludwig Geistlinger ◽  
Gergely Csaba ◽  
Mara Santarelli ◽  
Marcel Ramos ◽  
Lucas Schiffer ◽  
...  

Abstract Background: Although gene set enrichment analysis has become an integral part of high-throughput gene expression data analysis, the assessment of enrichment methods remains rudimentary and ad hoc. In the absence of suitable gold standards, evaluations are commonly restricted to selected data sets and biological reasoning on the relevance of resulting enriched gene sets. However, this is typically incomplete and biased towards the goals of individual investigations. Results: We present a general framework for standardized and structured benchmarking of enrichment methods based on defined criteria for applicability, gene set prioritization, and detection of relevant processes. This framework incorporates a curated compendium of 75 expression data sets investigating 42 different human diseases. The compendium features microarray and RNA-seq measurements, and each dataset is associated with a precompiled GO/KEGG relevance ranking for the corresponding disease under investigation. We perform a comprehensive assessment of 10 major enrichment methods on the benchmark compendium, identifying significant differences in (i) runtime and applicability to RNA-seq data, (ii) fraction of enriched gene sets depending on the type of null hypothesis tested, and (iii) recovery of the a priori defined relevance rankings. Based on these findings, we make practical recommendations on (i) how methods originally developed for microarray data can efficiently be applied to RNA-seq data, (ii) how to interpret results depending on the type of gene set test conducted, and (iii) which methods are best suited to effectively prioritize gene sets with high relevance for the phenotype investigated. Conclusion: We carried out a systematic assessment of existing enrichment methods, and identified best performing methods, but also general shortcomings in how gene set analysis is currently conducted. 
We provide a directly executable benchmark system for straightforward assessment of additional enrichment methods. Availability: http://bioconductor.org/packages/GSEABenchmarkeR


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 3448-3448
Author(s):  
Harumi Kato ◽  
Kazuhito Yamamoto ◽  
Kennosuke Karube ◽  
Miyuki Katayama ◽  
Shinobu Tsuzuki ◽  
...  

Abstract 3448 Age-related EBV-associated B-cell lymphoproliferative disorder (AR-EBLPD) is classified as a subtype of diffuse large B-cell lymphoma (DLBCL) according to the WHO classification. However, the molecular genetic characteristics of AR-EBLPD remain largely unknown. We studied expression profiles of 5 AR-EBLPD and 8 EB-negative DLBCL samples using the Agilent 44K human oligonucleotide microarray. Total RNA was extracted from fresh-frozen tumor samples. Each microarray slide was converted into datasets using the Agilent Microarray Scanner and Feature Extraction software. Data were standardized with Z-scores. Differences in mRNA expression levels between two sample groups were calculated using a two-sided t-test. A total of 1973 probes showed a p-value less than 0.05 with less than a 25% false discovery rate (FDR). These probes included 1688 genes. The number of probes showing high expression in AR-EBLPD and EB-negative DLBCL was 804 (693 genes) and 1169 (995 genes), respectively. First, we selected the top 300 differentially expressed genes. Genes highly expressed in AR-EBLPD included IL6, TNFAIP3, HOPX, and SLAMF1. IL6 encodes a cytokine that functions in inflammation and the maturation of B lymphocytes, and TNFAIP3 is known as a negative regulatory gene of the NF-kB pathway. HOPX and SLAMF1 are reported as genes related to lymphocyte function or the immune system (Schwartzberg et al. Nature immunology 2009, Hawiger et al. Nature immunology 2011). For better characterization, we next performed Gene Ontology Analysis using the WEB-based GEne SeT AnaLysis Toolkit and found that categories of external stimulus and inflammatory responses were enriched in AR-EBLPD. The Kyoto Encyclopedia of Genes and Genomes (KEGG)-signaling analyses showed that pathways of the NOD-like receptor (p-value =1.30e-06), JAK-STAT (p-value =9.01e-06), and Toll-like receptor (p-value =0.0002) were characteristic of AR-EBLPD. 
These results implied that inflammation would be prominent in AR-EBLPD cases. For validation, we next performed Gene Set Enrichment Analysis (GSEA) using the full set of KEGG pathways (186 gene sets). Dominant gene sets in AR-EBLPD included the cytokine-cytokine receptor interaction [Normalized Enrichment Score (NES) =2.66, p-value<0.001], NOD-like receptor pathway (NES =2.26, p-value<0.001), TOLL-like receptor pathway (NES =2.14, p-value<0.001), and JAK-STAT pathway (NES =1.79, p-value<0.001). Since all the pathways were related to the NF-kB pathway, inflammatory responses were suggested to activate the NF-kB pathway or vice versa. For confirmation, we finally performed GSEA using gene sets of the NF-kB pathway, which were obtained from a gene set reported by an NIH group (Puente et al. Nature 2011) and 30 gene sets in the GSEA database, and found that the gene sets of the NF-kB pathway were enriched in AR-EBLPD (Figure 1). Our results suggested that the inflammatory and immune-related genes were enriched in AR-EBLPD and that activation of the genes may be associated with NF-kB activation. Aberrant immune and inflammatory responses could define the clinical presentations of AR-EBLPD cases. Figure 1: Gene Set Enrichment Analysis of 5 AR-EBLPD and 8 EB-negative DLBCL samples. The NF-kB signature reported from an NIH group (Puente et al. Nature 2011) was enriched in AR-EBLPD [Normalized Enrichment Score (NES) =2.20, p-value<0.001]. Disclosures: No relevant conflicts of interest to declare.
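The probe-selection step described in this abstract (Z-score standardization, a two-sided t-test between the two sample groups, then a p-value cutoff combined with an FDR cutoff) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the abstract does not say along which axis the Z-scores were computed or how the FDR was estimated, so per-array standardization and the Benjamini-Hochberg procedure are assumed here, and all function names are hypothetical.

```python
import numpy as np
from scipy import stats


def zscore_arrays(expr):
    """Standardize each array (column) across probes; the axis is an assumption."""
    expr = np.asarray(expr, dtype=float)
    return (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    q_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0.0, 1.0)
    return q


def select_probes(expr, group_a, group_b, p_cut=0.05, fdr_cut=0.25):
    """Return indices of probes passing both cutoffs, plus t, p and q values.

    expr: probes x samples matrix; group_a / group_b: column indices.
    """
    z = zscore_arrays(expr)
    # Two-sided, equal-variance t-test per probe (SciPy defaults).
    t, p = stats.ttest_ind(z[:, group_a], z[:, group_b], axis=1)
    q = benjamini_hochberg(p)
    return np.flatnonzero((p < p_cut) & (q < fdr_cut)), t, p, q
```

With 5 and 8 samples per group, as in the abstract, `group_a` and `group_b` would be the column indices of the AR-EBLPD and EB-negative DLBCL arrays; the sign of `t` then separates probes that are higher in one group from those higher in the other.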


2020 ◽  
Author(s):  
Roger A Sunde

ABSTRACT Background: Better biomarkers of selenium (Se) status and a better understanding of toxic Se biochemistry are needed to set safe dietary upper limits. In previous studies, differential expression (DE) of individual liver transcripts in rats and turkeys failed to identify a single transcript that was consistently and significantly (q < 0.05) altered by high Se. Objectives: To evaluate the effect of Se status on rat liver transcript expression data at the level of gene sets, and to compare transcript expression in rats with that in turkeys to identify common regulated transcripts. Methods: Gene set enrichment analysis (GSEA) was conducted on liver from weanling rats fed an Se-deficient basal diet (0.005 μg Se/g) supplemented with 0, 0.24 (Se-adequate), 2, or 5 μg Se/g diet as selenite for 28 d. In addition, transcript expression was compared with liver expression in turkeys fed 0, 0.4, 2, or 5 μg Se/g diet as selenite. Results: Se deficiency significantly downregulated the rat selenoprotein gene set but also upregulated gene sets for a variety of pathways, processes, and disease states. GSEA of 2 compared with 0.24 μg Se/g found no significantly up- or downregulated gene sets, showing that 2 μg Se/g is not particularly toxic to the rat. GSEA of 5 compared with 0.24 μg Se/g transcripts, however, found 27 significantly upregulated gene sets for a wide variety of conditions. Cross-species GSEA comparison of transcript expression, however, identified no common gene sets significantly and consistently regulated by high Se in rats and turkeys. In addition, comparison of individual marginally significant (unadjusted P < 0.05) DE transcripts between rats and turkeys also failed to find common transcripts. Conclusions: The dramatic increase in significant liver transcript DE and GSEA gene sets in rats fed 5 compared with 2 μg Se/g clearly appears to be a biomarker for Se toxicity, albeit not Se-specific. 
These analyses, however, failed to identify specific transcripts or pathways, biological states, or processes that were directly linked with high Se status, strongly indicating that adaptation to high Se lies outside transcriptional regulation.


2020 ◽  
Author(s):  
Menglan Cai ◽  
Canh Hao Nguyen ◽  
Hiroshi Mamitsuka ◽  
Limin Li

Abstract Gene set enrichment analysis (GSEA) has been widely used to identify gene sets with statistically significant difference between cases and controls against a large gene set. GSEA needs both phenotype labels and expression of genes. However, gene expression is assessed more often for model organisms than for minor species. More importantly, gene expression often cannot be measured under specific conditions in humans, such as non-approved treatment or gene knockout, because of the high health risk of direct experiments, and is then often substituted by measurements in mouse. Thus predicting enrichment significance (on a phenotype) of a given gene set of a species (target, say human), by using gene expression measured under the same phenotype of the other species (source, say mouse) is a vital and challenging problem, which we call CROSS-species Gene Set Enrichment Problem (XGSEP). For XGSEP, we propose XGSEA (Cross-species Gene Set Enrichment Analysis), with three steps of: 1) running GSEA for a source species to obtain enrichment scores and p-values of source gene sets; 2) representing the relation between source and target gene sets by domain adaptation; and 3) using regression to predict p-values of target gene sets, based on the representation in 2). We extensively validated XGSEA by using four real data sets under various settings, showing that XGSEA significantly outperformed three baseline methods. A case study of identifying important human pathways for T cell dysfunction and reprogramming from mouse ATAC-Seq data further confirmed the reliability of XGSEA. Source code is available through https://github.com/LiminLi-xjtu/XGSEA Author summary: Gene set enrichment analysis (GSEA) is a powerful tool for differential analysis of gene sets given a ranked gene list. GSEA requires complete data, that is, gene expression with phenotype labels. 
However, gene expression often cannot be measured under specific conditions in humans, such as non-approved treatment or gene knockout, because of the high risk of direct experiments, and is then often substituted by measurements in mouse. This lack of gene expression data leads to a more challenging problem, CROSS-species Gene Set Enrichment Problem (XGSEP), in which enrichment significance (on a phenotype) of a given gene set of a species (target, say human) is predicted by using gene expression measured under the same phenotype of the other species (source, say mouse). In this work, we propose XGSEA (Cross-species Gene Set Enrichment Analysis) for XGSEP, with three steps of: 1) GSEA; 2) domain adaptation; and 3) regression. The results on four real data sets and a case study indicate that XGSEA significantly outperformed three baseline methods and confirm the reliability of XGSEA.


2016 ◽  
Author(s):  
Yan Tan ◽  
Jernej Godec ◽  
Felix Wu ◽  
Pablo Tamayo ◽  
Jill P. Mesirov ◽  
...  

Abstract Gene set enrichment analysis (GSEA) is a widely employed method for analyzing gene expression profiles. The approach uses annotated sets of genes, identifies those that are coordinately up- or down-regulated in a biological comparison of interest, and thereby elucidates underlying biological processes relevant to the comparison. As the number of gene sets available in various collections for enrichment analysis has grown, the resulting lists of significant differentially regulated gene sets may also become larger, leading to the need for additional downstream analysis of GSEA results. Here we present a method that allows the rapid identification of a small number of co-regulated groups of genes, “leading edge metagenes” (LEMs), from high scoring sets in GSEA results. LEMs are sub-signatures which are common to multiple gene sets and that “explain” their enrichment specific to the experimental dataset of interest. We show that LEMs contain more refined lists of context-dependent and biologically meaningful genes than the parental gene sets. LEM analysis of the human vaccine response using a large database of immune signatures identified core biological processes induced by five different vaccines in datasets from human peripheral blood mononuclear cells (PBMC). Further study of these biological processes over time following vaccination showed that at day 3 post-vaccination, vaccines derived from viruses or viral subunits exhibit patterns of biological processes that are distinct from protein conjugate vaccines; however, by day 7 these differences were less pronounced. This suggests that the immune responses to diverse vaccines eventually converge to a common transcriptional response. 
LEM analysis can significantly reduce the dimensionality of enriched gene sets, improve the identification of core biological processes active in a comparison of interest, and simplify the biological interpretation of GSEA results. Author Summary: Genome-wide expression profiling is a widely used tool to identify biological mechanisms in a comparison of interest. One analytic method, Gene set enrichment analysis (GSEA), uses annotated sets of genes and identifies those that are coordinately up- or down-regulated in a biological comparison of interest. This approach capitalizes on the fact that alterations in biological processes often cause the coordinated change of a large number of genes. However, as the number of gene sets available in various collections for enrichment analysis has grown, the resulting lists of significant differentially regulated gene sets may also become larger, leading to the need for additional downstream analysis of GSEA results. Here we present a method that allows the identification of a small number of co-regulated groups of genes, “leading edge metagenes” (LEMs), from high scoring sets in GSEA results. We show that LEMs contain more refined lists of context-dependent biologically meaningful genes than the parental gene sets and demonstrate the utility of this approach in analyzing the transcriptional response to vaccination. LEM analysis can significantly reduce the dimensionality of enriched gene sets, improve the identification of core biological processes active in a comparison of interest, and facilitate the biological interpretation of GSEA results.


2018 ◽  
Author(s):  
Rani K. Powers ◽  
Andrew Goodspeed ◽  
Harrison Pielke-Lombardo ◽  
Aik-Choon Tan ◽  
James C. Costello

Abstract Motivation: Gene Set Enrichment Analysis (GSEA) is routinely used to analyze and interpret coordinate changes in transcriptomics experiments. For an experiment where fewer than seven samples per condition are compared, GSEA employs a competitive null hypothesis to test significance. A gene set enrichment score is tested against a null distribution of enrichment scores generated from permuted gene sets, where genes are randomly selected from the input experiment. Looking across a variety of biological conditions, however, genes are not randomly distributed, with many showing consistent patterns of up- or down-regulation. As a result, common patterns of positively and negatively enriched gene sets are observed across experiments. Placing a single experiment into the context of a relevant set of background experiments allows us to identify both the common and experiment-specific patterns of gene set enrichment. Results: We compiled a compendium of 442 small molecule transcriptomic experiments and used GSEA to characterize common patterns of positively and negatively enriched gene sets. To identify experiment-specific gene set enrichment, we developed the GSEA-InContext method that accounts for gene expression patterns within a user-defined background set of experiments to identify statistically significantly enriched gene sets. We evaluated GSEA-InContext on experiments using small molecules with known targets and show that it successfully prioritizes gene sets that are specific to each experiment, thus providing valuable insights that complement standard GSEA analysis. Availability and Implementation: GSEA-InContext is implemented in Python. Code, the background expression compendium, and results are available at: https://github.com/CostelloLab/GSEA-InContext
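The contrast drawn in this abstract, between a null distribution built from randomly selected genes and one informed by background experiments, can be illustrated with a small Python sketch. This is not the GSEA-InContext implementation, which is available at the repository above. It is a simplified illustration in which the background null is assumed to be the enrichment scores of the same gene set in each background experiment; every function name and the one-sided, add-one p-value are hypothetical choices made for the example.

```python
import numpy as np


def enrichment_score(scores, set_idx):
    """Signed maximum deviation of the weighted running sum for one experiment.

    scores:  per-gene ranking metric (same gene order in every experiment).
    set_idx: indices of the genes in the gene set.
    Assumes the set has non-zero total |score| and is smaller than the gene list.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")  # best-ranked gene first
    in_set = np.zeros(scores.size, dtype=bool)
    in_set[np.asarray(set_idx)] = True
    in_set = in_set[order]
    hit = np.where(in_set, np.abs(scores[order]), 0.0)
    miss = (~in_set).astype(float)
    running = np.cumsum(hit) / hit.sum() - np.cumsum(miss) / miss.sum()
    return running[np.argmax(np.abs(running))]


def random_set_null(scores, set_size, n_perm=1000, seed=0):
    """Null from gene sets drawn uniformly at random from the input experiment."""
    rng = np.random.default_rng(seed)
    n_genes = len(scores)
    return np.array([
        enrichment_score(scores, rng.choice(n_genes, size=set_size, replace=False))
        for _ in range(n_perm)
    ])


def background_null(background, set_idx):
    """Null from the same gene set scored in each background experiment (assumed)."""
    return np.array([enrichment_score(b, set_idx) for b in background])


def upper_tail_pvalue(observed, null):
    """One-sided empirical p-value for positive enrichment, add-one corrected."""
    return (1 + np.sum(null >= observed)) / (len(null) + 1)
```

Under these assumptions, a gene set that scores highly in the experiment of interest but equally highly across the background experiments receives a large p-value from the background null even when the random-set null calls it significant, which is the kind of commonly enriched set the abstract describes.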

