Gene Set to Diseases (GS2D): disease enrichment analysis on human gene sets with literature data

Large sets of candidate genes derived from high-throughput biological experiments can be characterized by functional enrichment analysis. The analysis consists of comparing the functions of one gene set against those of a background gene set. Functions related to a significant number of genes in the gene set are then expected to be relevant. Web tools offering disease enrichment analysis on gene sets are often based on gene-disease associations from manually curated or experimental data that are accurate but do not cover all diseases discussed in the literature. Using associations automatically derived from literature data could be a cost-effective method to improve the coverage of diseases for enrichment analysis at comparable levels of accuracy. We have implemented a method named Gene Set to Diseases (GS2D) as a web tool performing disease enrichment analysis on human protein-coding gene sets. It uses an automatically built dataset of more than 63 thousand gene-disease associations defined as statistically significant co-occurrences of genes and diseases in annotations of biomedical citations from PubMed. The dataset covers more diseases for enrichment analysis than the largest comparable curated database, the Comparative Toxicogenomics Database, and its performance compared favourably to similar approaches based on manually curated or experimental data. Graphical and programmatic interfaces are available at http://cbdm.uni-mainz.de/geneset2diseases.
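As an illustration of the comparison against a background gene set described above, an over-representation test for a single disease can be sketched with the hypergeometric distribution. This is a minimal sketch, not the GS2D implementation: the abstract does not state which statistical test GS2D uses, and the function name, its parameters, and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(overlap, set_size, annotated, background_size):
    """One-sided hypergeometric p-value P(X >= overlap).

    overlap         -- genes in the query set associated with the disease
    set_size        -- number of genes in the query set
    annotated       -- background genes associated with the disease
    background_size -- total number of genes in the background
    (All four names are illustrative, not taken from GS2D.)
    """
    if not (0 <= overlap <= min(set_size, annotated)):
        raise ValueError("overlap must lie between 0 and min(set_size, annotated)")
    if set_size > background_size or annotated > background_size:
        raise ValueError("set_size and annotated cannot exceed background_size")

    total = comb(background_size, set_size)
    upper = min(set_size, annotated)
    # Sum the probability of observing `overlap` or more annotated genes
    # when drawing `set_size` genes at random from the background.
    tail = sum(
        comb(annotated, i) * comb(background_size - annotated, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 8 of 50 query genes are linked to a disease that
    # is linked to 200 of 20000 background genes.
    print(hypergeom_enrichment_pvalue(8, 50, 200, 20000))
```

In practice one such test would be run per disease, so the resulting p-values would need a multiple-testing correction before being interpreted.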


MISIM v2.0: a web server for inferring microRNA functional similarity based on microRNA-disease associations

Nucleic Acids Research ◽

10.1093/nar/gkz328 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W536-W541 ◽

Cited By ~ 12

Author(s):

Jianwei Li ◽

Shan Zhang ◽

Yanping Wan ◽

Yingshu Zhao ◽

Jiangcheng Shi ◽

...

Keyword(s):

Web Server ◽

Enrichment Analysis ◽

Fold Increase ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Rna Molecules ◽

Novel Mirna ◽

Non Coding Rna ◽

Disease Associations ◽

Data Content

MicroRNAs (miRNAs) are an important class of small non-coding RNA molecules and play critical roles in health and disease. It is therefore important to evaluate the functional relationships among miRNAs and to predict novel miRNA-disease associations. For this purpose, we developed the updated web server MISIM (miRNA similarity) v2.0. Besides a 3-fold increase in data content compared with MISIM v1.0, MISIM v2.0 improves the original MISIM algorithm by incorporating both positive and negative miRNA-disease associations. That is, MISIM v2.0 scores can be positive or negative, whereas MISIM v1.0 only produced positive scores. Moreover, MISIM v2.0 implements an algorithm for novel miRNA-disease prediction based on MISIM v2.0 scores. Finally, MISIM v2.0 provides network visualization and functional enrichment analysis for functionally paired miRNAs. The MISIM v2.0 web server is freely accessible at http://www.lirmed.com/misim/.


Improved cancer biomarkers identification using network-constrained infinite latent feature selection

PLoS ONE ◽

10.1371/journal.pone.0246668 ◽

2021 ◽

Vol 16 (2) ◽

pp. e0246668

Author(s):

Lihua Cai ◽

Honglong Wu ◽

Ke Zhou

Keyword(s):

Feature Selection ◽

Expression Profiles ◽

Biological Significance ◽

Feature Selection Method ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Functional Enrichment ◽

Gene Sets ◽

Significant Gene

Identifying biomarkers that are associated with different types of cancer is an important goal in the field of bioinformatics. Different research groups have analyzed the expression profiles of many genes and found genetic patterns that can support the improvement of targeted therapies, but the significance of some genes is still ambiguous. More reliable and effective biomarker identification methods are therefore needed to detect candidate cancer-related genes. In this paper, we propose a novel method that combines the infinite latent feature selection (ILFS) method with the functional interaction (FI) network to rank biomarkers. We applied the proposed method to the expression data of five cancer types. The experiments indicated that our network-constrained ILFS (NCILFS) provides an improved prediction of the diagnosis of the samples and locates many more known oncogenes than the original ILFS and some other existing methods. We also performed functional enrichment analysis by inspecting the over-represented Gene Ontology (GO) biological process (BP) terms and applying the gene set enrichment analysis (GSEA) method to the biomarkers selected by each feature selection method. The enrichment analysis reports show that our network-constrained ILFS can produce more biologically significant gene sets than other methods. The results suggest that network-constrained ILFS can identify cancer-related genes with higher discriminative power and biological significance.


Graph analytics for phenome-genome associations inference

10.1101/682229 ◽

2019 ◽

Author(s):

Davide Cirillo ◽

Dario Garcia-Gasulla ◽

Ulises Cortés ◽

Alfonso Valencia

Keyword(s):

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Specific Gene ◽

Proteus Syndrome ◽

Supporting Evidence ◽

Human Phenotype ◽

Biological Ontologies ◽

Statistical Framework ◽

Gene Sets

Motivation: Biological ontologies, such as the Human Phenotype Ontology (HPO) and the Gene Ontology (GO), are extensively used in biomedical research to find enrichment in the annotations of specific gene sets. However, the interpretation of the encoded information would greatly benefit from methods that effectively interoperate between multiple ontologies, providing molecular details of disease-related features. Results: In this work, we present a statistical framework based on graph theory to infer direct associations between HPO and GO terms that do not share co-annotated genes. The method makes it possible to map genotypic features to phenotypic features, thus providing a valid tool for bridging functional and pathological annotations. We validated the results by (a) supporting evidence of known drug-target associations (PanDrugs), protein-protein physical and functional interactions (BioGRID and STRING), and common pathways (Reactome); (b) comparing relationships inferred from early ontology releases with knowledge contained in the latest versions. Applications: We applied our method to improve the interpretation of molecular processes involved in pathological conditions, illustrating the applicability of our predictions with a number of biological examples. In particular, we applied our method to expand the list of relevant genes from standard functional enrichment analysis of high-throughput experimental results in the context of comorbidities between Alzheimer’s disease, Lung Cancer and Glioblastoma. Moreover, we analyzed pathways linked to predicted phenotype-genotype associations, gaining insights into the molecular actors of cellular senescence in Proteus syndrome. Availability: https://github.com/dariogarcia/phenotype-genotype_graph_characterization


Toward comprehensive functional analysis of gene lists weighted by gene essentiality scores

10.1101/2021.04.26.441450 ◽

2021 ◽

Author(s):

Rui Fan ◽

Qinghua Cui

Keyword(s):

Functional Analysis ◽

Gene List ◽

Lung Squamous Cell Carcinoma ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Hypergeometric Test ◽

Exact Test ◽

Gene Set ◽

Gene Functional Analysis

Gene functional enrichment analysis is one of the most popular bioinformatics methods for annotating the pathways and functional categories of a given gene list. Current algorithms for enrichment computation, such as Fisher’s exact test and the hypergeometric test, depend entirely on the category counts of the gene list and of one gene set. As a result, all genes are treated equally, whatever they are. However, genes differ in their essentiality scores within a gene list and within a gene set. It is thus hypothesized that essentiality scores could be important and should be considered in gene functional analysis. For this purpose, we propose WEAT (https://www.cuilab.cn/weat/), a weighted gene set enrichment algorithm and online tool that weights genes by their essentiality scores. We confirmed the usefulness of WEAT using two case studies: the functional analysis of one aging-related gene list and one gene list involved in Lung Squamous Cell Carcinoma (LUSC). Finally, we believe that the WEAT method and tool could provide more possibilities for further exploring the functions of given gene lists.
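To make the contrast with count-based tests concrete, the following is a minimal sketch of one possible way to let per-gene weights enter an enrichment statistic, using a permutation test on the summed weights of the overlapping genes. It is not the WEAT algorithm, whose details are not given in the abstract; the function name, the weighting scheme, the permutation null, and the toy gene identifiers are all assumptions made for illustration.

```python
import random


def weighted_enrichment_pvalue(gene_list, gene_set, weights, background,
                               n_perm=2000, seed=0):
    """Permutation p-value for a weighted overlap statistic (illustrative only).

    The statistic is the sum of the weights of the genes shared by
    `gene_list` and `gene_set`. The null distribution is built from random
    gene lists of the same size drawn from `background`.
    """
    rng = random.Random(seed)
    gene_set = set(gene_set)
    pool = sorted(background)  # sorted for reproducible sampling
    size = len(gene_list)

    def score(genes):
        # Genes without a weight contribute nothing to the statistic.
        return sum(weights.get(g, 0.0) for g in genes if g in gene_set)

    observed = score(gene_list)
    hits = sum(score(rng.sample(pool, size)) >= observed for _ in range(n_perm))
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data with hypothetical gene identifiers g0..g99.
    background = [f"g{i}" for i in range(100)]
    gene_set = background[:10]
    weights = {g: 1.0 for g in background}
    weights.update({g: 5.0 for g in background[:5]})  # "essential" genes
    gene_list = background[:5] + background[50:55]
    print(weighted_enrichment_pvalue(gene_list, gene_set, weights, background))
```

With all weights set to 1.0 the statistic reduces to the plain overlap count, which is the quantity that Fisher's exact test and the hypergeometric test evaluate analytically.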


Six-long Non-coding RNA Signature to Predict the Prognosis of Lung Adenocarcinoma

10.21203/rs.3.rs-56767/v1 ◽

2020 ◽

Author(s):

Yang Wang ◽

Chengping Hu

Keyword(s):

Lung Adenocarcinoma ◽

Cox Regression ◽

Survival Rates ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

The Cancer Genome Atlas ◽

Functional Enrichment ◽

Protein Coding ◽

Cox Regression Analysis ◽

Protein Coding Genes

Background: Long non-coding RNAs (lncRNAs) have been reported to play essential roles in tumorigenesis and cancer prognosis, and they are potential cancer prognostic markers. However, how lncRNA signatures predict the survival of patients with lung adenocarcinoma (LUAD) is poorly understood. Our study aims to explore lncRNA signatures and their prognostic function in LUAD. Methods: The expression and prognosis data of lncRNAs in LUAD patients were collected from The Cancer Genome Atlas (TCGA). All analyses were performed using R (version 3.6.2). Metascape, STRING and Cytoscape were used for enrichment analysis and function prediction of the lncRNA co-expressed protein-coding genes. Results: We collected lncRNA expression data for 466 LUAD tumors, and a six-lncRNA signature (RP11-79H23.3, RP11-309M7.1, CTD-2357A8.3, RP11-108P20.4, U47924.29, LHFPL3-AS2) was shown to be significantly related to LUAD patients’ overall survival. According to the lncRNA signature, LUAD patients were divided into high-risk and low-risk groups with different survival rates. Further multivariable Cox regression analysis showed that the prognostic value of this signature was independent of clinical factors. The potential functional roles of the six prognostic lncRNAs and their hub co-expressed protein-coding genes are shown in the functional enrichment analysis. Conclusions: These results show that these six lncRNAs could be independent prognostic biomarkers in LUAD patients.


Detection of pathways affected by positive selection in primate lineages ancestral to humans

10.1101/044941 ◽

2016 ◽

Cited By ~ 2

Author(s):

J.T. Daub ◽

S. Moretti ◽

I. I. Davidov ◽

L. Excoffier ◽

M. Robinson-Rechavi

Keyword(s):

Positive Selection ◽

Functional Enrichment ◽

Gene Trees ◽

Gene Set Enrichment ◽

Protein Coding ◽

Gene Set ◽

Branch Site ◽

Gene Sets ◽

Site Test ◽

Polygenic Selection

Gene set enrichment approaches have been increasingly successful in finding signals of recent polygenic selection in the human genome. In this study, we aim to detect biological pathways affected by positive selection in more ancient human evolutionary history. Focusing on four branches of the primate tree that lead to modern humans, we tested all available protein coding gene trees of the Primates clade for signals of adaptation in these branches, using the likelihood-based branch site test of positive selection. The results of these locus-specific tests were then used as input for a gene set enrichment test, where whole pathways are globally scored for a signal of positive selection, instead of focusing only on outlier “significant” genes. We identified signals of positive selection in several pathways that are mainly involved in immune response, sensory perception, metabolism, and energy production. These pathway-level results are highly significant, even though there is no functional enrichment when only focusing on top scoring genes. Interestingly, several gene sets are found significant at multiple levels in the phylogeny, but different genes are responsible for the selection signal in the different branches. This suggests that the same function has been optimized in different ways at different times in primate evolution.


Identification and Functional Enrichment Analysis of Differentially Expressed Genes in Osteoarthritis

10.21203/rs.3.rs-31843/v1 ◽

2020 ◽

Author(s):

Chen Xu ◽

Ling-bing Meng ◽

Yu Xiao ◽

Yong Qiu ◽

Ying-jue Du ◽

...

Keyword(s):

Hydrogen Peroxide ◽

Differentially Expressed Genes ◽

Cellular Response ◽

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Gene Set Enrichment Analysis ◽

Functional Enrichment ◽

Ppi Network ◽

Gene Set Enrichment ◽

Gene Set

Background: Osteoarthritis (OA) is a chronic, progressive, inflammatory, degenerative disease, which has become an osteoarthropathy that seriously affects the physical health and quality of life of elderly people. However, the etiology and pathogenesis of OA remain unclear. Therefore, this study aimed to use bioinformatics technology to identify differentially expressed genes in osteoarthritis and perform functional enrichment analysis on them. Method: The main methods of this study consist of access to microarray data (GSE82107 and GSE55235), identification of differentially expressed genes (DEGs) by GEO2R between OA and normal synovium samples, enrichment analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) by Gene Set Enrichment Analysis (GSEA), and construction and analysis of the protein-protein interaction (PPI) network, significant module and hub genes. Result: A total of 300 DEGs were identified, consisting of 64 up-regulated genes and 11 down-regulated genes in OA samples compared to normal synovium tissues. Gene set enrichment analysis of DEGs provided a comprehensive overview of some major pathophysiological mechanisms in OA: cellular response to hydrogen peroxide, P53 signaling pathway and so on. The study also built the PPI network, and a total of 10 key genes were identified: CYR61, PENK, GOLM1, DUSP1, ATF3, STC2, FOSB, PRSS23, TF, and TNC. Conclusion: DEGs exist between OA patients and normal cartilage tissue, which may be involved in the related mechanism of OA development, especially cellular response to hydrogen peroxide and CYR61.


NetGen: a novel network-based probabilistic generative model for gene set functional enrichment analysis

BMC Systems Biology ◽

10.1186/s12918-017-0456-7 ◽

2017 ◽

Vol 11 (S4) ◽

Cited By ~ 2

Author(s):

Duanchen Sun ◽

Yinliang Liu ◽

Xiang-Sun Zhang ◽

Ling-Yun Wu

Keyword(s):

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Generative Model ◽

Functional Enrichment ◽

Gene Set ◽

Probabilistic Generative Model


Systems Level Analysis and Identification of Pathways and Key Genes Associated with Delirium

Genes ◽

10.3390/genes11101225 ◽

2020 ◽

Vol 11 (10) ◽

pp. 1225 ◽

Cited By ~ 1

Author(s):

Yukiko Takahashi ◽

Tomoyoshi Terada ◽

Yoshinori Muto

Keyword(s):

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

High Tendency ◽

Diagnostic Biomarkers ◽

Protein Protein Interaction ◽

Gene Sets ◽

Network Modules ◽

Complex Detection ◽

Human Accelerated Regions

Delirium is a complex pathophysiological process, and multiple contributing mechanisms have been identified. However, it is largely unclear how the genes associated with delirium contribute and which of them play key roles. In this study, the genes associated with delirium were retrieved from the Comparative Toxicogenomics Database (CTD) and integrated through a protein–protein interaction (PPI) network. Delirium-associated genes formed a highly interconnected PPI subnetwork, indicating a high tendency to interact and agglomerate. Using the Molecular Complex Detection (MCODE) algorithm, we identified the top two delirium-relevant network modules, M1 and M5, that have the most significant enrichments for the delirium-related gene sets. Functional enrichment analysis showed that genes related to neurotransmitter receptor activity were enriched in both modules. Moreover, analyses with genes located in human accelerated regions (HARs) provided evidence that HAR-Brain genes were overrepresented in the delirium-relevant network modules. We found that four of the HAR-Brain genes, namely APP, PLCB1, NPY, and HTR2A, in the M1 module were highly connected and appeared to exhibit hub properties, which might play vital roles in delirium development. Further understanding of the function of the identified modules and member genes could help to identify therapeutic intervention targets and diagnostic biomarkers for delirium.
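The notion of highly connected "hub" genes in a PPI network can be illustrated by ranking nodes by degree, that is, by their number of distinct interaction partners. This is only a sketch of that general idea under simple assumptions: the abstract does not say how hub properties were assessed, the function names are hypothetical, and the edge list uses placeholder node labels rather than real interactions.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {node: number of distinct neighbours} for an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbours.items()}


def top_hubs(edges, k=3):
    """Return the k nodes with the highest degree (ties broken alphabetically)."""
    degrees = node_degrees(edges)
    return sorted(degrees, key=lambda n: (-degrees[n], n))[:k]


if __name__ == "__main__":
    # Placeholder network; the labels do not correspond to real genes.
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
    print(node_degrees(toy_edges))
    print(top_hubs(toy_edges, k=2))
```

Degree is the simplest centrality measure; other measures such as betweenness could rank the same network differently.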


CEA: Combination-based gene set functional enrichment analysis

Scientific Reports ◽

10.1038/s41598-018-31396-4 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 2

Author(s):

Duanchen Sun ◽

Yinliang Liu ◽

Xiang-Sun Zhang ◽

Ling-Yun Wu

Keyword(s):

Enrichment Analysis ◽

Functional Enrichment Analysis ◽

Functional Enrichment ◽

Gene Set
