Graphlet Laplacians: graphlet-based neighbourhoods highlight topology-function and topology-disease relationships

10.1101/460964 ◽

2018 ◽

Author(s):

Sam F. L. Windels ◽

Noël Malod-Dognin ◽

Nataša Pržulj

Keyword(s):

Biological Networks ◽

Spectral Clustering ◽

Biological Information ◽

Driver Genes ◽

Topological Information ◽

Cancer Driver ◽

Laplacian Matrices ◽

And Topology ◽

Local Topology ◽

Pan Cancer

Abstract. Motivation: Laplacian matrices capture the global structure of networks and are widely used to study biological networks. However, the local structure of the network around a node can also capture biological information. Local wiring patterns are typically quantified by counting how often a node touches different graphlets (small, connected, induced sub-graphs). Currently available graphlet-based methods do not consider whether nodes are in the same network neighbourhood. Contribution: To combine graphlet-based topological information and membership of nodes to the same network neighbourhood, we generalize the Laplacian to the Graphlet Laplacian, by considering a pair of nodes to be ‘adjacent’ if they simultaneously touch a given graphlet. Results: We utilize Graphlet Laplacians to generalize spectral embedding, spectral clustering and network diffusion. Applying our generalization of spectral clustering to model networks and biological networks shows that Graphlet Laplacians capture different local topology corresponding to the underlying graphlet. In biological networks, clusters obtained by using different Graphlet Laplacians capture complementary sets of biological functions. By diffusing pan-cancer gene mutation scores based on different Graphlet Laplacians, we find complementary sets of cancer driver genes. Hence, we demonstrate that Graphlet Laplacians capture topology-function and topology-disease relationships in biological networks.
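
The graphlet-based notion of adjacency in this abstract can be illustrated for a single graphlet, the triangle. The sketch below is a minimal illustration and not the authors' implementation. It assumes that the graphlet adjacency of two nodes is the number of triangles containing both, and that the Laplacian is formed as the diagonal matrix of row sums minus that adjacency. The function names and the toy network are hypothetical.

```python
def triangle_graphlet_adjacency(adj):
    """Count, for each node pair, the triangles that contain both nodes.

    adj is a symmetric 0/1 adjacency matrix (list of lists) without self-loops.
    Two nodes share a triangle only if they are linked, and each common
    neighbour of a linked pair closes exactly one triangle.
    """
    n = len(adj)
    out = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u != v and adj[u][v]:
                out[u][v] = sum(adj[u][w] * adj[v][w] for w in range(n))
    return out


def laplacian_from(weights):
    """Assumed construction: diagonal of row sums minus the weight matrix."""
    n = len(weights)
    return [
        [(sum(weights[u]) if u == v else 0) - weights[u][v] for v in range(n)]
        for u in range(n)
    ]


# Toy network (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
tri_adj = triangle_graphlet_adjacency(adj)
lap = laplacian_from(tri_adj)
```

In this toy network node 3 is linked to node 2 in the ordinary sense but touches no triangle, so its row in the triangle-based matrices is all zeros. This is the kind of difference in local topology that a graphlet-specific matrix can expose.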

Graphlet Laplacians for topology-function and topology-disease relationships

Bioinformatics ◽

10.1093/bioinformatics/btz455 ◽

2019 ◽

Vol 35 (24) ◽

pp. 5226-5234 ◽

Cited By ~ 2

Author(s):

Sam F L Windels ◽

Noël Malod-Dognin ◽

Nataša Pržulj

Keyword(s):

Biological Networks ◽

Spectral Clustering ◽

Biological Information ◽

Supplementary Information ◽

Biological Functions ◽

Topological Information ◽

Laplacian Matrices ◽

And Topology ◽

Spectral Embedding ◽

Pan Cancer

Abstract. Motivation: Laplacian matrices capture the global structure of networks and are widely used to study biological networks. However, the local structure of the network around a node can also capture biological information. Local wiring patterns are typically quantified by counting how often a node touches different graphlets (small, connected, induced sub-graphs). Currently available graphlet-based methods do not consider whether nodes are in the same network neighbourhood. To combine graphlet-based topological information and membership of nodes to the same network neighbourhood, we generalize the Laplacian to the Graphlet Laplacian, by considering a pair of nodes to be ‘adjacent’ if they simultaneously touch a given graphlet. Results: We utilize Graphlet Laplacians to generalize spectral embedding, spectral clustering and network diffusion. Applying Graphlet Laplacian-based spectral embedding, we visually demonstrate that Graphlet Laplacians capture biological functions. This result is quantified by applying Graphlet Laplacian-based spectral clustering, which uncovers clusters enriched in biological functions dependent on the underlying graphlet. We explain the complementarity of biological functions captured by different Graphlet Laplacians by showing that they capture different local topologies. Finally, diffusing pan-cancer gene mutation scores based on different Graphlet Laplacians, we find complementary sets of cancer-related genes. Hence, we demonstrate that Graphlet Laplacians capture topology-function and topology-disease relationships in biological networks. Availability and implementation: http://www0.cs.ucl.ac.uk/staff/natasa/graphlet-laplacian/index.html Supplementary information: Supplementary data are available at Bioinformatics online.
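
To make the idea of diffusing scores with a Laplacian concrete, the following minimal sketch spreads node scores by repeated small steps of x ← x − step · L x. This explicit stepping scheme, the step size, the function name and the toy path network are all assumptions chosen for illustration. The abstract does not specify the diffusion kernel, and the authors' method may differ. Any Laplacian-like matrix could be passed in; the example uses the ordinary Laplacian of a three-node path.

```python
def diffuse(laplacian, scores, step=0.1, n_steps=50):
    """Spread scores over a network by repeated steps x <- x - step * L x.

    Assumed scheme for illustration only. The step must be small relative
    to the largest eigenvalue of the Laplacian for the iteration to be stable.
    """
    n = len(scores)
    x = list(scores)
    for _ in range(n_steps):
        flow = [sum(laplacian[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] - step * flow[i] for i in range(n)]
    return x


# Ordinary Laplacian of the path 0 - 1 - 2 (toy example).
path_laplacian = [
    [1, -1, 0],
    [-1, 2, -1],
    [0, -1, 1],
]
# All of the initial score sits on node 0, e.g. a single mutated gene.
diffused = diffuse(path_laplacian, [1.0, 0.0, 0.0])
```

Because each column of the Laplacian sums to zero, the total score is preserved while it spreads from node 0 towards its neighbours, so nodes closer to the seed keep higher scores.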

ATAC-seq identifies thousands of extrachromosomal circular DNA in cancer and cell lines

Science Advances ◽

10.1126/sciadv.aba2489 ◽

2020 ◽

Vol 6 (20) ◽

pp. eaba2489 ◽

Cited By ~ 4

Author(s):

Pankaj Kumar ◽

Shashi Kiran ◽

Shekhar Saha ◽

Zhangli Su ◽

Teressa Paulsen ◽

...

Keyword(s):

Cell Lines ◽

Inverse Pcr ◽

Driver Genes ◽

Circular Dna ◽

Cancer Driver ◽

Genome Wide ◽

Number Variation ◽

Circular Dnas ◽

Tumor Types ◽

Pan Cancer

Extrachromosomal circular DNAs (eccDNAs) are somatically mosaic and contribute to intercellular heterogeneity in normal and tumor cells. Because short eccDNAs are poorly chromatinized, we hypothesized that they are sequenced by tagmentation in ATAC-seq experiments without any enrichment of circular DNA. Indeed, ATAC-seq identified thousands of eccDNAs in cell lines that were validated by inverse PCR and by metaphase FISH. ATAC-seq in gliomas and glioblastomas identifies hundreds of eccDNAs, including one containing the well-known EGFR gene amplicon from chr7. More than 18,000 eccDNAs, many carrying known cancer driver genes, are identified in a pan-cancer analysis of ATAC-seq libraries from 23 tumor types. Somatically mosaic eccDNAs are identified by ATAC-seq even before amplification is recognized by genome-wide copy number variation measurements. Thus, ATAC-seq is a sensitive method to detect eccDNA present in a tumor at the pre-amplification stage and can be used to predict resistance to therapy.

A pan-cancer transcriptome analysis of exitron splicing identifies novel cancer driver genes and neoepitopes

Molecular Cell ◽

10.1016/j.molcel.2021.03.028 ◽

2021 ◽

Author(s):

Ting-You Wang ◽

Qi Liu ◽

Yanan Ren ◽

Sk. Kayum Alam ◽

Li Wang ◽

...

Keyword(s):

Transcriptome Analysis ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Transcriptome ◽

Cancer Driver Genes ◽

Pan Cancer

ModulOmics: Integrating Multi-Omics Data to Identify Cancer Driver Modules

10.1101/288399 ◽

2018 ◽

Cited By ~ 1

Author(s):

Dana Silverbush ◽

Simona Cristea ◽

Gali Yanovich ◽

Tamar Geiger ◽

Niko Beerenwinkel ◽

...

Keyword(s):

Cancer Progression ◽

Protein Interactions ◽

Molecular Mechanisms ◽

De Novo ◽

Optimization Procedure ◽

Biological Information ◽

Data Types ◽

Driver Genes ◽

Cancer Subtypes ◽

Cancer Driver

Abstract. The identification of molecular pathways driving cancer progression is a fundamental unsolved problem in tumorigenesis, which can substantially further our understanding of cancer mechanisms and inform the development of targeted therapies. Most current approaches to address this problem use primarily somatic mutations, not fully exploiting additional layers of biological information. Here, we describe ModulOmics, a method to de novo identify cancer driver pathways, or modules, by integrating multiple data types (protein-protein interactions, mutual exclusivity of mutations or copy number alterations, transcriptional co-regulation, and RNA co-expression) into a single probabilistic model. To efficiently search the exponential space of candidate modules, ModulOmics employs a two-step optimization procedure that combines integer linear programming with stochastic search. Across several cancer types, ModulOmics identifies highly functionally connected modules enriched with cancer driver genes, outperforming state-of-the-art methods. For breast cancer subtypes, the inferred modules recapitulate known molecular mechanisms and suggest novel subtype-specific functionalities. These findings are supported by an independent patient cohort, as well as independent proteomic and phosphoproteomic datasets.

Integration of multiple networks and pathways identifies cancer driver genes in pan-cancer analysis

BMC Genomics ◽

10.1186/s12864-017-4423-x ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 19

Author(s):

Claudia Cava ◽

Gloria Bertoli ◽

Antonio Colaprico ◽

Catharina Olsen ◽

Gianluca Bontempi ◽

...

Keyword(s):

Driver Genes ◽

Cancer Driver ◽

Multiple Networks ◽

Cancer Driver Genes ◽

Pan Cancer

LOTUS: a Single- and Multitask Machine Learning Algorithm for the Prediction of Cancer Driver Genes

10.1101/398537 ◽

2018 ◽

Cited By ~ 1

Author(s):

Olivier Collier ◽

Véronique Stoven ◽

Jean-Philippe Vert

Keyword(s):

Machine Learning ◽

Biological Networks ◽

Learning Strategy ◽

Gene Prediction ◽

Scoring Function ◽

Cancer Genes ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Types ◽

Cancer Driver Genes

Abstract. Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning-based approach that integrates various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms three other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.

Author summary: Cancer development is driven by mutations and dysfunction of important, so-called cancer driver genes, which could be targeted by targeted therapies. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential. LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it makes it easy to combine heterogeneous information into the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction tools.

MEXCOWalk: Mutual Exclusion and Coverage Based Random Walk to Identify Cancer Modules

10.1101/547653 ◽

2019 ◽

Author(s):

Rafsan Ahmed ◽

Ilyes Baali ◽

Cesim Erten ◽

Evis Hoxha ◽

Hilal Kazan

Keyword(s):

Random Walk ◽

Mutual Exclusion ◽

Risk Scores ◽

Cancer Genes ◽

Multiple Cancer ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Data ◽

Cancer Types ◽

Pan Cancer

Abstract. Motivation: Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction networks together with mutation frequencies of genes and the mutual exclusivity of cancer mutations can be utilized to increase the accuracy of identifying cancer driver modules. Results: We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions, mutual exclusion, and coverage to identify cancer driver modules. MEXCOWalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples, and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOWalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/[email protected]
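
The two mutation-based quantities named in this abstract, coverage and mutual exclusion, can be sketched as follows. The definitions used here are commonly used ones and are an assumption, not necessarily the exact formulas of MEXCOWalk: coverage is the fraction of patients with a mutation in at least one gene of a module, and mutual exclusivity is the number of covered patients divided by the total number of mutation events in the module. The gene names, patient identifiers and function names are hypothetical.

```python
def covered_patients(module, mutated_in):
    """Patients carrying a mutation in at least one gene of the module."""
    return set().union(*(mutated_in[gene] for gene in module))


def coverage(module, mutated_in, n_patients):
    """Fraction of the cohort covered by the module (assumed definition)."""
    return len(covered_patients(module, mutated_in)) / n_patients


def mutual_exclusivity(module, mutated_in):
    """Covered patients divided by total mutation events (assumed definition).

    Equals 1.0 when no patient has mutations in two genes of the module
    and decreases as mutations co-occur.
    """
    events = sum(len(mutated_in[gene]) for gene in module)
    return len(covered_patients(module, mutated_in)) / events if events else 0.0


# Hypothetical cohort of 5 patients; each gene maps to the patients it is mutated in.
mutated_in = {
    "GENE_A": {1, 2},
    "GENE_B": {3},
    "GENE_C": {2, 3, 4},
}
exclusive_cov = coverage(["GENE_A", "GENE_B"], mutated_in, n_patients=5)
exclusive_mex = mutual_exclusivity(["GENE_A", "GENE_B"], mutated_in)
overlap_cov = coverage(["GENE_A", "GENE_C"], mutated_in, n_patients=5)
overlap_mex = mutual_exclusivity(["GENE_A", "GENE_C"], mutated_in)
```

In this toy cohort the module of GENE_A and GENE_C covers more patients than the module of GENE_A and GENE_B, but is less mutually exclusive because patient 2 carries mutations in both genes. A module-finding method has to balance the two quantities.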

DriveWays: A Method for Identifying Possibly Overlapping Driver Pathways in Cancer

10.1101/2020.04.01.015388 ◽

2020 ◽

Author(s):

Ilyes Baali ◽

Cesim Erten ◽

Hilal Kazan

Keyword(s):

Optimization Problem ◽

Network Connectivity ◽

Supplementary Information ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Data ◽

Cancer Drivers ◽

Definition Of ◽

Almost All ◽

Pan Cancer

Abstract. Motivation: The majority of the previous methods for identifying cancer driver modules output non-overlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for its solution. Results: We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms the state-of-the-art methods in recovering well-known cancer driver genes on TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways. Availability: The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/DriveWays Supplementary information: Supplementary data are available at bioRxiv.

A Pan-cancer catalogue of driver protein interaction interfaces

10.1101/015883 ◽

2015 ◽

Cited By ~ 1

Author(s):

Eduard Porta-Pardo ◽

Thomas Hrabe ◽

Adam Godzik

Keyword(s):

Protein Interaction ◽

Specific Protein ◽

The Cancer Genome Atlas ◽

Driver Genes ◽

Cancer Driver ◽

Protein Protein Interaction ◽

Cancer Mutations ◽

Cancer Genome Atlas ◽

Mutation Pattern ◽

Pan Cancer

Despite their critical importance in maintaining the integrity of all cellular pathways, the specific role of mutations on protein-protein interaction (PPI) interfaces as cancer drivers, though known for some specific examples, has not been systematically studied. We analyzed missense somatic mutations in a pan-cancer cohort of 5,989 tumors from 23 projects of The Cancer Genome Atlas (TCGA) for enrichment on PPI interfaces using e-Driver, an algorithm to analyze the mutation pattern of specific protein regions such as PPI interfaces. We identified 128 PPI interfaces enriched in somatic cancer mutations. Our results support the notion that many mutations in well-established cancer driver genes, particularly those in critical network positions, act by altering PPI interfaces. Finally, focusing on individual interfaces we are also able to show how tumors driven by the same gene can have different behaviors, including patient outcomes, depending on whether specific interfaces are mutated or not.

DriveWays: a method for identifying possibly overlapping driver pathways in cancer

Scientific Reports ◽

10.1038/s41598-020-78852-8 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Ilyes Baali ◽

Cesim Erten ◽

Hilal Kazan

Keyword(s):

Optimization Problem ◽

State Of The Art ◽

Network Connectivity ◽

Driver Genes ◽

Cancer Driver ◽

Cancer Data ◽

Cancer Drivers ◽

Definition Of ◽

Almost All ◽

Pan Cancer

Abstract. The majority of the previous methods for identifying cancer driver modules output nonoverlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for its solution. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms the state-of-the-art methods in recovering well-known cancer driver genes on TCGA pan-cancer data. Additionally, DriveWays' output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways.
