Graphlet Laplacians: graphlet-based neighbourhoods highlight topology-function and topology-disease relationships

2018 ◽  
Author(s):  
Sam F. L. Windels ◽  
Noël Malod-Dognin ◽  
Nataša Pržulj

Abstract. Motivation: Laplacian matrices capture the global structure of networks and are widely used to study biological networks. However, the local structure of the network around a node can also capture biological information. Local wiring patterns are typically quantified by counting how often a node touches different graphlets (small, connected, induced sub-graphs). Currently available graphlet-based methods do not consider whether nodes are in the same network neighbourhood. Contribution: To combine graphlet-based topological information and membership of nodes to the same network neighbourhood, we generalize the Laplacian to the Graphlet Laplacian, by considering a pair of nodes to be ‘adjacent’ if they simultaneously touch a given graphlet. Results: We utilize Graphlet Laplacians to generalize spectral embedding, spectral clustering and network diffusion. Applying our generalization of spectral clustering to model networks and biological networks shows that Graphlet Laplacians capture different local topology corresponding to the underlying graphlet. In biological networks, clusters obtained by using different Graphlet Laplacians capture complementary sets of biological functions. By diffusing pan-cancer gene mutation scores based on different Graphlet Laplacians, we find complementary sets of cancer driver genes. Hence, we demonstrate that Graphlet Laplacians capture topology-function and topology-disease relationships in biological networks.
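The idea of treating two nodes as 'adjacent' when they simultaneously touch a given graphlet can be illustrated with a minimal sketch. It assumes the triangle as the graphlet, uses the number of shared triangles as the adjacency weight, and applies the standard degree-minus-adjacency Laplacian. The function names, the weighting, and the toy network are illustrative assumptions, not the authors' implementation or normalization.

```python
from itertools import combinations


def triangle_graphlet_adjacency(n, edges):
    """For every node pair, count the triangles that contain both nodes.

    Assumption: the count is used directly as the graphlet adjacency weight.
    """
    neighbours = [set() for _ in range(n)]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    adj = [[0] * n for _ in range(n)]
    for a, b, c in combinations(range(n), 3):
        # a, b, c form a triangle only if all three edges are present
        if b in neighbours[a] and c in neighbours[a] and c in neighbours[b]:
            for u, v in ((a, b), (a, c), (b, c)):
                adj[u][v] += 1
                adj[v][u] += 1
    return adj


def laplacian(adj):
    """Standard Laplacian L = D - A, with D the diagonal matrix of row sums."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy network (hypothetical): triangle 0-1-2 plus a pendant edge 2-3
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
A_tri = triangle_graphlet_adjacency(4, edges)
L_tri = laplacian(A_tri)
```

In this toy network, nodes 2 and 3 share an edge but no triangle, so they are not adjacent under the triangle-based definition, which is the kind of difference in local topology the abstract refers to.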

2019 ◽  
Vol 35 (24) ◽  
pp. 5226-5234 ◽  
Author(s):  
Sam F L Windels ◽  
Noël Malod-Dognin ◽  
Nataša Pržulj

Abstract. Motivation: Laplacian matrices capture the global structure of networks and are widely used to study biological networks. However, the local structure of the network around a node can also capture biological information. Local wiring patterns are typically quantified by counting how often a node touches different graphlets (small, connected, induced sub-graphs). Currently available graphlet-based methods do not consider whether nodes are in the same network neighbourhood. To combine graphlet-based topological information and membership of nodes to the same network neighbourhood, we generalize the Laplacian to the Graphlet Laplacian, by considering a pair of nodes to be ‘adjacent’ if they simultaneously touch a given graphlet. Results: We utilize Graphlet Laplacians to generalize spectral embedding, spectral clustering and network diffusion. Applying Graphlet Laplacian-based spectral embedding, we visually demonstrate that Graphlet Laplacians capture biological functions. This result is quantified by applying Graphlet Laplacian-based spectral clustering, which uncovers clusters enriched in biological functions dependent on the underlying graphlet. We explain the complementarity of biological functions captured by different Graphlet Laplacians by showing that they capture different local topologies. Finally, diffusing pan-cancer gene mutation scores based on different Graphlet Laplacians, we find complementary sets of cancer-related genes. Hence, we demonstrate that Graphlet Laplacians capture topology-function and topology-disease relationships in biological networks. Availability and implementation: http://www0.cs.ucl.ac.uk/staff/natasa/graphlet-laplacian/index.html Supplementary information: Supplementary data are available at Bioinformatics online.
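Diffusing node scores over a Laplacian can be sketched with explicit heat-diffusion steps, f ← f − step · L f. The abstract does not say which diffusion scheme the authors use, so the update rule, the step size, the hand-written Laplacian, and the scores below are all assumptions made for illustration.

```python
def diffuse(laplacian, scores, step=0.1, n_steps=50):
    """Explicit heat-diffusion iterations: f <- f - step * (L f).

    Assumption: `step` is small enough for the iteration to be stable
    (step * largest eigenvalue of L < 2).
    """
    f = list(scores)
    n = len(f)
    for _ in range(n_steps):
        lf = [sum(laplacian[i][j] * f[j] for j in range(n)) for i in range(n)]
        f = [f[i] - step * lf[i] for i in range(n)]
    return f


# Hypothetical Laplacian: nodes 0, 1, 2 are mutually adjacent, node 3 is isolated
L = [
    [2, -1, -1, 0],
    [-1, 2, -1, 0],
    [-1, -1, 2, 0],
    [0, 0, 0, 0],
]
mutation_scores = [1.0, 0.0, 0.0, 0.0]  # toy scores: only node 0 is mutated
smoothed = diffuse(L, mutation_scores)
```

Because the rows and columns of a symmetric Laplacian sum to zero, the total score is conserved while it spreads among nodes that are adjacent under the chosen definition; node 3 receives nothing. Swapping in a Laplacian built from a different graphlet changes which nodes receive score, which is how different Graphlet Laplacians could highlight different genes.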


2020 ◽  
Vol 6 (20) ◽  
pp. eaba2489 ◽  
Author(s):  
Pankaj Kumar ◽  
Shashi Kiran ◽  
Shekhar Saha ◽  
Zhangli Su ◽  
Teressa Paulsen ◽  
...  

Extrachromosomal circular DNAs (eccDNAs) are somatically mosaic and contribute to intercellular heterogeneity in normal and tumor cells. Because short eccDNAs are poorly chromatinized, we hypothesized that they are sequenced by tagmentation in ATAC-seq experiments without any enrichment of circular DNA. Indeed, ATAC-seq identified thousands of eccDNAs in cell lines that were validated by inverse PCR and by metaphase FISH. ATAC-seq in gliomas and glioblastomas identifies hundreds of eccDNAs, including one containing the well-known EGFR gene amplicon from chr7. More than 18,000 eccDNAs, many carrying known cancer driver genes, are identified in a pan-cancer analysis of ATAC-seq libraries from 23 tumor types. Somatically mosaic eccDNAs are identified by ATAC-seq even before amplification is recognized by genome-wide copy number variation measurements. Thus, ATAC-seq is a sensitive method to detect eccDNA present in a tumor at the pre-amplification stage and can be used to predict resistance to therapy.


2021 ◽  
Author(s):  
Ting-You Wang ◽  
Qi Liu ◽  
Yanan Ren ◽  
Sk. Kayum Alam ◽  
Li Wang ◽  
...  

2018 ◽  
Author(s):  
Dana Silverbush ◽  
Simona Cristea ◽  
Gali Yanovich ◽  
Tamar Geiger ◽  
Niko Beerenwinkel ◽  
...  

Abstract. The identification of molecular pathways driving cancer progression is a fundamental unsolved problem in tumorigenesis, which can substantially further our understanding of cancer mechanisms and inform the development of targeted therapies. Most current approaches to address this problem use primarily somatic mutations, not fully exploiting additional layers of biological information. Here, we describe ModulOmics, a method to de novo identify cancer driver pathways, or modules, by integrating multiple data types (protein-protein interactions, mutual exclusivity of mutations or copy number alterations, transcriptional co-regulation, and RNA co-expression) into a single probabilistic model. To efficiently search the exponential space of candidate modules, ModulOmics employs a two-step optimization procedure that combines integer linear programming with stochastic search. Across several cancer types, ModulOmics identifies highly functionally connected modules enriched with cancer driver genes, outperforming state-of-the-art methods. For breast cancer subtypes, the inferred modules recapitulate known molecular mechanisms and suggest novel subtype-specific functionalities. These findings are supported by an independent patient cohort, as well as independent proteomic and phosphoproteomic datasets.


BMC Genomics ◽  
2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Claudia Cava ◽  
Gloria Bertoli ◽  
Antonio Colaprico ◽  
Catharina Olsen ◽  
Gianluca Bontempi ◽  
...  

2018 ◽  
Author(s):  
Olivier Collier ◽  
Véronique Stoven ◽  
Jean-Philippe Vert

Abstract. Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning based approach which allows various types of data to be integrated in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms three other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.
Author summary. Cancer development is driven by mutations and dysfunction of important, so-called cancer driver genes, which could be the focus of targeted therapies. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential. LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it allows heterogeneous information to be combined easily into the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction software tools.


2019 ◽  
Author(s):  
Rafsan Ahmed ◽  
Ilyes Baali ◽  
Cesim Erten ◽  
Evis Hoxha ◽  
Hilal Kazan

Abstract. Motivation: Genomic analyses from large cancer cohorts have revealed the mutational heterogeneity problem, which hinders the identification of driver genes based only on mutation profiles. One way to tackle this problem is to incorporate the fact that genes act together in functional modules. The connectivity knowledge present in existing protein-protein interaction networks, together with mutation frequencies of genes and the mutual exclusivity of cancer mutations, can be utilized to increase the accuracy of identifying cancer driver modules. Results: We present a novel edge-weighted random walk-based approach that incorporates connectivity information in the form of protein-protein interactions, mutual exclusion, and coverage to identify cancer driver modules. MEXCOwalk outperforms several state-of-the-art computational methods on TCGA pan-cancer data in terms of recovering known cancer genes, providing modules that are capable of classifying normal and tumor samples, and that are enriched for mutations in specific cancer types. Furthermore, the risk scores determined with output modules can stratify patients into low-risk and high-risk groups in multiple cancer types. MEXCOwalk identifies modules containing both well-known cancer genes and putative cancer genes that are rarely mutated in the pan-cancer data. The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/[email protected]
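Mutual exclusivity and coverage of a gene set can be made concrete with a small sketch. The definitions below (coverage as the fraction of patients with at least one mutated gene in the set, mutual exclusivity as the ratio of covered patients to total mutation events) are common in the driver-module literature and are an assumption here; the abstract does not give MEXCOwalk's exact edge-weight formula. Gene names and patient IDs are hypothetical.

```python
def covered_patients(genes, mutated_in):
    """Patients carrying a mutation in at least one gene of the set."""
    return set().union(*(mutated_in[g] for g in genes))


def coverage(genes, mutated_in, n_patients):
    """Fraction of the cohort covered by the gene set."""
    return len(covered_patients(genes, mutated_in)) / n_patients


def mutual_exclusivity(genes, mutated_in):
    """Covered patients divided by total mutation events.

    Equals 1.0 when no patient has more than one gene of the set mutated.
    """
    total = sum(len(mutated_in[g]) for g in genes)
    return len(covered_patients(genes, mutated_in)) / total if total else 0.0


# Hypothetical cohort of 10 patients; values are sets of patient IDs
mutated_in = {
    "GENE_A": {1, 2, 3},
    "GENE_B": {4, 5},
    "GENE_C": {3, 4},
}

cov_ab = coverage(["GENE_A", "GENE_B"], mutated_in, n_patients=10)
mex_ab = mutual_exclusivity(["GENE_A", "GENE_B"], mutated_in)
mex_ac = mutual_exclusivity(["GENE_A", "GENE_C"], mutated_in)
```

Here GENE_A and GENE_B never co-occur in a patient, so their mutual exclusivity is 1.0, while GENE_A and GENE_C share patient 3 and score lower. Scores of this kind could serve as edge weights that bias a random walk on a protein-protein interaction network towards mutually exclusive, high-coverage neighbours.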


2020 ◽  
Author(s):  
Ilyes Baali ◽  
Cesim Erten ◽  
Hilal Kazan

Abstract. Motivation: The majority of the previous methods for identifying cancer driver modules output non-overlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for its solution. Results: We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms the state-of-the-art methods in recovering well-known cancer driver genes when evaluated on TCGA pan-cancer data. Additionally, DriveWays’s output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways. Availability: The data, the source code, and useful scripts are available at: https://github.com/abu-compbio/DriveWays Supplementary information: Supplementary data are available at bioRxiv.


2015 ◽  
Author(s):  
Eduard Porta-Pardo ◽  
Thomas Hrabe ◽  
Adam Godzik

Despite their critical importance in maintaining the integrity of all cellular pathways, the specific role of mutations on protein-protein interaction (PPI) interfaces as cancer drivers, though known for some specific examples, has not been systematically studied. We analyzed missense somatic mutations in a pan-cancer cohort of 5,989 tumors from 23 projects of The Cancer Genome Atlas (TCGA) for enrichment on PPI interfaces using e-Driver, an algorithm to analyze the mutation pattern of specific protein regions such as PPI interfaces. We identified 128 PPI interfaces enriched in somatic cancer mutations. Our results support the notion that many mutations in well-established cancer driver genes, particularly those in critical network positions, act by altering PPI interfaces. Finally, focusing on individual interfaces we are also able to show how tumors driven by the same gene can have different behaviors, including patient outcomes, depending on whether specific interfaces are mutated or not.
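Testing whether mutations are enriched in a protein region such as a PPI interface can be sketched as a one-sided binomial test: if mutations fell uniformly along the protein, each would land in the region with probability equal to the region's share of the protein length. The abstract does not state which statistic e-Driver uses, so the test below, its function name, and the numbers are illustrative assumptions.

```python
from math import comb


def binomial_enrichment_pvalue(region_mutations, total_mutations,
                               region_length, protein_length):
    """P(X >= region_mutations) for X ~ Binomial(total_mutations, p),

    where p = region_length / protein_length is the chance that a uniformly
    placed mutation falls inside the region (an assumed null model).
    """
    p = region_length / protein_length
    return sum(
        comb(total_mutations, k) * p**k * (1 - p) ** (total_mutations - k)
        for k in range(region_mutations, total_mutations + 1)
    )


# Hypothetical protein: 500 residues, a 50-residue interface, 10 missense mutations
p_all_in_interface = binomial_enrichment_pvalue(10, 10, 50, 500)
p_no_enrichment = binomial_enrichment_pvalue(0, 10, 50, 500)
```

A small p-value suggests the region carries more mutations than its length alone would explain. In a screen over many interfaces, the resulting p-values would additionally need multiple-testing correction.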


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Ilyes Baali ◽  
Cesim Erten ◽  
Hilal Kazan

Abstract. The majority of the previous methods for identifying cancer driver modules output nonoverlapping modules. This assumption is biologically inaccurate as genes can participate in multiple molecular pathways. This is particularly true for cancer-associated genes as many of them are network hubs connecting functionally distinct sets of genes. It is important to provide combinatorial optimization problem definitions modeling this biological phenomenon and to suggest efficient algorithms for its solution. We provide a formal definition of the Overlapping Driver Module Identification in Cancer (ODMIC) problem. We show that the problem is NP-hard. We propose a seed-and-extend based heuristic named DriveWays that identifies overlapping cancer driver modules from the graph built from the IntAct PPI network. DriveWays incorporates mutual exclusivity, coverage, and the network connectivity information of the genes. We show that DriveWays outperforms the state-of-the-art methods in recovering well-known cancer driver genes when evaluated on TCGA pan-cancer data. Additionally, DriveWays’s output modules show a stronger enrichment for the reference pathways in almost all cases. Overall, we show that enabling modules to overlap improves the recovery of functional pathways filtered with known cancer drivers, which essentially constitute the reference set of cancer-related pathways.

