Computational Inference of DNA Folding Principles: From Data Management to Machine Learning

Special Topics in Information Technology - SpringerBriefs in Applied Sciences and Technology ◽

10.1007/978-3-030-85918-3_7 ◽

2022 ◽

pp. 79-88

Author(s):

Luca Nanni

Keyword(s):

Complex Analysis ◽

Hierarchical Structures ◽

Representation Learning ◽

Research Problem ◽

Graph Representation ◽

Chromatin Interaction ◽

Biological Research ◽

Chromatin Conformation ◽

Computational Framework ◽

Computational Resources

DNA is the molecular basis of life and would measure about three meters if stretched out linearly. To fit in the cell nucleus at the micrometer scale, DNA must therefore fold into several layers of hierarchical structures, which are thought to be associated with functional compartmentalization of genomic features like genes and their regulatory elements. For this reason, understanding the mechanisms of genome folding is a major biological research problem. Studying chromatin conformation requires substantial computational resources and complex data analysis pipelines. In this chapter, we first present the PyGMQL software for interactive and scalable exploration of genomic data. PyGMQL allows the user to inspect genomic datasets and design complex analysis pipelines. The software is provided as an easy-to-use Python library and interacts seamlessly with other data analysis packages. We then use the software to study chromatin conformation data. We focus on the epigenetic determinants of Topologically Associating Domains (TADs), which are regions of high chromatin self-interaction. The results of this study highlight the existence of a “grammar of genome folding”, based on the CTCF insulator protein, that dictates the formation of TADs and boundaries. Finally, we focus on the relationship between chromatin conformation and gene expression, designing a graph representation learning model that predicts gene co-expression from gene topological features obtained from chromatin conformation data. We demonstrate a correlation between chromatin topology and co-expression, shedding new light on this debated topic and providing a novel computational framework for the study of co-expression networks.

Tree Structure-Aware Graph Representation Learning via Integrated Hierarchical Aggregation and Relational Metric Learning

2020 IEEE International Conference on Data Mining (ICDM) ◽

10.1109/icdm50108.2020.00052 ◽

2020 ◽

Author(s):

Ziyue Qiao ◽

Pengyang Wang ◽

Yanjie Fu ◽

Yi Du ◽

Pengfei Wang ◽

...

Keyword(s):

Metric Learning ◽

Representation Learning ◽

Tree Structure ◽

Graph Representation ◽

Hierarchical Aggregation

Hyperspectral Image Classification with Localized Graph Convolutional Filtering

Remote Sensing ◽

10.3390/rs13030526 ◽

2021 ◽

Vol 13 (3) ◽

pp. 526

Author(s):

Shengliang Pu ◽

Yuanfeng Wu ◽

Xu Sun ◽

Xiaotong Sun

Keyword(s):

Hyperspectral Image ◽

Principal Component ◽

Representation Learning ◽

Classification Performance ◽

Hyperspectral Data ◽

Spectral Graph Theory ◽

Feature Reduction ◽

Graph Representation ◽

Novel Method ◽

Local Graph

Graph representation learning, although still nascent, has proven effective for handling graph data. Compared to conventional convolutional neural networks, graph-based deep learning has the advantages of illustrating class boundaries and modeling feature relationships. For hyperspectral image (HSI) classification, the primary problem is arguably how to convert hyperspectral data from regular grids into irregular domains. In this regard, we present a novel method that performs localized graph convolutional filtering on HSIs based on spectral graph theory. First, we applied principal component analysis (PCA) preprocessing to create localized hyperspectral data cubes with unsupervised feature reduction. These feature cubes, combined with localized adjacency matrices, were fed into the popular graph convolutional network in a standard supervised learning paradigm. Finally, we succeeded in analyzing diversified land covers by considering local graph structure with graph convolutional filtering. Experiments on real hyperspectral datasets demonstrated that the presented method offers promising classification performance compared with other popular competitors.
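
As a rough illustration of the graph convolutional filtering step mentioned in this abstract, the sketch below applies one propagation layer of the widely used form ReLU(D^-1/2 (A + I) D^-1/2 X W) to a toy graph. It assumes that this is the kind of graph convolution network the abstract refers to; the function names, the toy adjacency matrix, and the weights are hypothetical and are not taken from the paper. PCA preprocessing is omitted.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One graph convolution layer: ReLU(normalized_adj @ features @ weights)."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]


# Toy example (hypothetical values): two connected pixels with one feature each.
toy_adj = [[0.0, 1.0], [1.0, 0.0]]
toy_features = [[1.0], [3.0]]
toy_weights = [[1.0]]
toy_output = gcn_layer(toy_adj, toy_features, toy_weights)
```

With an identity weight, each node's output is a degree-normalized average over itself and its neighbors, which is what makes the filter "localized".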

Gate-Level Graph Representation Learning: A Step Towards the Improved Stuck-at Faults Analysis

2021 22nd International Symposium on Quality Electronic Design (ISQED) ◽

10.1109/isqed51717.2021.9424256 ◽

2021 ◽

Author(s):

Aneesh Balakrishnan ◽

Dan Alexandrescu ◽

Maksim Jenihhin ◽

Thomas Lange ◽

Maximilien Glorieux

Keyword(s):

Representation Learning ◽

Graph Representation

A Novel Method to Predict Drug-Target Interactions Based on Large-Scale Graph Representation Learning

Cancers ◽

10.3390/cancers13092111 ◽

2021 ◽

Vol 13 (9) ◽

pp. 2111

Author(s):

Bo-Wei Zhao ◽

Zhu-Hong You ◽

Lun Hu ◽

Zhen-Hao Guo ◽

Lei Wang ◽

...

Keyword(s):

Drug Target ◽

Large Scale ◽

Computational Models ◽

Structural Information ◽

Characteristic Curve ◽

Representation Learning ◽

Graph Representation ◽

Convolutional Network ◽

Novel Method

Identification of drug-target interactions (DTIs) is a significant step in the drug discovery or repositioning process. Compared with time-consuming and labor-intensive in vivo experimental methods, computational models can provide high-quality DTI candidates almost instantly. In this study, we propose a novel method called LGDTI to predict DTIs based on large-scale graph representation learning. LGDTI can capture both the local and global structural information of the graph. Specifically, the first-order neighbor information of nodes is aggregated by a graph convolutional network (GCN), while the high-order neighbor information of nodes is learned by the graph embedding method DeepWalk. Finally, the two kinds of features are fed into a random forest classifier to train and predict potential DTIs. The results show that our method obtained an area under the receiver operating characteristic curve (AUROC) of 0.9455 and an area under the precision-recall curve (AUPR) of 0.9491 under 5-fold cross-validation. We also compare the presented method with some existing state-of-the-art methods. These results imply that LGDTI can efficiently and robustly capture undiscovered DTIs. Moreover, the proposed model is expected to bring new inspiration and provide novel perspectives to relevant researchers.
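
To make the DeepWalk component of this abstract more concrete, here is a minimal sketch of its first stage only: generating truncated uniform random walks over a graph. In DeepWalk these walks are then treated as sentences for a skip-gram embedding model, which is omitted here. The node names, the toy graph, and the parameter values are hypothetical and are not from the LGDTI paper.

```python
import random


def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate truncated uniform random walks.

    graph: dict mapping each node to a list of its neighbors.
    Returns a list of walks, each a list of nodes of at most walk_length.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    nodes = sorted(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy drug-target graph (hypothetical nodes and edges).
toy_graph = {
    "drug_a": ["target_x"],
    "drug_b": ["target_x", "target_y"],
    "target_x": ["drug_a", "drug_b"],
    "target_y": ["drug_b"],
}
toy_walks = random_walks(toy_graph, walk_length=5, walks_per_node=3)
```

Longer walks let nodes that are several hops apart co-occur in the same sequence, which is how this family of methods picks up higher-order neighborhood information.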

Understanding Negative Sampling in Graph Representation Learning

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining ◽

10.1145/3394486.3403218 ◽

2020 ◽

Cited By ~ 2

Author(s):

Zhen Yang ◽

Ming Ding ◽

Chang Zhou ◽

Hongxia Yang ◽

Jingren Zhou ◽

...

Keyword(s):

Representation Learning ◽

Graph Representation

Graph Representation Learning for Single Cell Biology

Current Opinion in Systems Biology ◽

10.1016/j.coisb.2021.05.008 ◽

2021 ◽

Author(s):

Leon Hetzel ◽

David S. Fischer ◽

Stephan Günnemann ◽

Fabian J. Theis

Keyword(s):

Single Cell ◽

Cell Biology ◽

Representation Learning ◽

Graph Representation

Community Detection Based on Graph Representation Learning in Evolutionary Networks

Applied Sciences ◽

10.3390/app11104497 ◽

2021 ◽

Vol 11 (10) ◽

pp. 4497

Author(s):

Dongming Chen ◽

Mingshuo Nie ◽

Jie Wang ◽

Yun Kong ◽

Dongqi Wang ◽

...

Keyword(s):

Community Detection ◽

Network Structure ◽

Clustering Algorithm ◽

Laplacian Matrix ◽

Representation Learning ◽

Detection Algorithm ◽

Graph Representation ◽

Time Slice ◽

Current Time ◽

Evolutionary Networks

To analyze the temporal structures in evolutionary networks, we propose a community detection algorithm based on graph representation learning. The proposed algorithm proceeds in three steps: it employs a Laplacian matrix to obtain the node relationship information of the directly connected edges of the network structure at the previous time slice; a deep sparse autoencoder learns to represent the network structure under the current time slice; and the K-means clustering algorithm partitions the low-dimensional feature matrix of the network structure under the current time slice into communities. Experiments on three real datasets show that the proposed algorithm outperformed the baselines in effectiveness and feasibility.
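
Two of the ingredients named in this abstract can be sketched in a few lines: building a graph Laplacian from an adjacency matrix, and clustering feature vectors with K-means. The sketch below assumes the unnormalized Laplacian L = D - A and a basic Lloyd-style K-means initialized from the first k points; the paper may use different variants, and the deep sparse autoencoder is omitted entirely. All names and toy values are hypothetical.

```python
def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A for a square adjacency matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def kmeans(points, k, iterations=20):
    """Basic Lloyd's algorithm; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy path graph on three nodes and four toy embedding vectors (hypothetical).
toy_laplacian = laplacian([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
toy_labels = kmeans([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]], k=2)
```

In a pipeline like the one described, the clustering would run on the learned low-dimensional features rather than on hand-written points as in this toy example.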