Learning Continuous Time Bayesian Networks in Non-stationary Domains

Journal of Artificial Intelligence Research ◽

10.1613/jair.5126 ◽

2016 ◽

Vol 57 ◽

pp. 1-37 ◽

Cited By ~ 2

Author(s):

Simone Villa ◽

Fabio Stella

Keyword(s):

Bayesian Networks ◽

Continuous Time ◽

Learning Algorithm ◽

State Of The Art ◽

Synthetic Data ◽

Score Function ◽

Dynamic Bayesian Networks ◽

Continuous Time Bayesian Networks ◽

Real World Datasets ◽

Transition Times

Non-stationary continuous time Bayesian networks are introduced. They allow the parent set of each node to change over continuous time. Three settings are developed for learning non-stationary continuous time Bayesian networks from data: known transition times, known number of epochs, and unknown number of epochs. A score function for each setting is derived and the corresponding learning algorithm is developed. A set of numerical experiments on synthetic data is used to compare the effectiveness of non-stationary continuous time Bayesian networks to that of non-stationary dynamic Bayesian networks. Furthermore, the performance achieved by non-stationary continuous time Bayesian networks is compared to that achieved by state-of-the-art algorithms on four real-world datasets, namely Drosophila, Saccharomyces cerevisiae, songbird, and macroeconomics.

G-Tric: generating three-way synthetic datasets with triclustering solutions

BMC Bioinformatics ◽

10.1186/s12859-020-03925-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

João Lobo ◽

Rui Henriques ◽

Sara C. Madeira

Keyword(s):

State Of The Art ◽

Synthetic Data ◽

Ground Truth ◽

Real Data ◽

Three Dimensions ◽

Additional Advantage ◽

Urban Dynamics ◽

Data Generator ◽

Real World Datasets ◽

Synthetic Datasets

Background: Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, the subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (triclustering solution) as output.

Results: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters.

Conclusions: Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.

TransET: Knowledge Graph Embedding with Entity Types

Electronics ◽

10.3390/electronics10121407 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1407

Author(s):

Peng Wang ◽

Jing Zhou ◽

Yuzhang Liu ◽

Xingchen Zhou

Keyword(s):

Link Prediction ◽

State Of The Art ◽

Score Function ◽

Graph Embedding ◽

Vector Spaces ◽

Knowledge Graph ◽

Semantic Features ◽

Knowledge Graphs ◽

Real World Datasets ◽

Low Dimensional

Knowledge graph embedding aims to embed entities and relations into low-dimensional vector spaces. Most existing methods focus only on triple facts in knowledge graphs. In addition, models based on translation or distance measurement cannot fully represent complex relations. As well-constructed prior knowledge, entity types can be employed to learn the representations of entities and relations. In this paper, we propose a novel knowledge graph embedding model named TransET, which takes advantage of entity types to learn more semantic features. More specifically, circle convolution based on the embeddings of entities and entity types is used to map the head entity and tail entity to type-specific representations, and a translation-based score function is then used to learn the representations of triples. We evaluated our model on real-world datasets with two benchmark tasks: link prediction and triple classification. Experimental results demonstrate that it outperforms state-of-the-art models in most cases.
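
To make the phrase "translation-based score function" concrete, here is a minimal sketch of the generic TransE-style distance, where a triple (head, relation, tail) is plausible when head + relation lands close to tail. This is not the TransET model: the type-specific mapping by circle convolution is omitted, and the function name `translation_score`, the vectors, and the choice of the L1 norm are assumptions made for illustration.

```python
import numpy as np


def translation_score(head, relation, tail, norm_order=1):
    """Generic translation-based distance ||head + relation - tail||.

    Lower values mean the triple is considered more plausible.
    """
    return float(np.linalg.norm(head + relation - tail, ord=norm_order))


# Hypothetical 3-dimensional embeddings, made up for this example.
h = np.array([0.1, 0.3, -0.2])
r = np.array([0.4, -0.1, 0.5])
t_true = h + r                         # a tail that fits the translation exactly
t_corrupt = np.array([-0.9, 0.8, 0.0])  # an unrelated tail

score_true = translation_score(h, r, t_true)
score_corrupt = translation_score(h, r, t_corrupt)
print(score_true, score_corrupt)
```

In link prediction, candidate tails would typically be ranked by this distance, with the smallest distance ranked first.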

Gromov-Wasserstein optimal transport to align single-cell multi-omics data

10.1101/2020.04.28.066787 ◽

2020 ◽

Cited By ~ 2

Author(s):

Pinar Demetci ◽

Rebecca Santorella ◽

Björn Sandstede ◽

William Stafford Noble ◽

Ritambhara Singh

Keyword(s):

Single Cell ◽

Optimal Transport ◽

Learning Algorithm ◽

State Of The Art ◽

Single Cells ◽

Wasserstein Distance ◽

Cell Alignment ◽

Shared Space ◽

Real World Datasets ◽

Unsupervised Algorithms

Data integration of single-cell measurements is critical for understanding cell development and disease, but the lack of correspondence between different types of measurements makes such efforts challenging. Several unsupervised algorithms can align heterogeneous single-cell measurements in a shared space, enabling the creation of mappings between single cells in different data domains. However, these algorithms require hyperparameter tuning for high-quality alignments, which is difficult in an unsupervised setting without correspondence information for validation. We present Single-Cell alignment using Optimal Transport (SCOT), an unsupervised learning algorithm that uses Gromov-Wasserstein-based optimal transport to align single-cell multi-omics datasets. We compare the alignment performance of SCOT with state-of-the-art algorithms on four simulated and two real-world datasets. SCOT performs on par with state-of-the-art methods but is faster and requires tuning fewer hyperparameters. Furthermore, we provide an algorithm for SCOT to use Gromov-Wasserstein distance to guide the parameter selection. Thus, unlike previous methods, SCOT aligns well without using any orthogonal correspondence information to pick the hyperparameters. Our source code and scripts for replicating the results are available at https://github.com/rsinghlab/SCOT.
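
As a rough illustration of why a Gromov-Wasserstein objective can compare datasets that live in different spaces, the sketch below evaluates the standard squared-loss Gromov-Wasserstein cost of a given coupling using only within-dataset distance matrices. It is not SCOT's implementation: no optimization is performed, and the function names, the toy points, and the two couplings are assumptions chosen for the example.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance matrix between all rows of `points`."""
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))


def gromov_wasserstein_cost(dist_a, dist_b, coupling):
    """Squared-loss GW cost of a fixed coupling.

    Computes sum over i,j,k,l of
    (dist_a[i,k] - dist_b[j,l])**2 * coupling[i,j] * coupling[k,l].
    """
    diff = dist_a[:, None, :, None] - dist_b[None, :, None, :]
    return float(np.einsum("ijkl,ij,kl->", diff ** 2, coupling, coupling))


# Toy data: dataset B is dataset A reordered and shifted, so only
# within-dataset geometry is shared between the two.
points_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
points_b = points_a[[2, 0, 1]] + 5.0

dist_a = pairwise_distances(points_a)
dist_b = pairwise_distances(points_b)

# Coupling that matches each point in A to its copy in B.
coupling_good = np.zeros((3, 3))
coupling_good[0, 1] = coupling_good[1, 2] = coupling_good[2, 0] = 1.0 / 3.0

# Coupling that matches points by index, ignoring the reordering.
coupling_bad = np.eye(3) / 3.0

cost_good = gromov_wasserstein_cost(dist_a, dist_b, coupling_good)
cost_bad = gromov_wasserstein_cost(dist_a, dist_b, coupling_bad)
print(cost_good, cost_bad)
```

Because the cost depends only on distances inside each dataset, the shift applied to dataset B has no effect; a solver would search over couplings to minimize this quantity.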

Label Distribution for Learning with Noisy Labels

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/356 ◽

2020 ◽

Author(s):

Yun-Peng Liu ◽

Ning Xu ◽

Yu Zhang ◽

Xin Geng

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Learning Algorithm ◽

State Of The Art ◽

Confidence Estimation ◽

Novel Method ◽

Real World Datasets ◽

Label Distribution ◽

Noisy Labels

The performance of deep neural networks (DNNs) crucially relies on the quality of labeling. In some situations, labels are easily corrupted and become noisy. Thus, designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which is the bottleneck of many methods. To address this problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distribution. The boundary between clean labels and noisy labels then becomes clear according to the confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm over state-of-the-art methods.

Mean Field Analysis for Continuous Time Bayesian Networks

Communications in Computer and Information Science - New Frontiers in Quantitative Methods in Informatics ◽

10.1007/978-3-319-91632-3_12 ◽

2018 ◽

pp. 156-169

Author(s):

Davide Cerotti ◽

Daniele Codetta-Raiteri

Keyword(s):

Bayesian Networks ◽

Continuous Time ◽

Mean Field ◽

Field Analysis ◽

Continuous Time Bayesian Networks

Modeling Approaches Reveal New Regulatory Networks in Aspergillus fumigatus Metabolism

Journal of Fungi ◽

10.3390/jof6030108 ◽

2020 ◽

Vol 6 (3) ◽

pp. 108

Author(s):

Enzo Acerbi ◽

Marcela Hortova-Kohoutkova ◽

Tsokyi Choera ◽

Nancy Keller ◽

Jan Fric ◽

...

Keyword(s):

Bayesian Networks ◽

Continuous Time ◽

Regulatory Networks ◽

Time Course ◽

Metabolic Networks ◽

De Novo ◽

Tryptophan Metabolism ◽

Reconstruction Method ◽

Fungal Virulence ◽

Continuous Time Bayesian Networks

Systems biology approaches are extensively used to model and reverse-engineer gene regulatory networks from experimental data. Indoleamine 2,3-dioxygenases (IDOs), which belong to the heme dioxygenase family, degrade l-tryptophan to kynurenines. These enzymes are also responsible for the de novo synthesis of nicotinamide adenine dinucleotide (NAD+). As such, they are expressed by a variety of species, including fungi. Interestingly, Aspergillus may degrade l-tryptophan not only via IDO but also via alternative pathways. Deciphering the molecular interactions regulating tryptophan metabolism is particularly critical for the discovery of novel drug targets designed to control pathogen determinants in invasive infections. Using continuous time Bayesian networks over a time-course gene expression dataset, we inferred the global regulatory network controlling l-tryptophan metabolism. The method reveals a possible novel approach to target fungal virulence factors during infection. Furthermore, this study represents the first application of continuous time Bayesian networks as a gene network reconstruction method in Aspergillus metabolism. The experiment showed that the applied computational approach may improve the understanding of metabolic networks over traditional pathways.

Continuous Time Bayesian Networks for Gene Network Reconstruction: A Comparative Study on Time Course Data

Bioinformatics Research and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-319-08171-7_16 ◽

2014 ◽

pp. 176-187 ◽

Cited By ~ 4

Author(s):

Enzo Acerbi ◽

Fabio Stella

Keyword(s):

Comparative Study ◽

Bayesian Networks ◽

Continuous Time ◽

Gene Network ◽

Time Course ◽

Network Reconstruction ◽

Continuous Time Bayesian Networks ◽

Gene Network Reconstruction ◽

Time Course Data

Dynamic fault tree analysis based on continuous-time Bayesian networks under fuzzy numbers

Proceedings of the Institution of Mechanical Engineers Part O Journal of Risk and Reliability ◽

10.1177/1748006x15588446 ◽

2015 ◽

Vol 229 (6) ◽

pp. 530-541 ◽

Cited By ~ 17

Author(s):

Yan-Feng Li ◽

Jinhua Mi ◽

Yu Liu ◽

Yuan-Jian Yang ◽

Hong-Zhong Huang

Keyword(s):

Bayesian Networks ◽

Continuous Time ◽

Fuzzy Numbers ◽

Fault Tree ◽

Fault Tree Analysis ◽

Tree Analysis ◽

Dynamic Fault Tree ◽

Continuous Time Bayesian Networks

Intrusion Detection using Continuous Time Bayesian Networks

Journal of Artificial Intelligence Research ◽

10.1613/jair.3050 ◽

2010 ◽

Vol 39 ◽

pp. 745-774 ◽

Cited By ~ 29

Author(s):

J. Xu ◽

C. R. Shelton

Keyword(s):

Intrusion Detection ◽

Bayesian Networks ◽

Continuous Time ◽

Particle Filtering ◽

Generative Models ◽

Training Data ◽

Continuous Time Model ◽

Time Model ◽

General Technique ◽

Continuous Time Bayesian Networks

Intrusion detection systems (IDSs) fall into two high-level categories: network-based systems (NIDS) that monitor network behaviors, and host-based systems (HIDS) that monitor system calls. In this work, we present a general technique for both systems. We use anomaly detection, which identifies patterns not conforming to a historic norm. In both types of systems, the rates of change vary dramatically over time (due to burstiness) and over components (due to service difference). To efficiently model such systems, we use continuous time Bayesian networks (CTBNs) and avoid specifying a fixed update interval common to discrete-time models. We build generative models from the normal training data, and abnormal behaviors are flagged based on their likelihood under this norm. For NIDS, we construct a hierarchical CTBN model for the network packet traces and use Rao-Blackwellized particle filtering to learn the parameters. We illustrate the power of our method through experiments on detecting real worms and identifying hosts on two publicly available network traces, the MAWI dataset and the LBNL dataset. For HIDS, we develop a novel learning method to deal with the finite resolution of system log file time stamps, without losing the benefits of our continuous time model. We demonstrate the method by detecting intrusions in the DARPA 1998 BSM dataset.
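
The idea of flagging behavior by its likelihood under a model of normal activity can be sketched for the simplest case: a single variable following a continuous time Markov chain, which is what a CTBN node with no parents reduces to. The code below is not the hierarchical CTBN or the particle filtering described above; the intensity matrix, the trajectories, and the function name `ctmc_log_likelihood` are assumptions invented for illustration.

```python
import math


def ctmc_log_likelihood(intensity, trajectory, end_time):
    """Log-likelihood of a fully observed trajectory under a CTMC.

    `intensity[x][y]` is the transition rate from state x to y (x != y);
    `intensity[x][x]` is minus the total rate of leaving x.
    `trajectory` is a list of (time, state) pairs with increasing times,
    starting with the initial state; observation stops at `end_time`.
    """
    log_lik = 0.0
    for index, (time, state) in enumerate(trajectory):
        is_last = index == len(trajectory) - 1
        next_time = end_time if is_last else trajectory[index + 1][0]
        # Staying in `state` for the whole sojourn contributes -rate * duration.
        log_lik += intensity[state][state] * (next_time - time)
        if not is_last:
            next_state = trajectory[index + 1][1]
            # Each observed jump contributes the log of its transition rate.
            log_lik += math.log(intensity[state][next_state])
    return log_lik


# Hypothetical two-state model assumed to describe normal behavior.
normal_model = [[-0.5, 0.5],
                [1.0, -1.0]]

# A calm trajectory and a bursty one over the same 5-unit window.
calm = [(0.0, 0), (2.0, 1), (3.0, 0)]
bursty = [(0.5 * k, k % 2) for k in range(10)]

calm_ll = ctmc_log_likelihood(normal_model, calm, end_time=5.0)
bursty_ll = ctmc_log_likelihood(normal_model, bursty, end_time=5.0)
print(calm_ll, bursty_ll)
```

A detector built on this idea would compare each window's log-likelihood against a threshold chosen from normal training data. Because transition times enter directly, no fixed update interval has to be specified.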

A GSPN based tool to inference Generalized Continuous Time Bayesian Networks

Proceedings of the 7th International Conference on Performance Evaluation Methodologies and Tools ◽

10.4108/icst.valuetools.2013.254400 ◽

2014 ◽

Author(s):

Daniele Codetta Raiteri ◽

Luigi Portinale

Keyword(s):

Bayesian Networks ◽

Continuous Time ◽

Continuous Time Bayesian Networks
