A Comparative Study of Marginalized Graph Kernel and Message Passing Neural Network

2021 ◽  
Author(s):  
Yan Xiang ◽  
Yu-Hang Tang ◽  
Guang Lin ◽  
Huai Sun

<p>This work presents a state-of-the-art hybrid kernel for molecular property prediction. The hybrid kernel consists of a marginalized graph kernel that operates on molecular graphs and radial basis function kernels that operate on global molecular features. A directed message passing neural network (D-MPNN) with global molecular features is used as a strong baseline. After tuning hyperparameters with Bayesian optimization, we benchmark the models on 11 publicly available data sets. Our results show that the predictions of the graph kernel model are correlated with those of D-MPNN, which indicates that the molecular representation learned by D-MPNN is very close to the reproducing kernel Hilbert space induced by the hybrid kernel. These results may provide clues for research on the interpretability of graph neural networks. In addition, an ensemble of the graph kernel models and D-MPNN performs best. The advantage of D-MPNN lies in its computational efficiency, and the advantage of the graph kernel model lies in the inherent uncertainty quantification of Gaussian process regression. All code for the graph kernel machines used in this work is available at <a href="https://github.com/Xiangyan93/Chem-Graph-Kernel-Machine">https://github.com/Xiangyan93/Chem-Graph-Kernel-Machine</a>.</p>
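The hybrid kernel described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation in the linked repository or in GraphDot. It computes a Kashima-style marginalized graph kernel (the expected similarity of two equal-length random walks, one on each graph) by fixed-point iteration, normalizes it, and multiplies it by an RBF kernel on global features. The label base kernel, the stopping probability `q`, the product combination, and the molecule dictionaries with their `features` values are all hypothetical choices.

```python
import math


def base_kernel(a, b):
    # Hypothetical base kernel on atom/bond labels: 1 on a match, 0.5 otherwise.
    return 1.0 if a == b else 0.5


def adjacency(graph):
    # Undirected adjacency list of (neighbour, edge_label) pairs.
    adj = [[] for _ in graph["nodes"]]
    for i, j, label in graph["edges"]:
        adj[i].append((j, label))
        adj[j].append((i, label))
    return adj


def marginalized_graph_kernel(g1, g2, q=0.05, tol=1e-12, max_iter=10_000):
    """Expected base-kernel similarity of two equal-length random walks,
    one on each graph, with a uniform start and stopping probability q."""
    a1, a2 = adjacency(g1), adjacency(g2)
    n1, n2 = len(a1), len(a2)
    # A vertex without neighbours always stops the walk.
    stop1 = [q if a1[v] else 1.0 for v in range(n1)]
    stop2 = [q if a2[w] else 1.0 for w in range(n2)]
    # R[v][w]: expected similarity of walk pairs starting at vertex pair (v, w).
    R = [[0.0] * n2 for _ in range(n1)]
    for _ in range(max_iter):
        new = [[0.0] * n2 for _ in range(n1)]
        diff = 0.0
        for v in range(n1):
            for w in range(n2):
                cont = 0.0
                if a1[v] and a2[w]:
                    # Joint probability of stepping to one particular neighbour pair.
                    t = (1 - q) ** 2 / (len(a1[v]) * len(a2[w]))
                    for u, e1 in a1[v]:
                        for x, e2 in a2[w]:
                            cont += t * base_kernel(e1, e2) * R[u][x]
                kv = base_kernel(g1["nodes"][v], g2["nodes"][w])
                new[v][w] = kv * (stop1[v] * stop2[w] + cont)
                diff = max(diff, abs(new[v][w] - R[v][w]))
        R = new
        if diff < tol:
            break
    return sum(map(sum, R)) / (n1 * n2)


def normalized_graph_kernel(g1, g2):
    k12 = marginalized_graph_kernel(g1, g2)
    return k12 / math.sqrt(marginalized_graph_kernel(g1, g1) * marginalized_graph_kernel(g2, g2))


def rbf(x, y, length_scale=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * length_scale ** 2))


def hybrid_kernel(m1, m2, length_scale=1.0):
    # Assumed product combination of the graph kernel and the feature kernel.
    return normalized_graph_kernel(m1["graph"], m2["graph"]) * rbf(
        m1["features"], m2["features"], length_scale
    )


# Heavy-atom graphs with bond-order edge labels; "features" are made-up scaled descriptors.
ethanol = {"graph": {"nodes": ["C", "C", "O"], "edges": [(0, 1, 1), (1, 2, 1)]}, "features": [1.0]}
propane = {"graph": {"nodes": ["C", "C", "C"], "edges": [(0, 1, 1), (1, 2, 1)]}, "features": [0.8]}
print(hybrid_kernel(ethanol, ethanol), hybrid_kernel(ethanol, propane))
```

Multiplying the two kernels keeps the result positive semidefinite, and the normalization makes every molecule's self-similarity equal to 1; a weighted sum of the two kernels would be an equally valid design.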


2016 ◽  
Vol 2 ◽  
pp. e50 ◽  
Author(s):  
Nicolas Durrande ◽  
James Hensman ◽  
Magnus Rattray ◽  
Neil D. Lawrence

We consider the problem of detecting and quantifying the periodic component of a function given noise-corrupted observations of a limited number of input/output tuples. Our approach is based on Gaussian process regression, which provides a flexible non-parametric framework for modelling periodic data. We introduce a novel decomposition of the covariance function as the sum of periodic and aperiodic kernels. This decomposition allows for the creation of sub-models which capture the periodic nature of the signal and its complement. To quantify the periodicity of the signal, we derive a periodicity ratio which reflects the uncertainty in the fitted sub-models. Although the method can be applied to many kernels, we give a special emphasis to the Matérn family, from the expression of the reproducing kernel Hilbert space inner product to the implementation of the associated periodic kernels in a Gaussian process toolkit. The proposed method is illustrated by considering the detection of periodically expressed genes in the Arabidopsis genome.
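The additive decomposition can be illustrated with a small Gaussian process sketch. Note the substitution: the paper builds periodic and aperiodic Matérn kernels through a reproducing kernel Hilbert space construction, whereas this sketch simply sums an exp-sine-squared periodic kernel and a squared-exponential kernel. Each sub-model's posterior mean reuses the full-model weights with only its own kernel, and `periodicity_score` is a simple variance-share proxy, not the paper's periodicity ratio. All hyperparameters are hypothetical and fixed rather than fitted.

```python
import math


def k_per(x, y, var=1.0, period=1.0, ls=0.5):
    # Exp-sine-squared periodic kernel (stand-in for the paper's periodic Matérn part).
    return var * math.exp(-2.0 * math.sin(math.pi * abs(x - y) / period) ** 2 / ls ** 2)


def k_aper(x, y, var=1.0, ls=3.0):
    # Squared-exponential kernel standing in for the aperiodic part.
    return var * math.exp(-((x - y) ** 2) / (2.0 * ls ** 2))


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    # Solve (L L^T) x = b by forward then backward substitution.
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def decompose(xs, ys, noise=0.01):
    """Posterior means of the periodic and aperiodic sub-models at the inputs."""
    n = len(xs)
    K = [[k_per(xs[i], xs[j]) + k_aper(xs[i], xs[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = cho_solve(cholesky(K), ys)
    m_per = [sum(k_per(x, xj) * a for xj, a in zip(xs, alpha)) for x in xs]
    m_aper = [sum(k_aper(x, xj) * a for xj, a in zip(xs, alpha)) for x in xs]
    return m_per, m_aper


def variance(v):
    mean = sum(v) / len(v)
    return sum((a - mean) ** 2 for a in v) / len(v)


def periodicity_score(m_per, m_aper):
    # Proxy: share of the fitted signal's variance carried by the periodic sub-model.
    v_p, v_a = variance(m_per), variance(m_aper)
    return v_p / (v_p + v_a)


xs = [0.2 * i for i in range(21)]
sine = [math.sin(2 * math.pi * x) for x in xs]
trend = [0.5 * x for x in xs]
print(periodicity_score(*decompose(xs, sine)), periodicity_score(*decompose(xs, trend)))
```

Because both sub-model means share the same weights `alpha`, they add up to the full posterior mean, which is what makes the split into a periodic part and its complement well defined.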


2021 ◽  
Author(s):  
Yan Xiang ◽  
Yu-Hang Tang ◽  
Hongyi Liu ◽  
Guang Lin ◽  
Huai Sun

<p>This work presents a Gaussian process regression (GPR) model, built on a novel graph representation of chemical molecules, that predicts thermodynamic properties of pure substances in single, double, and triple phases. A transferable molecular graph representation is proposed as the input to a marginalized graph kernel, which is the main component of the covariance function in our GPR models. Radial basis function kernels of temperature and pressure are also incorporated into the covariance function when necessary. We predicted three representative properties of pure substances in single, double, and triple phases, i.e., critical temperature, vapor-liquid equilibrium (VLE) density, and pressure-temperature density. The data were collected from Knovel Data Analysis Beta: NIST ThermoDynamics Pure Compounds. The accuracy of the models is nearly identical to the precision of the experimental measurements. Moreover, the reliability of each prediction can be quantified on a per-sample basis using the posterior uncertainty of the GPR model. We compare our model against Morgan fingerprints and a graph neural network to further demonstrate the advantage of the proposed method. The marginalized graph kernel is computed using the GraphDot package at <a href="https://github.com/yhtang/GraphDot">https://github.com/yhtang/GraphDot</a>. All code used in this work is available at <a href="https://github.com/Xiangyan93/Chem-Graph-Kernel-Machine">https://github.com/Xiangyan93/Chem-Graph-Kernel-Machine</a>.</p>
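Per-sample uncertainty from a GPR model of this kind can be sketched as follows. This is a hedged illustration, not the authors' code. The graph-kernel part is a hypothetical precomputed 2×2 matrix of normalized kernel values (in practice it would come from a marginalized graph kernel), it is multiplied by RBF kernels of temperature and pressure, and the training data, length scales, and noise level are made-up values. The posterior standard deviation plays the role of the per-sample reliability estimate.

```python
import math

# Hypothetical normalized graph-kernel values between molecule 0 and molecule 1.
GRAPH_GRAM = [[1.0, 0.6], [0.6, 1.0]]


def kernel(a, b, ls_t=50.0, ls_p=5.0):
    # Inputs are (molecule index, temperature in K, pressure in MPa).
    (m1, t1, p1), (m2, t2, p2) = a, b
    return (GRAPH_GRAM[m1][m2]
            * math.exp(-((t1 - t2) ** 2) / (2 * ls_t ** 2))
            * math.exp(-((p1 - p2) ** 2) / (2 * ls_p ** 2)))


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    # Solve L z = b for lower-triangular L.
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z


def backward(L, z):
    # Solve L^T x = z.
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class GPR:
    def __init__(self, X, y, noise=1e-6):
        self.X = X
        # Standardize targets so that a unit-amplitude kernel is a sensible prior.
        self.mu = sum(y) / len(y)
        self.sd = math.sqrt(sum((v - self.mu) ** 2 for v in y) / len(y))
        z = [(v - self.mu) / self.sd for v in y]
        n = len(X)
        K = [[kernel(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
        self.L = cholesky(K)
        self.alpha = backward(self.L, forward(self.L, z))

    def predict(self, x):
        # Posterior mean and standard deviation, mapped back to the original units.
        ks = [kernel(x, xi) for xi in self.X]
        mean = sum(k * a for k, a in zip(ks, self.alpha))
        v = forward(self.L, ks)
        var = max(kernel(x, x) - sum(vi * vi for vi in v), 0.0)
        return self.mu + self.sd * mean, self.sd * math.sqrt(var)


# Made-up liquid densities (kg/m^3) for illustration only.
X = [(0, 300.0, 0.1), (0, 350.0, 0.1), (1, 300.0, 0.1), (1, 350.0, 1.0)]
y = [789.0, 740.0, 785.0, 735.0]
model = GPR(X, y)
for query in [(0, 300.0, 0.1), (1, 320.0, 0.5), (0, 600.0, 0.1)]:
    print(query, model.predict(query))
```

The standard deviation stays small near measured state points and grows toward the prior scale for extrapolated temperatures, which is the behaviour a per-sample reliability estimate relies on.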


2019 ◽  
Vol 9 (13) ◽  
pp. 2660 ◽  
Author(s):  
Shaojian Qiu ◽  
Hao Xu ◽  
Jiehan Deng ◽  
Siyu Jiang ◽  
Lu Lu

Cross-project defect prediction (CPDP) is a practical solution that allows software defect prediction (SDP) to be used earlier in the software lifecycle. With CPDP, a defect predictor trained on labeled data from mature projects can be applied to a new project. Most previous CPDP approaches ignore the semantic information in the source code, and existing semantic-feature-based SDP methods do not account for the divergence in data distribution between projects. These limitations may weaken defect prediction performance. To address these problems, we propose a novel approach, the transfer convolutional neural network (TCNN), to mine transferable semantic (deep-learning (DL)-generated) features for CPDP tasks. Specifically, our approach first parses source files into integer vectors that serve as the network inputs. Next, to obtain the TCNN model, a matching layer is added to the convolutional neural network, in which the hidden representations of the source-project and target-project data are embedded into a reproducing kernel Hilbert space for distribution matching. By simultaneously minimizing the classification error and the distribution divergence between projects, the constructed TCNN extracts transferable DL-generated features. Finally, to retain the information contained in handcrafted features, we combine them with the transferable DL-generated features to form joint features for CPDP. Experiments on 10 benchmark projects (90 pairs of CPDP tasks) show that the proposed TCNN method outperforms the reference methods.
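A common way to realize such an RKHS matching layer is a maximum mean discrepancy (MMD) penalty. The abstract does not name the divergence, so treating it as MMD with a Gaussian kernel is an assumption here. The sketch computes the biased (V-statistic) squared MMD between source and target hidden representations and adds it to a classification loss with a hypothetical trade-off weight `lam`; the representations are toy 2-D vectors rather than CNN activations.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    # Average kernel value over all pairs drawn from the two samples.
    return sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared distance between the kernel mean
    embeddings of the two samples in the RKHS."""
    return (mean_kernel(source, source, sigma)
            + mean_kernel(target, target, sigma)
            - 2 * mean_kernel(source, target, sigma))


def joint_loss(classification_loss, source_hidden, target_hidden, lam=0.5):
    # Minimizing both terms trades accuracy on the labeled source project
    # against divergence between source and target representations.
    return classification_loss + lam * mmd2(source_hidden, target_hidden)


source = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
near = [[x + 0.5, y] for x, y in source]
far = [[x + 3.0, y] for x, y in source]
print(mmd2(source, source), mmd2(source, near), mmd2(source, far))
print(joint_loss(0.3, source, far))
```

With a characteristic kernel such as the Gaussian, the squared MMD is zero only when the two distributions coincide, so driving it down pushes the source and target representations toward a shared distribution.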


Author(s):  
Luting Yang ◽  
Jianyi Yang ◽  
Shaolei Ren

The contextual bandit is a classic multi-armed bandit setting in which side information (i.e., context) is available before arm selection. A standard assumption is that exact contexts are perfectly known prior to arm selection and that only a single feedback signal is returned. In this work, we focus on multi-feedback bandit learning with probabilistic contexts, where a bundle of contexts is revealed to the agent along with their corresponding probabilities at the beginning of each round. This models scenarios in which contexts are drawn from the probabilistic output of a neural network and the reward function is jointly determined by multiple feedback signals. We propose a kernelized learning algorithm based on the upper confidence bound (UCB) to choose the optimal arm in a reproducing kernel Hilbert space for each context bundle. Moreover, we theoretically establish an upper bound on the cumulative regret with respect to an oracle that knows the optimal arm given probabilistic contexts, and show that the bound grows sublinearly with time. Our simulation on machine learning model recommendation further validates the sublinearity of our cumulative regret and demonstrates that our algorithm outperforms the approach that selects arms based on the most probable context.
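One way to make the kernelized UCB idea concrete is sketched below, under loudly stated assumptions: each context bundle is represented by its kernel mean embedding (the probability-weighted sum of feature maps of its contexts), the arm enters through an indicator kernel, and the multiple feedback signals are assumed to have already been combined into one scalar reward. The regularizer `lam`, exploration weight `beta`, RBF length scale, and the class name `ProbabilisticContextUCB` are hypothetical. This is a generic kernel-UCB sketch, not the authors' algorithm or its regret-bound constants.

```python
import math


def rbf(x, y, ls=1.0):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * ls ** 2))


def bundle_kernel(b1, arm1, b2, arm2):
    # Inner product of the kernel mean embeddings of two bundles [(context, prob), ...],
    # multiplied by an arm-indicator kernel.
    if arm1 != arm2:
        return 0.0
    return sum(p * q * rbf(x, y) for x, p in b1 for y, q in b2)


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def forward(L, b):
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z


def backward(L, z):
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


class ProbabilisticContextUCB:
    def __init__(self, lam=0.1, beta=1.0):
        self.lam, self.beta = lam, beta
        self.history = []  # observed (bundle, arm) pairs
        self.rewards = []  # scalar rewards (multiple feedback signals assumed pre-combined)

    def ucb(self, bundle, arm):
        prior = bundle_kernel(bundle, arm, bundle, arm)
        if not self.history:
            return self.beta * math.sqrt(prior)
        n = len(self.history)
        K = [[bundle_kernel(*self.history[i], *self.history[j]) + (self.lam if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        L = cholesky(K)
        ks = [bundle_kernel(bundle, arm, b, a) for b, a in self.history]
        # Kernel ridge mean plus beta times the posterior standard deviation.
        mean = sum(k * w for k, w in zip(ks, backward(L, forward(L, self.rewards))))
        v = forward(L, ks)
        var = max(prior - sum(vi * vi for vi in v), 0.0)
        return mean + self.beta * math.sqrt(var)

    def select(self, bundle, arms):
        return max(arms, key=lambda a: self.ucb(bundle, a))

    def update(self, bundle, arm, reward):
        self.history.append((bundle, arm))
        self.rewards.append(reward)


bundle = [([0.0], 0.7), ([1.0], 0.3)]  # contexts with probabilities, e.g. from a classifier
agent = ProbabilisticContextUCB(lam=0.1, beta=0.1)
for _ in range(5):
    agent.update(bundle, 0, 0.1)
    agent.update(bundle, 1, 0.9)
print(agent.select(bundle, [0, 1]))
```

Scoring the whole bundle through its mean embedding, rather than only its most probable context, lets the estimate account for every candidate context in proportion to its probability.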


