Building and analysis of protein-protein interactions related to diabetes mellitus using support vector machine, biomedical text mining and network analysis

Protein-protein interactions (PPIs) play a crucial role in cellular processes. In the present work, a new approach is proposed to construct a PPI predictor by training a support vector machine model with a mutual information filter-wrapper parallel feature selection algorithm and an iterative, hierarchical clustering step that selects a relevant negative training set. Using the selected suboptimal set of features, the constructed support vector machine model is able to classify PPIs with high accuracy on any positive and negative datasets.
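As a rough illustration of the filter half of such a feature selection scheme, the sketch below ranks discrete features by their mutual information with the interaction label. It is not the authors' parallel filter-wrapper algorithm and does not train a support vector machine; the function names (`mutual_information`, `rank_features`) and the toy data are assumptions made up for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of feature values
    py = Counter(ys)            # marginal counts of labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature indices sorted by decreasing MI, per-feature MI scores)."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data, invented for illustration: 3 binary features for 8 protein pairs.
rows = [
    (1, 0, 1), (1, 1, 0), (1, 0, 0), (1, 1, 1),
    (0, 0, 1), (0, 1, 0), (0, 0, 0), (0, 1, 1),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = interacting, 0 = non-interacting

order, scores = rank_features(rows, labels)
print(order, scores)
```

In this toy set the first feature copies the label and so carries one full bit of information, while the other two are independent of it. A wrapper stage would then evaluate candidate subsets of the top-ranked features with the classifier itself.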

Effect of training datasets on support vector machine prediction of protein-protein interactions

PROTEOMICS ◽

10.1002/pmic.200401118 ◽

2005 ◽

Vol 5 (4) ◽

pp. 876-884 ◽

Cited By ~ 51

Author(s):

Siaw Ling Lo ◽

Cong Zhong Cai ◽

Yu Zong Chen ◽

Maxey C. M. Chung

Keyword(s):

Support Vector Machine ◽

Protein Interactions ◽

Support Vector ◽

Protein Protein Interactions

Inferring Protein-Protein Interactions Using a Hybrid Genetic Algorithm/Support Vector Machine Method

Protein and Peptide Letters ◽

10.2174/092986610791760379 ◽

2010 ◽

Vol 17 (9) ◽

pp. 1079-1084 ◽

Cited By ~ 5

Author(s):

Bing Wang ◽

Peng Chen ◽

Jun Zhang ◽

Guangxin Zhao ◽

Xiang Zhang

Keyword(s):

Genetic Algorithm ◽

Support Vector Machine ◽

Protein Interactions ◽

Hybrid Genetic Algorithm ◽

Support Vector ◽

Protein Protein Interactions ◽

Machine Method ◽

Support Vector Machine Method

Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences

Nucleic Acids Research ◽

10.1093/nar/gkn159 ◽

2008 ◽

Vol 36 (9) ◽

pp. 3025-3030 ◽

Cited By ~ 327

Author(s):

Yanzhi Guo ◽

Lezheng Yu ◽

Zhining Wen ◽

Menglong Li

Keyword(s):

Support Vector Machine ◽

Protein Interactions ◽

Protein Sequences ◽

Support Vector ◽

Protein Protein Interactions ◽

Auto Covariance

2P2I HUNTER: a tool for filtering orthosteric protein–protein interaction modulators via a dedicated support vector machine

Journal of The Royal Society Interface ◽

10.1098/rsif.2013.0860 ◽

2014 ◽

Vol 11 (90) ◽

pp. 20130860 ◽

Cited By ~ 25

Author(s):

Véronique Hamon ◽

Raphael Bourgeas ◽

Pierre Ducrot ◽

Isabelle Theret ◽

Laura Xuereb ◽

...

Keyword(s):

Support Vector Machine ◽

Protein Interactions ◽

High Throughput Screening ◽

Chemical Space ◽

Support Vector ◽

Protein Protein Interactions ◽

Target Class ◽

Chemical Library ◽

Protein Protein Interaction ◽

Svm Model

Over the last 10 years, protein–protein interactions (PPIs) have shown increasing potential as new therapeutic targets. As a consequence, PPIs are today the most screened target class in high-throughput screening (HTS). The development of broad chemical libraries dedicated to these particular targets is essential; however, the chemical space associated with this ‘high-hanging fruit’ is still under debate. Here, we analyse the properties of 40 non-redundant small molecules present in the 2P2I database (http://2p2idb.cnrs-mrs.fr/) to define a general profile of orthosteric inhibitors and propose an original protocol to filter general screening libraries using a support vector machine (SVM) with 11 standard Dragon molecular descriptors. The filtering protocol has been validated using external datasets from PubChem BioAssay and results from in-house screening campaigns. This external blind validation demonstrated the ability of the SVM model to reduce the size of the filtered chemical library by eliminating up to 96% of the compounds as well as enhancing the proportion of active compounds by up to a factor of 8. We believe that the resulting chemical space identified in this paper will provide the scientific community with a concrete support to search for PPI inhibitors during HTS campaigns.
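The two headline figures in this abstract (fraction of the library eliminated and enrichment of actives) are simple ratios. The sketch below shows how they could be computed for one filtering run. The function name `filtering_summary` and all compound counts are hypothetical and chosen only so that the arithmetic is easy to follow; they are not data from the paper.

```python
def filtering_summary(n_total, n_active, n_kept, n_active_kept):
    """Return (fraction of library removed, enrichment factor of actives).

    n_total        compounds in the library before filtering
    n_active       active compounds in the library before filtering
    n_kept         compounds that pass the filter
    n_active_kept  active compounds that pass the filter
    """
    if n_total <= 0 or n_kept <= 0 or n_active <= 0:
        raise ValueError("n_total, n_kept and n_active must be positive")
    fraction_removed = 1 - n_kept / n_total
    hit_rate_before = n_active / n_total
    hit_rate_after = n_active_kept / n_kept
    return fraction_removed, hit_rate_after / hit_rate_before


# Hypothetical counts (NOT from the paper): a 10,000-compound library with
# 100 actives, of which 400 compounds and 32 actives survive the filter.
removed, enrichment = filtering_summary(
    n_total=10_000, n_active=100, n_kept=400, n_active_kept=32
)
print(f"removed {removed:.0%} of the library, enrichment factor {enrichment:.1f}")
```

With these made-up counts the filter discards 96% of the library and raises the hit rate from 1% to 8%, an enrichment factor of 8. Note that such a filter can still lose actives in absolute terms (here 68 of 100), which is worth reporting alongside the enrichment.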

Combining protein-protein interactions information with support vector machine to identify chronic obstructive pulmonary disease related genes

Molecular Biology ◽

10.1134/s0026893314020101 ◽

2014 ◽

Vol 48 (2) ◽

pp. 287-296 ◽

Cited By ~ 1

Author(s):

Lin Hua ◽

Ping Zhou

Keyword(s):

Chronic Obstructive Pulmonary Disease ◽

Support Vector Machine ◽

Pulmonary Disease ◽

Protein Interactions ◽

Support Vector ◽

Chronic Obstructive ◽

Protein Protein Interactions ◽

Obstructive Pulmonary Disease ◽

Disease Related Genes

Text mining for modeling of protein complexes enhanced by machine learning

Bioinformatics ◽

10.1093/bioinformatics/btaa823 ◽

2020 ◽

Author(s):

Varsha D Badal ◽

Petras J Kundrotas ◽

Ilya A Vakser

Keyword(s):

Machine Learning ◽

Text Mining ◽

Protein Interactions ◽

Full Text ◽

Protein Complexes ◽

Protein Docking ◽

Supplementary Information ◽

Support Vector ◽

Learning Approaches ◽

Protein Protein Interactions

Motivation: Procedures for structural modeling of protein-protein complexes (protein docking) produce a number of models which need to be further analyzed and scored. Scoring can be based on independently determined constraints on the structure of the complex, such as knowledge of amino acids essential for the protein interaction. Previously, we showed that text mining of residues in freely available PubMed abstracts of papers on studies of protein-protein interactions may generate such constraints. However, the absence of post-processing of the spotted residues reduced the usability of the constraints, as a significant number of the residues were not relevant for the binding of the specific proteins.

Results: We explored filtering of the irrelevant residues by two machine learning approaches, Deep Recursive Neural Network (DRNN) and Support Vector Machine (SVM) models, with different training/testing schemes. The results showed that the DRNN model is superior to the SVM model when training is performed on the PMC-OA full-text articles and applied to classification (interface or non-interface) of the residues spotted in the PubMed abstracts. When both training and testing are performed on full-text articles or on abstracts, the performance of these models is similar. Thus, in such cases, there is no need to use the DRNN approach, which is computationally expensive, especially at the training stage. The reason is that SVM success is often determined by the similarity in data/text patterns in the training and the testing sets, whereas the sentence structures in the abstracts are, in general, different from those in the full-text articles.

Availability: The code and the datasets generated in this study are available at https://gitlab.ku.edu/vakser-lab-public/text-mining/-/tree/2020-09-04.

Supplementary information: Supplementary data are available at Bioinformatics online.
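To make the residue "spotting" step concrete, here is a minimal sketch that pulls residue mentions such as "Arg123" out of a sentence with a regular expression. It is not the authors' pipeline: the pattern, the names `RESIDUE_RE` and `spot_residues`, and the example sentence are all assumptions for illustration, and the later classification of each mention as interface or non-interface (the SVM/DRNN step) is not shown.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed mention format: three-letter code, optional hyphen or space,
# then the sequence position, e.g. "Arg123", "Tyr-45", "Gly 7".
RESIDUE_RE = re.compile(r"\b(" + "|".join(THREE_TO_ONE) + r")[- ]?(\d+)\b")


def spot_residues(text):
    """Return (one-letter code, position) for each residue mention in text."""
    return [
        (THREE_TO_ONE[m.group(1)], int(m.group(2)))
        for m in RESIDUE_RE.finditer(text)
    ]


# Invented example sentence, not taken from any real abstract.
sentence = (
    "Mutation of Arg123 and Tyr-45 abolished binding, "
    "whereas Gly 7 was dispensable."
)
print(spot_residues(sentence))  # [('R', 123), ('Y', 45), ('G', 7)]
```

A pattern like this spots every mention regardless of context, which is exactly why the abstract describes a subsequent filtering step: "Gly 7" above is mentioned but is not relevant to binding.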
