DeepEnzyPred: A Bi-Layered Deep Learning Framework for prediction of Bacteriophage Enzymes and their Sub-Hydrolases Enzymes via Novel Multi Level- Multi Thresholds Feature Selection technique
Abstract BackgroundBacteriophage or phage is a type of virus that replicates itself inside bacteria. It consist of genetic material surrounded by a protein structure. Bacteriophage plays a vital role in the domain of phage therapy and genetic engineering. Phage and hydrolases enzyme proteins have a significant impact on the cure of pathogenic bacterial infections and disease treatment. Accurate identification of bacteriophage proteins is important in the host subcellular localization for further understanding of the interaction between phage, hydrolases, and in designing antibacterial drugs. Looking at the significance of Bacteriophage proteins, besides wet laboratory-based methods several computational models have been developed so far. However, the performance was not considerable due to inefficient feature schemes, redundancy, noise, and lack of an intelligent learning engine. Therefore we have developed an anovative bi-layered model name DeepEnzyPred. A Hybrid feature vector was obtained via a novel Multi-Level Multi-Threshold subset feature selection (MLMT-SFS) algorithm. A two-dimensional convolutional neural network was adopted as a baseline classifier.ResultsA conductive hybrid feature was obtained via a serial combination of CTD and KSAACGP features. The optimum feature was selected via a Novel Multi-Level Multi-Threshold Subset Feature selection algorithm. Over 5-fold jackknife cross-validation, an accuracy of 91.6 %, Sensitivity of 63.39%, Specificity 95.72%, MCC of 0.6049, and ROC value of 0.8772 over Layer-1 were recorded respectively. Similarly, the underline model obtained an Accuracy of 96.05%, Sensitivity of 96.22%, Specificity of 95.91%, MCC of 0.9219, and ROC value of 0.9899 over layer-2 respectivily.ConclusionThis paper presents a robust and effective classification model was developed for bacteriophage and their types. Primitive features were extracted via CTD and KSAACGP. A novel method (MLMT-SFS ) was devised for yielding optimum hybrid feature space out of primitive features. The result drew over hybrid feature space and 2D-CNN shown an excellent classification. Based on the recorded results, we believe that the developed predictor will be a valuable resource for large scale discrimination of unknown Phage and hydrolase enzymes in particular and new antibacterial drug design in pharmaceutical companies in general.