Background:
Cell lytic enzymes are highly evolved proteins that destroy the cell structure and thereby
kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause serious drug resistance in
pathogenic bacteria. As resistance to antibiotics becomes more serious, cell lytic enzymes are a
good choice for treating bacterial infections, and the study of cell wall lytic enzymes aims at
finding an efficient treatment. Cell lytic enzymes include endolysins and autolysins, which differ
in the purpose for which they break the cell wall. Identifying the type of a cell lytic enzyme is
therefore meaningful for the study of cell wall lytic enzymes.
Objective:
In this article, we aim to predict the type of cell lytic enzyme. Because cell lytic enzymes
help kill bacteria, studying their types is meaningful. However, determining the type of a cell
lytic enzyme by experimental methods is time-consuming. We therefore propose an efficient
computational method for predicting the type of cell lytic enzyme.
Method:
We propose a computational method for distinguishing endolysins from autolysins. First,
a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by
its tripeptide composition. Features with a larger confidence degree are selected. Finally, a
support vector machine classifier is trained on the labeled feature vectors, and the learned
classifier is used to predict the type of cell lytic enzyme.
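The tripeptide representation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `tripeptide_composition`, the restriction to the 20 standard residues, and the normalization by the number of valid overlapping windows are all assumptions made for illustration. The confidence-based feature selection and the support vector machine training are not shown, since the abstract does not specify them.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 20^3 = 8000 possible tripeptides, in a fixed order.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies.

    Hypothetical helper: overlapping windows of length 3 are counted,
    windows containing a non-standard residue are skipped, and counts
    are divided by the number of windows that were counted.
    """
    seq = sequence.upper()
    counts = [0] * len(TRIPEPTIDES)
    total = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or without valid windows: all-zero vector.
        return [0.0] * len(TRIPEPTIDES)
    return [c / total for c in counts]
```

Each protein sequence then maps to one fixed-length vector, from which a subset of features could be selected before training a classifier.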
Results:
With the proposed method, the experimental results show that the overall accuracy
reaches 97.06% when 44 features are selected. Compared with Ding's method, our method
improves the overall accuracy by nearly 4.5% in relative terms ((97.06 - 92.9)/92.9). The
performance of the proposed method is stable when the number of selected features is between 40
and 70. The overall accuracy of the tripeptide optimal feature set is 94.12%, whereas that of
Chou's amphiphilic PseAAC method is 76.2%. The experimental results thus also demonstrate that the
overall accuracy improves by nearly 18 percentage points when the tripeptide optimal feature set is used.
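The two improvement figures quoted here are computed differently: the comparison with Ding's method is a relative gain, while the comparison with Chou's amphiphilic PseAAC method is a difference in accuracy. A small Python check using only the accuracies stated above makes this explicit; the helper names are hypothetical.

```python
def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100


def absolute_gain(new, old):
    """Difference between two accuracies, in percentage points."""
    return new - old


# Accuracies (in percent) as reported in the text.
vs_ding = relative_gain(97.06, 92.9)      # roughly 4.5 (relative, percent)
vs_pseaac = absolute_gain(94.12, 76.2)    # roughly 18 (percentage points)
```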
Conclusion:
This paper proposes an efficient method for identifying endolysins and autolysins, in
which a support vector machine is used to predict the type of cell lytic enzyme. The experimental
results show that the overall accuracy of the proposed method is 94.12%, which is better than that
of some existing methods. In conclusion, the 44 selected features improve the overall accuracy of
identifying the type of cell lytic enzyme. The support vector machine performs better than other
classifiers when using the selected feature set on the benchmark data set.