protein sequence classification
Recently Published Documents


Total documents: 58 (five years: 7)
H-index: 12 (five years: 1)

Feature extraction from protein sequences is an important task in bioinformatics. The main focus of this work is protein sequence classification, which can be used to improve drug discovery and to identify diseases in the early stages of diagnosis. In this paper, we propose a feature extraction method that converts the protein sequence of hemoglobin into feature vectors. The feature vectors are then given as input to an ensemble classifier, which combines several classifiers to achieve better performance than any constituent learning algorithm alone.
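The abstract does not say which features or which constituent classifiers are used, so the following is only a minimal sketch of the general pipeline it describes (sequence to feature vector to ensemble vote). It assumes amino acid composition as the feature vector and a majority vote over three simple classifiers; the function names, the toy sequences, and the labels `class_a` and `class_b` are all hypothetical and are not taken from the paper.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(sequence):
    """Assumed feature: fraction of each standard amino acid in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def nearest_centroid(train_X, train_y):
    """Build a classifier that predicts the label of the closest class mean."""
    groups = {}
    for x, y in zip(train_X, train_y):
        groups.setdefault(y, []).append(x)
    centroids = {
        y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()
    }

    def predict(x):
        return min(
            centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
        )

    return predict


def nearest_neighbor(train_X, train_y, p):
    """Build a 1-nearest-neighbor classifier using an order-p Minkowski distance."""

    def predict(x):
        dists = [sum(abs(a - b) ** p for a, b in zip(x, t)) for t in train_X]
        return train_y[dists.index(min(dists))]

    return predict


def majority_vote(classifiers, x):
    """Ensemble step: return the label predicted by most constituent classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Made-up toy sequences (not real hemoglobin data), for illustration only.
train_sequences = ["AALLAALKAL", "LLAAALLAKA", "GGSSGSGGKS", "SGGSSGGSKG"]
train_labels = ["class_a", "class_a", "class_b", "class_b"]
train_X = [composition_features(s) for s in train_sequences]

ensemble = [
    nearest_centroid(train_X, train_labels),
    nearest_neighbor(train_X, train_labels, p=1),
    nearest_neighbor(train_X, train_labels, p=2),
]
prediction = majority_vote(ensemble, composition_features("ALALLAAGLA"))
print(prediction)
```

An odd number of voters avoids ties in the two-class case; a real ensemble would more likely combine different model families and be evaluated on held-out data.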


2019
Author(s): Brandon Carter, Maxwell L. Bileschi, Jamie Smith, Theo Sanderson, Drew Bryant, ...

In many application domains, neural networks are highly accurate and have been deployed at large scale. However, users often do not have good tools for understanding how these models arrive at their predictions. This has hindered adoption in fields such as the life and medical sciences, where researchers require that models base their decisions on underlying biological phenomena rather than peculiarities of the dataset. In response, we propose a set of methods for critiquing deep learning models and demonstrate their application for protein family classification, a task for which high-accuracy models have considerable potential impact. Our methods extend the sufficient input subsets technique, which we use to identify subsets of features (SIS) in each protein sequence that are alone sufficient for classification. Our suite of tools analyzes these subsets to shed light on the decision-making criteria employed by models trained on this task. These tools expose that while deep models may perform classification for biologically relevant reasons, their behavior varies considerably across choices of network architecture and parameter initialization. While the techniques that we develop are specific to the protein sequence classification task, the approach taken generalizes to a broad set of scientific contexts in which model interpretability is essential.
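To make the idea of a sufficient input subset concrete, here is a rough sketch of the backward-selection procedure commonly associated with the technique: positions are masked one at a time, always choosing the one whose masking hurts the model score least, and the subset is then built by restoring positions in reverse order until the score reaches a threshold. This is an illustration under stated assumptions, not the authors' implementation: `toy_model`, the mask character `-`, the threshold, and the example sequence are all hypothetical, and only a single subset is extracted.

```python
def back_select(f, x, mask):
    """Mask positions one by one, each time picking the position whose
    masking keeps the score f highest. Returns positions in removal order."""
    current = list(x)
    remaining = list(range(len(x)))
    order = []
    while remaining:

        def score_without(i):
            trial = list(current)
            trial[i] = mask
            return f(trial)

        best = max(remaining, key=score_without)
        current[best] = mask
        remaining.remove(best)
        order.append(best)
    return order


def find_sis(f, x, threshold, mask="-"):
    """Return one subset of positions whose residues alone keep f >= threshold,
    or None if the full input does not reach the threshold."""
    if f(list(x)) < threshold:
        return None
    order = back_select(f, x, mask)
    candidate = [mask] * len(x)
    subset = []
    # Positions masked last mattered most, so restore them first.
    for i in reversed(order):
        candidate[i] = x[i]
        subset.append(i)
        if f(candidate) >= threshold:
            break
    return sorted(subset)


def toy_model(residues):
    """Hypothetical stand-in for a trained classifier's score for one family."""
    score = 0
    if residues[0] == "M":
        score += 1
    if residues[2] == "H":
        score += 5
    if residues[5] == "C":
        score += 4
    return score


sis = find_sis(toy_model, "MAHKLCGT", threshold=8)
print(sis)
```

With a real network, `f` would be the predicted probability of the class of interest, and masking would use whatever uninformative input the model supports.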

