proteins classification
Recently Published Documents



2018 ◽  
Author(s):  
Mathilde Carpentier ◽  
Jacques Chomilier

ABSTRACT: Facing the huge increase of information about proteins, classification has become a compulsory task, essential for assigning a function to a given sequence by comparison with existing data. Multiple sequence alignment programs have proven very useful, and they have already been evaluated. In this paper, we evaluate the added value provided by taking structures into account. We compared the multiple alignments produced by 24 programs, based on sequence, structure, or both, to reference alignments deposited in five databases. The reference databases themselves fall into two groups: those built more automatically and those built more manually. A score was attributed to each program. As a global rule of thumb, five groups of methods emerge, with two of the structure-based programs in the lead. This advantage increases at low levels of sequence identity among the aligned proteins, and for residues that are buried or located in regular secondary structures. Concerning gap management, sequence-based programs place fewer gaps than structure-based programs. Concerning the databases, the alignments from the manually built databases are the most challenging for the programs.
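The abstract does not say which score was used to compare a program's alignment with a reference alignment. As an illustration only, the sketch below computes a sum-of-pairs style score, a measure commonly used for this kind of benchmark: the fraction of residue pairs aligned in the reference that are also aligned in the tested alignment. The function names, the `-` gap character, and the toy sequences are assumptions made for this example, not details taken from the paper.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of gapped strings of equal length, one per sequence.
    Each residue is identified by (sequence index, residue index), so the
    result does not depend on where gaps are placed in other columns.
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    counters = [0] * len(alignment)  # next residue index for each sequence
    pairs = set()
    for col in range(n_cols):
        present = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != gap:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in this column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)


# Toy example (hypothetical sequences, same sequence order in both alignments).
reference = ["AC-GT", "ACTGT"]
candidate = ["ACG-T", "ACTGT"]
score = sum_of_pairs_score(candidate, reference)
```

A score of 1.0 means that every residue pair of the reference is reproduced. Restricting the reference pairs to a subset of residues (for example, buried ones) would give the kind of per-category comparison that the abstract mentions.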


2017 ◽  
Vol 10 (2) ◽  
pp. 229-252 ◽  
Author(s):  
Md. Sarwar Kamal ◽  
Md. Golam Sarowar ◽  
Nilanjan Dey ◽  
Amira S. Ashour ◽  
Shamim H. Ripon ◽  
...  

Planta ◽  
2016 ◽  
Vol 244 (5) ◽  
pp. 971-997 ◽  
Author(s):  
Tiina A. Salminen ◽  
Kristina Blomqvist ◽  
Johan Edqvist

PLoS ONE ◽  
2012 ◽  
Vol 7 (5) ◽  
pp. e36634 ◽  
Author(s):  
Davide De Lucrezia ◽  
Debora Slanzi ◽  
Irene Poli ◽  
Fabio Polticelli ◽  
Giovanni Minervini

Author(s):  
Ricco Rakotomalala ◽  
Faouzi Mhamdi

In this chapter, we address protein classification starting from primary structures. The goal is to automatically assign protein sequences to their families. The main originality of the approach is that we apply the text categorization framework directly to protein classification, with only very minor modifications. The main steps of the task are clearly identified: we extract features from the unstructured dataset, using fixed-length n-gram descriptors; we select and combine the most relevant ones for the learning phase; and then we select the most promising learning algorithm in order to produce an accurate predictive model. We obtain two main results. First, the approach is credible, giving accurate results with descriptors of length 2 only (2-grams). Second, in our context, where many irrelevant descriptors are generated automatically, we must combine aggressive feature selection algorithms with low-variance classifiers such as SVM (Support Vector Machine).
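The feature-extraction step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it counts overlapping fixed-length n-grams (2-grams by default) in each sequence and arranges the counts in a matrix with one row per sequence. The function names and the toy sequences are hypothetical, and the feature selection and SVM training steps are left out.

```python
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams (substrings of length n) in one sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def build_feature_matrix(sequences, n=2):
    """Return (vocabulary, matrix) for a list of sequences.

    `vocabulary` is the sorted list of every n-gram seen in the dataset.
    `matrix[i][j]` is the count of vocabulary[j] in sequences[i].
    """
    counts = [ngram_counts(seq, n) for seq in sequences]
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for n-grams absent from a sequence.
    matrix = [[c[gram] for gram in vocabulary] for c in counts]
    return vocabulary, matrix


# Toy example with two short, made-up amino-acid strings.
vocabulary, matrix = build_feature_matrix(["MKVL", "MKKL"], n=2)
```

With the 20 standard amino acids, 2-grams give at most 400 descriptors, and the number grows quickly with n. That growth is why the chapter pairs this representation with aggressive feature selection before a classifier is trained.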

