The influence of gapped positions in multiple sequence alignments on secondary structure prediction methods

2004 ◽  
Vol 28 (5-6) ◽  
pp. 351-366 ◽  
Author(s):  
V.A. Simossis ◽  
J. Heringa
Author(s):  
Fabian Sievers ◽  
Desmond G Higgins

Abstract Motivation: Secondary structure prediction accuracy (SSPA) in the QuanTest benchmark can be used to measure the accuracy of a multiple sequence alignment. SSPA correlates well with the sum-of-pairs (SP) score if the results are averaged over many alignments, but not on an alignment-by-alignment basis. This is due to a sub-optimal selection of reference and non-reference sequences in QuanTest. Results: We develop an improved strategy for selecting reference and non-reference sequences for a new benchmark, QuanTest2. In QuanTest2, SSPA and SP correlate better on an alignment-by-alignment basis than in QuanTest. Guide-trees for QuanTest2 are more balanced with respect to reference sequences than in QuanTest. QuanTest2 scores correlate well with other well-established benchmarks. Availability and implementation: QuanTest2 is available at http://bioinf.ucd.ie/quantest2.tar and comprises reference and non-reference sequence sets and a scoring script. Supplementary information: Supplementary data are available at Bioinformatics online.
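The two quantities compared in this abstract can be illustrated with a minimal sketch. It assumes the common definitions: the sum-of-pairs score as the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment, and secondary structure prediction accuracy as the fraction of positions with a matching three-state label. The function names are hypothetical, and this is not the QuanTest2 scoring script.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs placed in the same column.

    `alignment` is a list of equal-length gapped strings ('-' marks a gap).
    Each pair is ((seq_a, residue_index_a), (seq_b, residue_index_b)).
    """
    pairs = set()
    counters = [0] * len(alignment)  # residues consumed so far per sequence
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference)
    return len(ref & aligned_pairs(test)) / len(ref) if ref else 0.0


def q3(predicted, observed):
    """Fraction of positions whose predicted state (H, E or C) matches."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


reference = ["ACG-T", "ACGAT"]
test = ["AC-GT", "ACGAT"]  # the gap is misplaced by one column
print(sum_of_pairs(test, reference))  # 0.75
print(q3("HHHCC", "HHHHC"))           # 0.8
```

Because both scores are simple fractions per alignment, they can be compared either alignment by alignment or after averaging, which is the distinction the abstract draws.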


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Marc-André Bossanyi ◽  
Valentin Carpentier ◽  
Jean-Pierre S Glouzon ◽  
Aïda Ouangraoua ◽  
Yoann Anselmetti

Abstract Predicting RNA structure is crucial for understanding RNA’s mechanism of action. Comparative approaches for the prediction of RNA structures can be classified into four main strategies. The first three (align-and-fold, align-then-fold and fold-then-align) exploit multiple sequence alignments to improve the accuracy of conserved RNA-structure prediction. Align-and-fold methods generally perform better, but are also typically slower than the other alignment-based methods. The fourth strategy, alignment-free, consists of predicting the conserved RNA structure without relying on sequence alignment. This strategy has the advantage of being the fastest, while predicting accurate structures through the use of latent representations of the candidate structures for each sequence. This paper presents aliFreeFoldMulti, an extension of the aliFreeFold algorithm. This algorithm predicts a representative secondary structure of multiple RNA homologs by using a vector representation of their suboptimal structures. aliFreeFoldMulti improves on aliFreeFold by additionally computing the conserved structure for each sequence. aliFreeFoldMulti is assessed by comparing its prediction performance and time efficiency with a set of leading RNA-structure prediction methods. aliFreeFoldMulti has the lowest computing times and the highest maximum accuracy scores. Its average structure prediction accuracy is comparable to that of the other methods, except TurboFoldII, which is the best in terms of average accuracy but has the highest computing times. We present aliFreeFoldMulti as an illustration of the potential of alignment-free approaches to provide fast and accurate RNA-structure prediction methods.
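The abstract does not say which accuracy measure was used. As an illustration only, the sketch below assumes a widely used convention: structures are written in dot-bracket notation, and a prediction is scored by the sensitivity and positive predictive value (PPV) of its base pairs against a reference structure. The function names are hypothetical and this is not code from aliFreeFoldMulti.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def pair_accuracy(predicted, reference):
    """Return (sensitivity, PPV) of predicted base pairs against a reference."""
    pred, ref = base_pairs(predicted), base_pairs(reference)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    ppv = true_positives / len(pred) if pred else 0.0
    return sensitivity, ppv


# The prediction recovers one of the two reference pairs and adds no wrong ones.
print(pair_accuracy("(....)", "((..))"))  # (0.5, 1.0)
```

Reporting both numbers separates missed pairs (low sensitivity) from spurious pairs (low PPV), which a single accuracy figure would hide.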


2014 ◽  
Vol 70 (12) ◽  
pp. 3110-3123 ◽  
Author(s):  
Alexander Ulrich ◽  
Markus C. Wahl

Cwc27 is a spliceosomal cyclophilin-type peptidyl-prolyl cis–trans isomerase (PPIase). Here, the crystal structure of a relatively protease-resistant N-terminal fragment of human Cwc27 containing the PPIase domain was determined at 2.0 Å resolution. The fragment exhibits a C-terminal appendix and resides in a reduced state compared with the previous oxidized structure of a similar fragment. By combining multiple sequence alignments spanning the eukaryotic tree of life and secondary-structure prediction, Cwc27 proteins across the entire eukaryotic kingdom were identified. This analysis revealed the specific loss of a crucial active-site residue in higher eukaryotic Cwc27 proteins, suggesting that the protein evolved from a prolyl isomerase to a pure proline binder. Noting a fungus-specific insertion in the PPIase domain, the 1.3 Å resolution crystal structure of the PPIase domain of Cwc27 from Chaetomium thermophilum was also determined. Although structurally highly similar in the core domain, the C. thermophilum protein displayed a higher thermal stability than its human counterpart, presumably owing to the combined effect of several amino-acid exchanges that reduce the number of long side chains with strained conformations and create new intramolecular interactions, in particular increased hydrogen-bond networks.


2019 ◽  
Author(s):  
Mark Chonofsky ◽  
Saulo H. P. de Oliveira ◽  
Konrad Krawczyk ◽  
Charlotte M. Deane

Abstract Over the last few years, the field of protein structure prediction has been transformed by increasingly accurate contact prediction software. These methods are based on the detection of coevolutionary relationships between residues from multiple sequence alignments. However, despite speculation, there is little evidence of a link between contact prediction and the physico-chemical interactions which drive amino-acid coevolution. Furthermore, existing protocols predict only a fraction of all protein contacts and it is not clear why some contacts are favoured over others. Using a dataset of 863 protein domains, we assessed the physico-chemical interactions of contacts predicted by CCMpred, MetaPSICOV, and DNCON2, as examples of direct coupling analysis, meta-prediction, and deep learning, respectively. To further investigate what sets these predicted contacts apart, we considered correctly predicted contacts and compared their properties against the protein contacts that were not predicted. We found that predicted contacts tend to form more bonds than non-predicted contacts, which suggests these contacts may be more important. Comparing the contacts predicted by each method, we found that MetaPSICOV and DNCON2 favour accuracy whereas CCMpred detects contacts with more bonds. This suggests that the push for higher accuracy may lead to a loss of physico-chemically important contacts. These results underscore the connection between protein physico-chemistry and the coevolutionary couplings that can be derived from multiple sequence alignments. This relationship is likely to be relevant to protein structure prediction and functional analysis of protein structure and may be key to understanding their utility for different problems in structural biology.

Author summary Accurate contact prediction has allowed scientists to predict protein structures with unprecedented levels of accuracy. The success of contact prediction methods, which are based on inferring correlations between amino acids in protein multiple sequence alignments, has prompted a great deal of work to improve the quality of contact prediction, leading to the development of several different methods for detecting amino acids in proximity. In this paper, we investigate the properties of these contact prediction methods. We find that contacts which are predicted differ from the other contacts in the protein, in particular they have more physico-chemical bonds, and the predicted contacts are more strongly conserved than other contacts across protein families. We also compared the properties of different contact prediction methods and found that the characteristics of the predicted sets depend on the prediction method used. Our results point to a link between physico-chemical bonding interactions and the evolutionary history of proteins, a connection which is reflected in their amino acid sequences.
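The idea of inferring correlations between alignment columns can be illustrated with a much simpler statistic than the methods named above. The sketch below computes plain mutual information between two columns of a toy alignment. This is an assumption chosen for illustration; it is not the algorithm used by CCMpred, MetaPSICOV or DNCON2, which go well beyond pairwise mutual information. The function name and the toy alignment are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings; gaps are treated as an
    ordinary symbol and no sequence weighting or correction is applied.
    """
    n = len(alignment)
    col_i = [row[i] for row in alignment]
    col_j = [row[j] for row in alignment]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    return sum(
        (count / n) * log2(count * n / (freq_i[a] * freq_j[b]))
        for (a, b), count in freq_ij.items()
    )


toy = ["AKL", "AKL", "GEL", "GEL"]
print(mutual_information(toy, 0, 1))  # 1.0: columns 0 and 1 change together
print(mutual_information(toy, 0, 2))  # 0.0: column 2 is fully conserved
```

A fully conserved column carries no covariation signal, which is one reason column-pair statistics alone cannot recover every contact.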

