ragp: Pipeline for mining of plant hydroxyproline-rich glycoproteins with implementation in R

Glycobiology ◽ 2019 ◽ Vol 30 (1) ◽ pp. 19-35
Author(s): Milan B Dragićević ◽ Danijela M Paunović ◽ Milica D Bogdanović ◽ Sladjana I. Todorović ◽ Ana D Simonović

Hydroxyproline-rich glycoproteins (HRGPs) are one of the most complex families of macromolecules found in plants, owing both to the diversity of glycans decorating the protein backbone and to the heterogeneity of the protein backbones themselves. While this diversity underlies the wide array of physiological functions associated with HRGPs, it hinders attempts at homology-based identification. Current approaches, based on identifying sequences with characteristic motifs and biased amino acid composition, are limited to prototypical sequences. ragp is an R package for mining and analysis of HRGPs, with emphasis on arabinogalactan proteins. The ragp filtering pipeline exploits one of the key features of HRGPs: the presence of hydroxyprolines, which represent glycosylation sites. Main package features include prediction of proline hydroxylation sites; amino acid motif and bias analyses; efficient communication with web servers for prediction of N-terminal signal peptides, glycosylphosphatidylinositol modification sites and disordered regions; and the ability to annotate sequences through hmmscan, with subsequent GO enrichment based on predicted Pfam domains. As such, ragp extends R's rich ecosystem for high-throughput sequence data analyses. The ragp R package is available under the MIT Open Source license and is freely available to download from GitHub at: https://github.com/missuse/ragp.
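The abstract contrasts ragp with earlier filters based on characteristic motifs and biased amino acid composition. As a minimal illustration of that older style of filter (not ragp's own pipeline, which is written in R and centres on hydroxyproline prediction), the Python sketch below computes the fraction of proline, alanine, serine and threonine (PAST) residues and counts [AST]P dipeptides as putative glycomodules. The 0.5 PAST threshold, the glycomodule pattern and all function names are assumptions chosen for illustration.

```python
import re

# Assumed cut-off for illustration; not a value taken from ragp.
PAST_THRESHOLD = 0.5
# Putative AGP glycomodule: A, S or T followed by P (lookahead allows overlaps).
GLYCOMODULE = re.compile(r"(?=[AST]P)")


def past_fraction(seq: str) -> float:
    """Fraction of residues in `seq` that are P, A, S or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(seq.count(aa) for aa in "PAST") / len(seq)


def count_glycomodules(seq: str) -> int:
    """Number of (possibly overlapping) [AST]P dipeptides in `seq`."""
    return len(GLYCOMODULE.findall(seq.upper()))


def is_candidate(seq: str, threshold: float = PAST_THRESHOLD) -> bool:
    """Flag sequences with a PAST bias and at least one glycomodule."""
    return past_fraction(seq) >= threshold and count_glycomodules(seq) > 0


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    for name, seq in {"agp_like": "SPAPTPSPAPKL", "globular": "MKLLVVEDGR"}.items():
        print(name, round(past_fraction(seq), 2), count_glycomodules(seq), is_candidate(seq))
```

A composition filter like this only recovers prototypical, strongly biased sequences, which is exactly the limitation the abstract attributes to such approaches.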

2005 ◽ Vol 33 (1) ◽ pp. 105-107
Author(s): R.L. Jack ◽ A. Dubini ◽ T. Palmer ◽ F. Sargent

A subset of bacterial periplasmic enzymes is transported from the cytoplasm by the twin-arginine transport apparatus. Such proteins contain distinctive N-terminal signal peptides carrying a conserved SRRXFLK 'twin-arginine' amino acid motif and often bind complex cofactors before the transport event. It is important that assembly of complex cofactor-containing, and often multi-subunit, enzymes is complete before export. Studies of the unrelated [NiFe] hydrogenase, DMSO reductase and trimethylamine N-oxide reductase systems from Escherichia coli have enabled us to define a chaperone-mediated 'proofreading' mechanism involved in co-ordinating assembly and export of twin-arginine transport-dependent enzymes.
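The twin-arginine consensus quoted above can be searched for with a regular expression. The sketch below is a simplification under stated assumptions: it matches only the exact S-R-R-X-F-L-K consensus (X being any residue), whereas real Tat signal peptides often deviate from it, and the 60-residue N-terminal window and the function name are hypothetical choices.

```python
import re
from typing import List

# Exact consensus from the text; "." stands for the variable residue X.
TAT_CONSENSUS = re.compile(r"SRR.FLK")


def find_tat_motif(seq: str, n_term_len: int = 60) -> List[int]:
    """Return 1-based start positions of the consensus lying wholly
    within the first `n_term_len` residues (hypothetical window size)."""
    window = seq.upper()[:n_term_len]
    return [m.start() + 1 for m in TAT_CONSENSUS.finditer(window)]


if __name__ == "__main__":
    print(find_tat_motif("MSNNSRRDFLKASALAGA"))  # toy N-terminus
    print(find_tat_motif("MKKLLRRAAA"))          # RR pair but no full consensus
```

Restricting the search to the N-terminus reflects the text's point that the motif sits in the signal peptide; a motif occurring deeper in the protein would not act as a Tat signal.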


Author(s): Felix Teufel ◽ José Juan Almagro Armenteros ◽ Alexander Rosenberg Johansen ◽ Magnús Halldór Gíslason ◽ Silas Irby Pihl ◽ ...

Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.


1991 ◽ Vol 275 (2) ◽ pp. 529-534
Author(s): I B Wilson ◽ Y Gavel ◽ G von Heijne

To study the sequence requirements for addition of O-linked N-acetylgalactosamine to proteins, amino acid distributions around 174 O-glycosylation sites were compared with distributions around non-glycosylated sites. In comparison with non-glycosylated serine and threonine residues, the most prominent feature in the vicinity of O-glycosylated sites is a significantly increased frequency of proline residues, especially at positions -1 and +3 relative to the glycosylated residues. Alanine, serine and threonine are also significantly increased. The high serine and threonine content of O-glycosylated regions is due to the presence of clusters of several closely spaced glycosylated hydroxy amino acids in many O-glycosylated proteins. Such clusters can be predicted from the primary sequence in some cases, but there is no apparent possibility of predicting isolated O-glycosylation sites from primary sequence data.
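The comparison described above, residue frequencies at fixed offsets from glycosylated versus non-glycosylated Ser/Thr, reduces to simple counting. A minimal sketch, assuming sites are supplied as (sequence, 1-based position) pairs; the function name and the toy sequence are hypothetical and are not data from the study.

```python
from typing import Iterable, Tuple


def positional_frequency(sites: Iterable[Tuple[str, int]], offset: int,
                         residue: str = "P") -> float:
    """Fraction of sites whose neighbour at `offset` (e.g. -1 or +3) is `residue`.

    `sites` holds (sequence, 1-based site position) pairs; sites whose
    neighbour falls outside the sequence are ignored.
    """
    hits = total = 0
    for seq, pos in sites:
        idx = pos - 1 + offset  # 0-based index of the neighbouring residue
        if 0 <= idx < len(seq):
            total += 1
            hits += seq[idx] == residue
    return hits / total if total else 0.0


if __name__ == "__main__":
    seq = "PTAAPGSAAA"          # toy sequence, not from the study
    glycosylated = [(seq, 2)]   # T2
    unmodified = [(seq, 7)]     # S7
    for off in (-1, 3):
        print(off, positional_frequency(glycosylated, off),
              positional_frequency(unmodified, off))
```

Dividing the frequency at glycosylated sites by that at non-glycosylated sites gives an enrichment ratio for each offset; the study also assessed statistical significance, which this sketch does not attempt.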


1990 ◽ Vol 269 (3) ◽ pp. 691-696
Author(s): M Prabhakaran

Signal peptides play a major role, by an as-yet-undefined mechanism, in the translocation of proteins across membranes. In the present study, the sequential arrangement of the chemical, physical and conformational properties of the signal and nascent amino acid sequences of translocated proteins has been compiled and analysed. The study is based on the sequences of 126 signal peptides 18 to 21 residues long. The statistical distributions of the following properties were studied: hydrophobicity, Mr, bulkiness, chromatographic index, and preference for adopting α-helical, β-sheet and turn structures. The contribution of each property to the sequence arrangement was derived. A hydrophobic core sequence was found in all signal peptides investigated. The structural arrangement of the cleavage site was also clearly revealed by this study. Most of the physical properties of the individual sequences correlated very well (correlation coefficient approximately 0.4) with the average distribution. The preferred occupancy of amino acid residues in the signal and nascent sequences was also calculated and correlated with their property distributions. The periodic behaviour of the signal and nascent chains was revealed by calculating their hydrophobic moments for various repetitive conformations. A graphical analysis of average hydrophobic moment versus average hydrophobicity revealed the transmembrane characteristics of signal peptides and the globular characteristics of the nascent peptides.
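The hydrophobic moment mentioned above measures how amphipathic a sequence is when folded into a periodic structure. Each residue's hydrophobicity h_n is treated as a vector pointing at angle n·δ around the structure's axis, and the moment is the length of their sum, sqrt((Σ h_n sin nδ)² + (Σ h_n cos nδ)²), divided here by the sequence length. A minimal sketch, assuming the Kyte-Doolittle hydropathy scale (the abstract does not say which scale the study used); the peptide in the demo is hypothetical.

```python
import math

# Kyte-Doolittle hydropathy scale (an assumption; the study's scale is not stated).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(seq: str) -> float:
    """Average hydropathy per residue."""
    values = [KD[aa] for aa in seq.upper()]
    return sum(values) / len(values)


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment per residue for a periodic structure.

    angle_deg is the rotation between successive residues: about 100 for an
    alpha-helix, roughly 160-180 for a beta-strand.
    """
    delta = math.radians(angle_deg)
    values = [KD[aa] for aa in seq.upper()]
    sin_sum = sum(h * math.sin(n * delta) for n, h in enumerate(values))
    cos_sum = sum(h * math.cos(n * delta) for n, h in enumerate(values))
    return math.hypot(sin_sum, cos_sum) / len(values)


if __name__ == "__main__":
    peptide = "MKALIVLGLLALASVAAA"  # hypothetical signal-peptide-like sequence
    print(round(mean_hydrophobicity(peptide), 2), round(hydrophobic_moment(peptide), 2))
```

The two functions return the two coordinates of the moment-versus-hydrophobicity plot described in the abstract: a uniformly hydrophobic stretch scores high on hydrophobicity but near zero on moment, whereas an amphipathic one scores high on moment.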


1982 ◽ Vol 207 (1) ◽ pp. 91-95
Author(s): C W Ward ◽ T C Elleman ◽ A A Azad

The amino acid sequence of the Pronase-released heads of neuraminidase subtype N2 from the A/Tokyo/3/67 strain of influenza virus was determined by a combination of peptide and nucleic acid sequence analysis. The results show that the Pronase-released heads contain 396 amino acid residues and extend from residue 74 in the original protein to the C-terminus at residue 469. The heads contain five potential glycosylation sites at asparagine residues 86, 146, 200, 234 and 402, but only the first four are glycosylated. The sequence homology with the corresponding region of the previously published sequence of the neuraminidase subtype N1 [Fields, Winter & Brownlee (1981) Nature (London) 290, 213-217] is 45%. Detailed evidence for the sequence data has been deposited as Supplementary Publication SUP 50116 (14 pages) at the British Library Lending Division, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1981) 193, 5.
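Potential N-glycosylation sites on asparagine are conventionally located by scanning for the N-X-S/T sequon, where X is any residue except proline. The abstract does not state the rule used, so the sketch below assumes that conventional sequon. A lookahead keeps overlapping sequons, and the hypothetical `first_residue` parameter maps positions in a fragment back to full-length numbering. For the Pronase-released heads, which start at residue 74, that would be `first_residue=74`, and the same numbering gives 469 - 74 + 1 = 396 residues, matching the stated length.

```python
import re
from typing import List

# N-X-S/T with X != P; the lookahead lets overlapping sequons both match.
SEQUON = re.compile(r"(?=N[^P][ST])")


def potential_n_glyc_sites(seq: str, first_residue: int = 1) -> List[int]:
    """Positions of Asn residues that begin an N-X-S/T sequon.

    `first_residue` is the residue number of seq[0] in the full-length protein.
    """
    return [m.start() + first_residue for m in SEQUON.finditer(seq.upper())]


# Length check from the text: the heads span residues 74-469 inclusive.
assert 469 - 74 + 1 == 396

if __name__ == "__main__":
    fragment = "MNASNPSNKT"  # toy fragment, not the neuraminidase sequence
    print(potential_n_glyc_sites(fragment))
    print(potential_n_glyc_sites(fragment, first_residue=74))
```

A sequon is necessary but not sufficient for glycosylation, which is consistent with the text: five potential sites were found, but only four were glycosylated.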


2020 ◽ Vol 15
Author(s): Affan Alim ◽ Abdul Rafay ◽ Imran Naseem

Background: Proteins contribute to virtually every task of cellular life. Their functions include building and repairing tissues in humans and other organisms, and they are building blocks of bones, muscles, cartilage, skin and blood. Antifreeze proteins (AFPs), in particular, are vital for organisms that live in very cold environments: they allow cold-water organisms to survive sub-zero temperatures by resisting ice crystallization, which could otherwise rupture internal cells and tissues. AFPs have also attracted interest in the food industry and in cryopreservation. Objective: With the growing availability of protein sequence data, an automated and sophisticated tool for AFP recognition and identification is urgently needed. AFP sequences and structures are highly diverse, so most existing methods fail to perform well across different structures. We propose a consolidated method that achieves competitive performance on highly distinct AFP structures. Methods: We use two machine learning techniques, Principal Component Analysis (PCA) followed by Gradient Boosting (GB), for antifreeze protein identification. To analyze and validate the performance of the proposed model, various combinations of two feature sets, amino acid composition and dipeptide composition, are used. PCA reduces the dimensionality of the data while retaining most of its variance, and gradient boosting, an ensemble method, is then used for modelling and classification. Results: The proposed method achieved superior performance on the PDB, Pfam and UniProt datasets compared with the RAFP-Pred method. In Experiment 3, using only 150 PCA components, it achieved an accuracy of 89.63, higher than the 87.41 reported for RAFP-Pred using 300 significant features. Experiment 2 used two different datasets, with non-AFPs from the PISCES server and AFPs from the Protein Data Bank; here the proposed method attained a sensitivity of 79.16, which is 12.50 higher than that of the state-of-the-art RAFP-Pred method. Conclusion: AFPs share a common function but have distinct structures, so a single model developed for different sequences often fails on AFPs. Our proposed model showed robust results across diverse training and testing datasets and outperformed previous AFP prediction methods such as RAFP-Pred. It consists of PCA for dimensionality reduction followed by gradient boosting for classification, and owing to its simplicity, scalability and high performance, it can easily be extended to the analysis of proteomic and genomic datasets.
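The two feature sets named in the Methods, amino acid composition (20 values) and dipeptide composition (400 values), can be computed as below. This is a minimal sketch assuming the common definitions, namely counts divided by sequence length and by the number of overlapping residue pairs respectively; the authors' exact normalization is not given in the abstract, and dropping non-standard residue letters is also an assumption.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs
DP_INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def composition_features(seq: str) -> List[float]:
    """Return 20 amino acid + 400 dipeptide composition values (420 total)."""
    # Keep only the 20 standard residues (assumption).
    seq = "".join(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    n = len(seq)
    aac = [seq.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]

    dpc = [0.0] * len(DIPEPTIDES)
    n_pairs = n - 1
    for i in range(n_pairs):  # overlapping pairs: seq[0:2], seq[1:3], ...
        dpc[DP_INDEX[seq[i:i + 2]]] += 1
    if n_pairs > 0:
        dpc = [c / n_pairs for c in dpc]
    return aac + dpc


if __name__ == "__main__":
    features = composition_features("MKTAYIAKQR")  # toy sequence
    print(len(features), round(sum(features[:20]), 6), round(sum(features[20:]), 6))
```

The resulting 420-dimensional vectors could then be passed to, for example, scikit-learn's `PCA(n_components=150)` followed by a `GradientBoostingClassifier`, mirroring the PCA-then-boosting design described above; that wiring is illustrative and not the authors' published code.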


2019 ◽ Vol 20 (1)
Author(s): Charith B. Karunarathna ◽ Jinko Graham

Background: A perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition, indicating that they arise from a common ancestral haplotype. Results: We present an R package to reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package. Conclusion: The package should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.
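To make the idea of recursive partitioning concrete, the sketch below works on whole binary sequences (strings of '0' and '1'). It first applies the four-gamete test: two binary sites can sit on the same perfect phylogeny (with unspecified ancestral alleles) only if fewer than all four combinations 00, 01, 10 and 11 occur. It then splits the sequences on a site that divides the current group and recurses into each half. This is a simplified illustration; the package reconstructs local perfect phylogenies along the sequence, and the function names here are hypothetical rather than its API.

```python
from itertools import combinations
from typing import List, Optional


def compatible(seqs: List[str], a: int, b: int) -> bool:
    """Four-gamete test for sites a and b."""
    return len({(s[a], s[b]) for s in seqs}) < 4


def all_compatible(seqs: List[str]) -> bool:
    """True if every pair of sites passes the four-gamete test."""
    n_sites = len(seqs[0])
    return all(compatible(seqs, a, b) for a, b in combinations(range(n_sites), 2))


def partition(seqs: List[str], idx: Optional[List[int]] = None):
    """Recursively bipartition sequence indices on the first site that splits them.

    Returns nested [left, right] lists; a group that no site can split
    (sequences identical at every site) is returned as a flat list of indices.
    """
    if idx is None:
        idx = list(range(len(seqs)))
    for site in range(len(seqs[0])):
        ones = [i for i in idx if seqs[i][site] == "1"]
        zeros = [i for i in idx if seqs[i][site] == "0"]
        if ones and zeros:
            return [partition(seqs, ones), partition(seqs, zeros)]
    return idx


if __name__ == "__main__":
    haplotypes = ["110", "100", "001", "000"]  # toy data
    print(all_compatible(haplotypes))
    print(partition(haplotypes))
```

Splitting on the leftmost informative site is an arbitrary choice made for brevity; it determines which bipartition forms the top-level split of the returned tree.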


2017 ◽ Vol 21 (9) ◽ pp. 2117-2128
Author(s): Shui-yi Hu ◽ Qiu-hua Gu ◽ Jia Wang ◽ Miao Wang ◽ Xiao-yu Jia ◽ ...
