ragp: Pipeline for mining of plant hydroxyproline-rich glycoproteins with implementation in R

Milan B Dragićević; Danijela M Paunović; Milica D Bogdanović; Sladjana I. Todorović; Ana D Simonović

doi:10.1093/glycob/cwz072

ragp: Pipeline for mining of plant hydroxyproline-rich glycoproteins with implementation in R

Glycobiology ◽

10.1093/glycob/cwz072 ◽

2019 ◽

Vol 30 (1) ◽

pp. 19-35 ◽

Cited By ~ 3

Author(s):

Milan B Dragićević ◽

Danijela M Paunović ◽

Milica D Bogdanović ◽

Sladjana I. Todorović ◽

Ana D Simonović

Keyword(s):

Amino Acid ◽

Sequence Data ◽

R Package ◽

Signal Peptides ◽

Amino Acid Motif ◽

Glycosylation Sites ◽

Go Enrichment ◽

Efficient Communication ◽

Protein Backbones ◽

Proline Hydroxylation

Abstract Hydroxyproline-rich glycoproteins (HRGPs) are one of the most complex families of macromolecules found in plants, owing to the diversity of glycans decorating the protein backbone and the heterogeneity of the protein backbones themselves. While this diversity is responsible for a wide array of physiological functions associated with HRGPs, it hinders attempts at homology-based identification. Current approaches, based on identifying sequences with characteristic motifs and biased amino acid composition, are limited to prototypical sequences. Ragp is an R package for mining and analysis of HRGPs, with emphasis on arabinogalactan proteins. The ragp filtering pipeline exploits one of the key features of HRGPs: the presence of hydroxyprolines, which represent glycosylation sites. Main package features include prediction of proline hydroxylation sites; amino acid motif and bias analyses; efficient communication with web servers for prediction of N-terminal signal peptides, glycosylphosphatidylinositol modification sites and disordered regions; and the ability to annotate sequences through hmmscan and subsequent GO enrichment based on predicted Pfam domains. As such, ragp extends R’s rich ecosystem for high-throughput sequence data analyses. The ragp R package is released under the MIT Open Source license and is freely available to download from GitHub at: https://github.com/missuse/ragp.
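The "biased amino acid composition" mentioned in the abstract can be illustrated with a minimal Python sketch. This is not ragp's API (ragp is an R package and its functions are not reproduced here); the function name `residue_bias` and the default residue set P, A, S, T are assumptions chosen for illustration only.

```python
def residue_bias(sequence, residues="PAST"):
    """Fraction of positions in `sequence` occupied by any of `residues`.

    Hypothetical helper: the default set "PAST" is an assumption for this
    example, not a parameter taken from the ragp package.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Use a set so a residue listed twice is not counted twice.
    return sum(seq.count(r) for r in set(residues.upper())) / len(seq)


# Invented toy sequences, not real proteins.
print(residue_bias("PASTGG"))       # four of six residues are P, A, S or T
print(residue_bias("ppaa", "P"))    # case-insensitive, custom residue set
```

A sequence whose bias exceeds some chosen threshold could then be kept for closer inspection; any such threshold would be a further assumption.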


Common principles in the biosynthesis of diverse enzymes

Biochemical Society Transactions ◽

10.1042/bst0330105 ◽

2005 ◽

Vol 33 (1) ◽

pp. 105-107 ◽

Cited By ~ 7

Author(s):

R.L. Jack ◽

A. Dubini ◽

T. Palmer ◽

F. Sargent

Keyword(s):

Escherichia Coli ◽

Amino Acid ◽

Signal Peptides ◽

Amino Acid Motif ◽

Dmso Reductase ◽

Arginine Transport

A subset of bacterial periplasmic enzymes are transported from the cytoplasm by the twin-arginine transport apparatus. Such proteins contain distinctive N-terminal signal peptides containing a conserved SRRXFLK ‘twin-arginine’ amino acid motif and often bind complex cofactors before the transport event. It is important that assembly of complex cofactor-containing, and often multi-subunit, enzymes is complete before export. Studies of the unrelated [NiFe] hydrogenase, DMSO reductase and trimethylamine N-oxide reductase systems from Escherichia coli have enabled us to define a chaperone-mediated ‘proofreading’ mechanism involved in co-ordinating assembly and export of twin-arginine transport-dependent enzymes.
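As a rough illustration, the SRRXFLK consensus can be searched for with a regular expression. The sketch below is in Python and assumes that X stands for any residue and that the motif lies within the first 40 residues; both the window size and the example sequences are invented for this example, and a strict pattern match will miss the substitutions that real twin-arginine signal peptides tolerate.

```python
import re

# Consensus from the abstract: S-R-R-x-F-L-K, where x is any residue.
# Treating it as a strict pattern is an assumption made for illustration.
TAT_MOTIF = re.compile(r"SRR.FLK")


def find_tat_motif(sequence, n_terminal_window=40):
    """Return 0-based start positions of the motif in the N-terminal window."""
    window = sequence.upper()[:n_terminal_window]
    return [match.start() for match in TAT_MOTIF.finditer(window)]


# Hypothetical sequences, invented for this example.
with_motif = "MNNNDLFQASRRDFLKLAGAGAAALAA"
without_motif = "MKKTAIAIAVALAGFATVAQA"
print(find_tat_motif(with_motif))
print(find_tat_motif(without_motif))
```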


SignalP 6.0 predicts all five types of signal peptides using protein language models

Nature Biotechnology ◽

10.1038/s41587-021-01156-3 ◽

2022 ◽

Author(s):

Felix Teufel ◽

José Juan Almagro Armenteros ◽

Alexander Rosenberg Johansen ◽

Magnús Halldór Gíslason ◽

Silas Irby Pihl ◽

...

Keyword(s):

Machine Learning ◽

Amino Acid ◽

Sequence Data ◽

Amino Acid Sequences ◽

Language Models ◽

Metagenomic Data ◽

Signal Peptides ◽

Machine Learning Model ◽

Living Organisms ◽

Control Protein

Abstract Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.


Amino acid distributions around O-linked glycosylation sites

Biochemical Journal ◽

10.1042/bj2750529 ◽

1991 ◽

Vol 275 (2) ◽

pp. 529-534 ◽

Cited By ~ 200

Author(s):

I B Wilson ◽

Y Gavel ◽

G von Heijne

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Sequence Data ◽

Prominent Feature ◽

Primary Sequence ◽

Hydroxy Amino ◽

Glycosylation Sites ◽

Sequence Requirements ◽

Primary Sequence Data ◽

Glycosylated Proteins

To study the sequence requirements for addition of O-linked N-acetylgalactosamine to proteins, amino acid distributions around 174 O-glycosylation sites were compared with distributions around non-glycosylated sites. In comparison with non-glycosylated serine and threonine residues, the most prominent feature in the vicinity of O-glycosylated sites is a significantly increased frequency of proline residues, especially at positions -1 and +3 relative to the glycosylated residues. Alanine, serine and threonine are also significantly increased. The high serine and threonine content of O-glycosylated regions is due to the presence of clusters of several closely spaced glycosylated hydroxy amino acids in many O-glycosylated proteins. Such clusters can be predicted from the primary sequence in some cases, but there is no apparent possibility of predicting isolated O-glycosylation sites from primary sequence data.
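The kind of positional comparison described above can be sketched in Python as follows. The function tabulates how often a given residue occurs at a fixed offset (for example -1 or +3) from a set of annotated sites. The function name, the input layout and the toy sequences are assumptions for illustration; the 174 sites analysed in the paper are not reproduced here.

```python
from collections import Counter


def residue_frequency_at_offset(sequences_and_sites, offset, residue):
    """Fraction of sites that have `residue` at position site + offset.

    sequences_and_sites: iterable of (sequence, [0-based site indices]).
    Sites whose offset position falls outside the sequence are skipped.
    """
    counts = Counter()
    total = 0
    for sequence, sites in sequences_and_sites:
        for site in sites:
            pos = site + offset
            if 0 <= pos < len(sequence):
                counts[sequence[pos]] += 1
                total += 1
    return counts[residue] / total if total else 0.0


# Invented toy data: each tuple is (sequence, glycosylated Ser/Thr indices).
toy = [("APTAAPGS", [2]), ("GPSTTPAA", [2]), ("AASGGGAA", [2])]
print(residue_frequency_at_offset(toy, -1, "P"))
print(residue_frequency_at_offset(toy, +3, "P"))
```

Running the same function over non-glycosylated serine and threonine positions would give the background frequencies to compare against.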


The distribution of physical, chemical and conformational properties in signal and nascent peptides

Biochemical Journal ◽

10.1042/bj2690691 ◽

1990 ◽

Vol 269 (3) ◽

pp. 691-696 ◽

Cited By ~ 23

Author(s):

M Prabhakaran

Keyword(s):

Amino Acid ◽

Sequence Data ◽

Graphical Analysis ◽

Amino Acid Sequences ◽

Signal Peptides ◽

Hydrophobic Core ◽

Amino Acid Residues ◽

Conformational Properties ◽

The Individual ◽

Translocated Proteins

Signal peptides play a major, as yet undefined, role in the translocation of proteins across membranes. The sequential arrangement of the chemical, physical and conformational properties of the signal and nascent amino acid sequences of the translocated proteins has been compiled and analysed in the present study. The sequence data of 126 signal peptides of length between 18 and 21 residues form the basis of this study. The statistical distribution of the following properties was studied: hydrophobicity, Mr, bulkiness, chromatographic index and preference for adopting α-helical, β-sheet and turn structures. The contribution of each property to the sequence arrangement was derived. A hydrophobic core sequence was found in all signal peptides investigated. The structural arrangement of the cleavage site was also clearly revealed by this study. Most of the physical properties of the individual sequences correlated (correlation coefficient approximately 0.4) very well with the average distribution. The preferred occupancy of amino acid residues in the signal and nascent sequences was also calculated and correlated with their property distribution. The periodic behaviour of the signal and nascent chains was revealed by calculating their hydrophobic moments for various repetitive conformations. A graphical analysis of average hydrophobic moments versus average hydrophobicity of peptides revealed the transmembrane characteristics of signal peptides and globular characteristics of the nascent peptides.
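A hydrophobic moment for a repetitive conformation can be sketched in Python as below. The abstract does not name the hydrophobicity scale or the exact formula used, so two assumptions are made here: the Kyte-Doolittle scale, and the usual vector-sum definition of the moment with a fixed rotation angle per residue (about 100° for an α-helix, 160° to 180° for a β-strand), divided by peptide length to give a per-residue average.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; the abstract names none).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(peptide):
    """Average hydropathy per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in peptide) / len(peptide)


def hydrophobic_moment(peptide, angle_degrees=100.0):
    """Per-residue hydrophobic moment for a fixed rotation angle per residue."""
    delta = math.radians(angle_degrees)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * delta)
                  for i, aa in enumerate(peptide))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * delta)
                  for i, aa in enumerate(peptide))
    return math.hypot(sin_sum, cos_sum) / len(peptide)


# Invented peptide: alternating Leu/Lys is strongly periodic at 180 degrees.
print(mean_hydrophobicity("LKLK"), hydrophobic_moment("LKLK", 180.0))
```

Plotting the moment against the mean hydrophobicity for many peptides gives the kind of graphical analysis the abstract refers to.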


Amino acid sequence of the Pronase-released heads of neuraminidase subtype N2 from the Asian strain A/Tokyo/3/67 of influenza virus

Biochemical Journal ◽

10.1042/bj2070091 ◽

1982 ◽

Vol 207 (1) ◽

pp. 91-95 ◽

Cited By ~ 34

Author(s):

C W Ward ◽

T C Elleman ◽

A A Azad

Keyword(s):

Influenza Virus ◽

Amino Acid ◽

Amino Acid Sequence ◽

Sequence Data ◽

British Library ◽

Amino Acid Residues ◽

Asian Strain ◽

C Terminus ◽

Glycosylation Sites ◽

Neuraminidase Subtype

The amino acid sequence of the Pronase-released heads of neuraminidase subtype N2 from the A/Tokyo/3/67 strain of influenza virus was determined by a combination of peptide and nucleic acid sequence analysis. The results show that the Pronase-released heads contain 396 amino acid residues and extend from residue 74 in the original protein to the C-terminus at residue 469. The heads contain five potential glycosylation sites at asparagine residues 86, 146, 200, 234 and 402, but only the first four are glycosylated. The sequence homology with the corresponding region of the previously published sequence of the neuraminidase subtype N1 [Fields, Winter & Brownlee (1981) Nature (London) 290, 213-217] is 45%. Detailed evidence for the sequence data has been deposited as Supplementary Publication SUP 50116 (14 pages) at the British Library Lending Division, Boston Spa, Wetherby, West Yorkshire LS23 7BQ, U.K., from whom copies may be obtained on the terms given in Biochem. J. (1981) 193, 5.


PoGB-Pred: Prediction of Antifreeze Proteins Sequences using Amino Acid Composition with Feature Selection followed by a Sequential based Ensemble Approach

Current Bioinformatics ◽

10.2174/1574893615999200707141926 ◽

2020 ◽

Vol 15 ◽

Author(s):

Affan Alim ◽

Abdul Rafay ◽

Imran Naseem

Keyword(s):

Amino Acid ◽

Dimension Reduction ◽

Protein Identification ◽

Cold Water ◽

Genomic Sequence ◽

Sequence Data ◽

Antifreeze Proteins ◽

Building Blocks ◽

Gradient Boosting ◽

Proposed Model

Background: Proteins contribute significantly to every task of cellular life. Their functions encompass the building and repairing of tissues in human bodies and other organisms. Hence they are the building blocks of bones, muscles, cartilage, skin, and blood. Similarly, antifreeze proteins (AFPs) are of prime significance for organisms that live in very cold areas. With the help of these proteins, cold-water organisms can survive sub-zero temperatures and resist the water crystallization process, which may rupture internal cells and tissues. AFPs have attracted attention and interest in the food industry and in cryopreservation. Objective: With the increasing availability of genomic sequence data for proteins, an automated and sophisticated tool for AFP recognition and identification is badly needed. The sequences and structures of AFPs are highly distinct; therefore, most of the proposed methods fail to show promising results on different structures. A consolidated method is proposed to deliver competitive performance on highly distinct AFP structures. Methods: In this study, we propose to use the machine learning-based algorithms Principal Component Analysis (PCA) followed by Gradient Boosting (GB) for antifreeze protein identification. To analyze the performance and validation of the proposed model, various combinations of two-segment compositions of amino acids and dipeptides are used. PCA, in particular, is proposed for dimension reduction while retaining the high variance of the data, and is followed by an ensemble method named gradient boosting for modelling and classification. Results: The proposed method obtained superior performance on the PDB, Pfam and UniProt datasets compared with the RAFP-Pred method. In experiment-3, by utilizing only 150 PCA components, a high accuracy of 89.63 was achieved, which is superior to the 87.41 obtained using 300 significant features reported for the RAFP-Pred method. Experiment-2 is conducted using two different datasets, with non-AFPs from the PISCES server and AFPs from the Protein Data Bank. In experiment-2, our proposed method attained a high sensitivity of 79.16, which is 12.50 higher than the state-of-the-art RAFP-Pred method. Conclusion: AFPs have a common function with distinct structures. Therefore, a single model developed for different sequences often fails for AFPs. Robust results have been shown by our proposed model across diverse training and testing datasets. The proposed model outperformed the previous AFP prediction method, RAFP-Pred. Our model consists of PCA for dimension reduction followed by gradient boosting for classification. Owing to its simplicity, scalability and high performance, our model can be easily extended to the analysis of proteomic and genomic datasets.
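The feature-extraction step behind this kind of model can be sketched in Python as below: each sequence becomes a fixed-length vector of amino acid frequencies (20 values) and dipeptide frequencies (400 values). This is an assumption-laden illustration rather than the authors' code: it computes whole-sequence composition only, does not implement the "two-segment" variant the abstract mentions, and stops before the PCA and gradient boosting stages, which could be supplied by a library such as scikit-learn.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def composition_features(sequence):
    """Return 20 amino acid frequencies followed by 400 dipeptide frequencies."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    aa_freqs = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pair_counts = Counter(pairs)
    n_pairs = max(len(pairs), 1)  # avoid dividing by zero for length-1 input
    dipeptide_freqs = [pair_counts[dp] / n_pairs for dp in DIPEPTIDES]
    return aa_freqs + dipeptide_freqs


# Invented toy sequence.
features = composition_features("ACAC")
print(len(features))
```

Stacking these vectors for a labelled set of AFP and non-AFP sequences gives the matrix on which dimension reduction and classification would then operate.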


A carboxyl-terminal four-amino acid motif is required for secretion of the metalloprotease PrtG through the Erwinia chrysanthemi protease secretion pathway.

Journal of Biological Chemistry ◽

10.1016/s0021-9258(17)37064-3 ◽

1994 ◽

Vol 269 (12) ◽

pp. 8979-8985 ◽

Cited By ~ 1

Author(s):

J.M. Ghigo ◽

C. Wandersman

Keyword(s):

Amino Acid ◽

Erwinia Chrysanthemi ◽

Amino Acid Motif ◽

Secretion Pathway ◽

Carboxyl Terminal ◽

Protease Secretion


Human HSP27 is phosphorylated at serines 78 and 82 by heat shock and mitogen-activated kinases that recognize the same amino acid motif as S6 kinase II.

Journal of Biological Chemistry ◽

10.1016/s0021-9258(18)48354-8 ◽

1992 ◽

Vol 267 (2) ◽

pp. 794-803 ◽

Cited By ~ 12

Author(s):

J Landry ◽

H Lambert ◽

M Zhou ◽

J N Lavoie ◽

E Hickey ◽

...

Keyword(s):

Amino Acid ◽

Heat Shock ◽

Amino Acid Motif ◽

S6 Kinase


perfectphyloR: An R package for reconstructing perfect phylogenies

BMC Bioinformatics ◽

10.1186/s12859-019-3313-4 ◽

2019 ◽

Vol 20 (1) ◽

Author(s):

Charith B. Karunarathna ◽

Jinko Graham

Keyword(s):

Binary Tree ◽

Sequence Data ◽

R Package ◽

Binary Sequences ◽

Ancestral Haplotype ◽

Perfect Phylogeny ◽

Nested Partitions ◽

Genetic Sequence ◽

Insight Into ◽

Rooted Binary Tree

Abstract Background: A perfect phylogeny is a rooted binary tree that recursively partitions sequences. The nested partitions of a perfect phylogeny provide insight into the pattern of ancestry of genetic sequence data. For example, sequences may cluster together in a partition indicating that they arise from a common ancestral haplotype. Results: We present an R package to reconstruct the local perfect phylogenies underlying a sample of binary sequences. The package enables users to associate the reconstructed partitions with a user-defined partition. We describe and demonstrate the major functionality of the package. Conclusion: The package should be of use to researchers seeking insight into the ancestral structure of their sequence data. The reconstructed partitions have many applications, including the mapping of trait-influencing variants.
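Whether a set of binary sequences is compatible with a perfect phylogeny can be checked with the standard four-gamete test: two sites are compatible if the sequences do not show all four combinations 00, 01, 10 and 11 at those sites. The Python sketch below implements that check. It illustrates the general concept only and is not taken from perfectphyloR, whose R functions are not reproduced here; the function names and example data are invented.

```python
from itertools import combinations


def sites_compatible(sequences, i, j):
    """Four-gamete test for sites i and j of equal-length binary strings."""
    gametes = {(seq[i], seq[j]) for seq in sequences}
    return len(gametes) < 4


def passes_four_gamete_test(sequences):
    """True if every pair of sites shows at most three of the four gametes."""
    n_sites = len(sequences[0])
    return all(sites_compatible(sequences, i, j)
               for i, j in combinations(range(n_sites), 2))


# Invented examples: the first set is tree-like, the second is not.
print(passes_four_gamete_test(["000", "100", "110", "111"]))
print(passes_four_gamete_test(["00", "01", "10", "11"]))
```

Along a chromosome, maximal runs of mutually compatible sites are one way to delimit regions that share a single local tree.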


The pathogenicity of T cell epitopes on human Goodpasture antigen and its critical amino acid motif

Journal of Cellular and Molecular Medicine ◽

10.1111/jcmm.13134 ◽

2017 ◽

Vol 21 (9) ◽

pp. 2117-2128 ◽

Cited By ~ 4

Author(s):

Shui-yi Hu ◽

Qiu-hua Gu ◽

Jia Wang ◽

Miao Wang ◽

Xiao-yu Jia ◽

...

Keyword(s):

Amino Acid ◽

T Cell ◽

T Cell Epitopes ◽

Amino Acid Motif ◽

Critical Amino Acid ◽

Goodpasture Antigen
