Refining pairwise sequence alignments of membrane proteins by the incorporation of anchors

René Staritzbichler; Edoardo Sarti; Emily Yaklich; Antoniya Aleksandrova; Marcus Stamm; Kamil Khafizov; Lucy R. Forrest

doi:10.1371/journal.pone.0239881

Refining pairwise sequence alignments of membrane proteins by the incorporation of anchors

PLoS ONE ◽

10.1371/journal.pone.0239881 ◽

2021 ◽

Vol 16 (4) ◽

pp. e0239881

Author(s):

René Staritzbichler ◽

Edoardo Sarti ◽

Emily Yaklich ◽

Antoniya Aleksandrova ◽

Marcus Stamm ◽

...

Keyword(s):

Membrane Proteins ◽

Sequence Alignment ◽

Ad Hoc ◽

Low Complexity ◽

Pairwise Sequence Alignment ◽

Sequence Alignments ◽

Alignment Procedure ◽

Alignment Tool ◽

Hydrophobic Amino Acids ◽

Optimum Alignment

The alignment of primary sequences is a fundamental step in the analysis of protein structure, function, and evolution, and in the generation of homology-based models. Integral membrane proteins pose a significant challenge for such sequence alignment approaches, because their evolutionary relationships can be very remote, and because a high content of hydrophobic amino acids reduces their complexity. Frequently, biochemical or biophysical data is available that informs the optimum alignment, for example, indicating specific positions that share common functional or structural roles. Currently, if those positions are not correctly matched by a standard pairwise sequence alignment procedure, the incorporation of such information into the alignment is typically addressed in an ad hoc manner, with manual adjustments. However, such modifications are problematic because they reduce the robustness and reproducibility of the aligned regions either side of the newly matched positions. Previous studies have introduced restraints as a means to impose the matching of positions during sequence alignments, originally in the context of genome assembly. Here we introduce position restraints, or “anchors” as a feature in our alignment tool AlignMe, providing an aid to pairwise global sequence alignment of alpha-helical membrane proteins. Applying this approach to realistic scenarios involving distantly-related and low complexity sequences, we illustrate how the addition of anchors can be used to modify alignments, while still maintaining the reproducibility and rigor of the rest of the alignment. Anchored alignments can be generated using the online version of AlignMe available at www.bioinfo.mpg.de/AlignMe/.
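One simple way to picture a hard anchor is to split a global alignment at the anchored pair and align the flanking segments independently. The sketch below does this with a basic Needleman-Wunsch routine and a linear gap penalty. It is an illustration under those assumptions, not AlignMe's implementation; the function names, the scoring values, and the 0-based anchor indices are all hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_align(a, b, anchor_a, anchor_b, **scoring):
    """Force a[anchor_a] to pair with b[anchor_b] (0-based, hypothetical convention).

    The segments to the left and right of the anchor are aligned independently,
    so the rest of the alignment is still produced by the same algorithm.
    """
    left_a, left_b = needleman_wunsch(a[:anchor_a], b[:anchor_b], **scoring)
    right_a, right_b = needleman_wunsch(a[anchor_a + 1:], b[anchor_b + 1:], **scoring)
    return left_a + a[anchor_a] + right_a, left_b + b[anchor_b] + right_b


def column_of(aligned, residue_index):
    """Column in the alignment that holds the residue with the given 0-based index."""
    seen = -1
    for col, ch in enumerate(aligned):
        if ch != "-":
            seen += 1
            if seen == residue_index:
                return col
    raise IndexError("residue index outside sequence")
```

For example, `anchored_align("MKTLLVAG", "MKLLIAG", 5, 4)` pairs the V of the first sequence with the I of the second and leaves the flanking regions to the dynamic programming step. A softer variant would add a bonus to the anchored cell of the score matrix instead of splitting, which lets the anchor be overridden when the rest of the alignment disagrees strongly.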

Refining pairwise sequence alignments of membrane proteins by the incorporation of anchors

10.1101/2020.09.16.299453 ◽

2020 ◽

Author(s):

René Staritzbichler ◽

Edoardo Sarti ◽

Emily Yaklich ◽

Antoniya Aleksandrova ◽

Marcus Stamm ◽

...

Keyword(s):

Membrane Proteins ◽

Sequence Alignment ◽

Ad Hoc ◽

Pairwise Alignment ◽

Low Complexity ◽

Pairwise Sequence Alignment ◽

Sequence Alignments ◽

Alignment Procedure ◽

Alignment Tool ◽

Optimum Alignment

Abstract: The alignment of primary sequences is a fundamental step in the analysis of protein structure, function, and evolution. Integral membrane proteins pose a significant challenge for such sequence alignment approaches, because their evolutionary relationships can be very remote, and because a high content of hydrophobic amino acids reduces their complexity. Frequently, biochemical or biophysical data is available that informs the optimum alignment, for example, indicating specific positions that share common functional or structural roles. Currently, if those positions are not correctly aligned by a standard pairwise alignment procedure, the incorporation of such information into the alignment is typically addressed in an ad hoc manner, with manual adjustments. However, such modifications are problematic because they reduce the robustness and reproducibility of the alignment. An alternative approach is the use of restraints, or anchors, to incorporate such position-matching explicitly during alignment. Here we introduce position anchoring in the alignment tool AlignMe as an aid to pairwise sequence alignment of membrane proteins. Applying this approach to realistic scenarios involving distantly-related and low complexity sequences, we illustrate how the addition of even a single anchor can dramatically improve the accuracy of the alignments, while maintaining the reproducibility and rigor of the overall alignment.

Fast and SNP-aware short read alignment with SALT

BMC Bioinformatics ◽

10.1186/s12859-021-04088-6 ◽

2021 ◽

Vol 22 (S9) ◽

Author(s):

Wei Quan ◽

Bo Liu ◽

Yadong Wang

Keyword(s):

Sequence Alignment ◽

Genetic Variants ◽

High Throughput Sequencing ◽

Reference Genome ◽

Graph Model ◽

Sequence Alignments ◽

Short Read ◽

Read Alignment ◽

Short Read Alignment ◽

Alignment Tool

Abstract. Background: DNA sequence alignment is a common first step in most applications of high-throughput sequencing technologies. The accuracy of sequence alignments directly affects the accuracy of downstream analyses, such as variant calling and quantitative analysis of transcriptome; therefore, rapidly and accurately mapping reads to a reference genome is a significant topic in bioinformatics. Conventional DNA read aligners map reads to a linear reference genome (such as the GRCh38 primary assembly). However, such a linear reference genome represents the genome of only one or a few individuals and thus lacks information on variations in the population. This limitation can introduce bias and impact the sensitivity and accuracy of mapping. Recently, a number of aligners have begun to map reads to populations of genomes, which can be represented by a reference genome and a large number of genetic variants. However, compared to linear reference aligners, an aligner that can store and index all genetic variants has a high cost in memory (RAM) space and leads to extremely long run time. Aligning reads to a graph-model-based index that includes all types of variants is ultimately an NP-hard problem in theory. By contrast, considering only single nucleotide polymorphism (SNP) information will reduce the complexity of the index and improve the speed of sequence alignment. Results: The SNP-aware alignment tool (SALT) is a fast, memory-efficient, and SNP-aware short read alignment tool. SALT uses 5.8 GB of RAM to index a human reference genome (GRCh38) and incorporates 12.8M UCSC common SNPs. Compared with a state-of-the-art aligner, SALT has a similar speed but higher accuracy. Conclusions: Herein, we present an SNP-aware alignment tool (SALT) that aligns reads to a reference genome that incorporates an SNP database. We benchmarked SALT using simulated and real datasets. The results demonstrate that SALT can efficiently map reads to the reference genome with significantly improved accuracy. Incorporating SNP information can improve the accuracy of read alignment and can reveal novel variants. The source code is freely available at https://github.com/weiquan/SALT.
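The idea of SNP-aware matching can be illustrated with a toy mismatch counter: a read base that differs from the reference is not penalized if it equals a known alternative allele at that position. This is only a sketch of the concept, not SALT's index or algorithm; the function name and the `snps` dictionary layout are assumptions made for the example.

```python
def snp_aware_mismatches(read, ref, offset, snps):
    """Count mismatches of `read` placed at `ref[offset:]`, ignoring known SNP alleles.

    snps maps a 0-based reference position to the set of alternative alleles
    known at that position (hypothetical layout for this sketch).
    """
    mismatches = 0
    for k, base in enumerate(read):
        pos = offset + k
        # A base counts as a mismatch only if it is neither the reference
        # base nor a catalogued alternative allele at this position.
        if base != ref[pos] and base not in snps.get(pos, ()):
            mismatches += 1
    return mismatches
```

With an empty `snps` dictionary the function reduces to an ordinary mismatch count against the linear reference, which is the behavior the abstract contrasts with.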

Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments

BMC Bioinformatics ◽

10.1186/1471-2105-11-146 ◽

2010 ◽

Vol 11 (1) ◽

pp. 146 ◽

Cited By ~ 7

Author(s):

Michael L Sierk ◽

Michael E Smoot ◽

Ellen J Bass ◽

William R Pearson

Keyword(s):

Sequence Alignment ◽

Protein Sequence ◽

Alignment Accuracy ◽

Pairwise Sequence Alignment ◽

Sequence Alignments

Binary integer programming for Multiple Sequence Alignment

10.1101/854786 ◽

2019 ◽

Author(s):

S. Ali Lajevardy ◽

Mehrdad Kargari

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Optimal Solution ◽

Binary Integer Programming ◽

Pairwise Sequence Alignment ◽

Sequence Alignments ◽

Optimal Method ◽

Multiple Sequence ◽

The Past ◽

Genetic Algorithm Method

Abstract: Molecular biology advances in the past few decades have contributed to a rapid increase in genome sequencing of various organisms; sequence alignment is usually considered the first step in understanding the molecular function of a sequence. An optimal alignment arranges two or more sequences so that the maximum number of identical or similar residues are matched. The two types of sequence alignment are Pairwise Sequence Alignment (PSA) and Multiple Sequence Alignment (MSA). While the dynamic programming (DP) technique provides an optimal method for PSA, it leads to greater complexity when used for MSA, so MSA mainly relies on heuristic and approximation methods. This paper presents a mathematical model for MSA that can be used as a basis for an optimal solution in different ways. To obtain the results, the model is implemented on the web using a genetic algorithm method.

GSAlign – an efficient sequence alignment tool for intra-species genomes

10.1101/782193 ◽

2019 ◽

Author(s):

Hsin-Nan Lin ◽

Wen-Lian Hsu

Keyword(s):

Sequence Alignment ◽

State Of The Art ◽

Genome Comparison ◽

Sequence Variants ◽

Sequence Alignments ◽

Large Genome ◽

Alignment Tool ◽

Sequence Variations ◽

Efficient Sequence ◽

Alignment Result

Abstract: Personal genomics and comparative genomics are becoming more important in clinical practice and genome research. Both fields require sequence alignment to discover sequence conservation and variation. Though many methods have been developed, some are designed for small genome comparison while others are not efficient for large genome comparison. Moreover, most existing genome comparison tools have not systematically evaluated the correctness of their sequence alignments. A wrong sequence alignment would produce false sequence variants. In this study, we present GSAlign, an efficient sequence alignment tool for intra-species genomes that handles large genome sequence alignment and identifies sequence variants from the alignment result. We estimate performance by measuring the correctness of predicted sequence variations. The experimental results demonstrate that GSAlign is not only faster than most existing state-of-the-art methods, but also identifies sequence variants with high accuracy.

Hubsm: A Novel Amino Acid Substitution Matrix for Comparing Hub Proteins

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.53 ◽

2017 ◽

Vol 7 (8) ◽

pp. 212

Author(s):

Renganayaki G. ◽

Achuthsankar S. Nair

Keyword(s):

Amino Acid ◽

Amino Acid Substitution ◽

Low Complexity ◽

Database Search ◽

Substitution Matrix ◽

Compositional Bias ◽

Sequence Alignments ◽

Amino Acid Substitution Matrix ◽

Alignment Algorithms ◽

Hub Proteins

Sequence alignment algorithms and database search methods use BLOSUM and PAM substitution matrices constructed from general proteins. These de facto standard matrices are not optimal for accurately aligning sequences of proteins with markedly different amino acid compositional bias. In this work, a new amino acid substitution matrix is calculated for the disorder- and low-complexity-rich regions of Hub proteins, based on residue characteristics. Insights into the amino acid background frequencies and the substitution scores obtained from Hubsm unveil residue substitution patterns that differ from those of commonly used scoring matrices. When comparing Hub protein sequences to detect homologs, the Hubsm matrix yields better results than the PAM and BLOSUM matrices. Hubsm can be optimal in database searches and for the construction of more accurate sequence alignments of Hub proteins.
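For context on why compositional bias matters, the sketch below computes BLOSUM-style log-odds scores from observed residue pairs: each score compares how often a pair is seen with how often it would be expected from background frequencies alone. This is the generic log-odds construction, assumed here for illustration; the abstract does not state that Hubsm is derived this way, and the function name, the `scale` factor, and the toy input are hypothetical.

```python
import math
from collections import Counter


def log_odds_matrix(aligned_pairs, scale=2.0):
    """Build a symmetric log-odds score table from (x, y) residue pairs.

    score(x, y) = round(scale * log2(q_xy / (p_x * p_y))), where q_xy is the
    observed pair frequency and p_x, p_y are the background frequencies
    implied by the same pairs.
    """
    pair_counts = Counter()
    for x, y in aligned_pairs:
        # Count both orientations so the resulting table is symmetric.
        pair_counts[(x, y)] += 1
        pair_counts[(y, x)] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}
    # Background frequency of each residue is the marginal of the pair table.
    p = Counter()
    for (x, _), freq in q.items():
        p[x] += freq
    return {(x, y): round(scale * math.log2(freq / (p[x] * p[y])))
            for (x, y), freq in q.items()}
```

Because the background frequencies enter the denominator, the same observed pair counts give different scores for a leucine-rich or disorder-rich protein family than for a general protein set, which is the motivation for family-specific matrices.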

The Conservation of Low Complexity Regions in Bacterial Proteins Depends on the Pathogenicity of the Strain and Subcellular Location of the Protein

Genes ◽

10.3390/genes12030451 ◽

2021 ◽

Vol 12 (3) ◽

pp. 451

Author(s):

Pablo Mier ◽

Miguel A. Andrade-Navarro

Keyword(s):

Membrane Proteins ◽

Outer Membrane ◽

Bacterial Species ◽

Outer Membrane Proteins ◽

Subcellular Location ◽

Low Complexity ◽

Extracellular Proteins ◽

Bacterial Strains ◽

Bacterial Proteins ◽

Protein Subcellular Location

Low complexity regions (LCRs) in proteins are characterized by amino acid frequencies that differ from the average. These regions evolve faster and tend to be less conserved between homologs than globular domains. They are not common in bacteria, as compared to their prevalence in eukaryotes. Studying their conservation could help provide hypotheses about their function. To obtain the appropriate evolutionary focus for this rapidly evolving feature, here we study the conservation of LCRs in bacterial strains and compare their high variability to the closeness of the strains. For this, we selected 20 taxonomically diverse bacterial species and obtained the completely sequenced proteomes of two strains per species. We calculated all orthologous pairs for each of the 20 strain pairs. Per orthologous pair, we computed the conservation of two types of LCRs: compositionally biased regions (CBRs) and homorepeats (polyX). Our results show that, in bacteria, Q-rich CBRs are the most conserved, while A-rich CBRs and polyA are the most variable. LCRs have generally higher conservation when comparing pathogenic strains. However, this result depends on protein subcellular location: LCRs accumulate in extracellular and outer membrane proteins, with conservation increased in the extracellular proteins of pathogens, and decreased for polyX in the outer membrane proteins of pathogens. We conclude that these dependencies support the functional importance of LCRs in host–pathogen interactions.
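As a rough illustration of what a homorepeat (polyX) is, the sketch below reports perfect runs of a single residue in a protein sequence. The minimum run length of 4 and the requirement of a perfect, uninterrupted run are simplifying assumptions for this example, not the definition used in the study, and the function name is hypothetical.

```python
import re


def find_homorepeats(seq, min_len=4):
    """Return (residue, start, end) for each run of >= min_len identical residues.

    start is 0-based and end is exclusive, as with Python slices.
    """
    # (.) captures one residue; \1{n,} requires at least n further copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(seq)]
```

Comparing the runs reported for two orthologous sequences gives a crude picture of whether a given polyX tract is conserved between strains.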

Cryo-EM structure of amyloid fibrils formed by the entire low complexity domain of TDP-43

Nature Communications ◽

10.1038/s41467-021-21912-y ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Qiuye Li ◽

W. Michael Babinchak ◽

Witold K. Surewicz

Keyword(s):

Amyloid Fibrils ◽

Low Complexity ◽

Structural Features ◽

Protein Fragments ◽

Hydrophobic Residues ◽

Backbone Conformation ◽

Hydrophobic Amino Acids ◽

Insight Into ◽

Lateral Sclerosis

Abstract: Amyotrophic lateral sclerosis and several other neurodegenerative diseases are associated with brain deposits of amyloid-like aggregates formed by the C-terminal fragments of TDP-43 that contain the low complexity domain of the protein. Here, we report the cryo-EM structure of amyloid formed from the entire TDP-43 low complexity domain in vitro at pH 4. This structure reveals single protofilament fibrils containing a large (139-residue), tightly packed core. While the C-terminal part of this core region is largely planar and characterized by a small proportion of hydrophobic amino acids, the N-terminal region contains numerous hydrophobic residues and has a non-planar backbone conformation, resulting in rugged surfaces of fibril ends. The structural features found in these fibrils differ from those previously found for fibrils generated from short protein fragments. The present atomic model for TDP-43 LCD fibrils provides insight into potential structural perturbations caused by phosphorylation and disease-related mutations.

Profiles from structure based sequence alignment of porins can identify stranded integral membrane proteins

Bioinformatics ◽

10.1093/bioinformatics/16.9.839 ◽

2000 ◽

Vol 16 (9) ◽

pp. 839-842 ◽

Cited By ~ 22

Author(s):

T. V. Gnanasekaran ◽

S. Peri ◽

A. Arockiasamy ◽

S. Krishnaswamy

Keyword(s):

Membrane Proteins ◽

Sequence Alignment ◽

Integral Membrane Proteins

Molecular homology and multiple-sequence alignment: an analysis of concepts and practice

Australian Systematic Botany ◽

10.1071/sb15001 ◽

2015 ◽

Vol 28 (1) ◽

pp. 46 ◽

Cited By ~ 20

Author(s):

David A. Morrison ◽

Matthew J. Morgan ◽

Scot A. Kelchner

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Molecular Data ◽

Simple Relationship ◽

Sequence Alignments ◽

Multiple Sequence ◽

Molecular Change ◽

Nucleotide Homology ◽

Tree Building ◽

Molecular Homology

Sequence alignment is just as much a part of phylogenetics as is tree building, although it is often viewed solely as a necessary tool to construct trees. However, alignment for the purpose of phylogenetic inference is primarily about homology, as it is the procedure that expresses homology relationships among the characters, rather than the historical relationships of the taxa. Molecular homology is rather vaguely defined and understood, despite its importance in the molecular age. Indeed, homology has rarely been evaluated with respect to nucleotide sequence alignments, in spite of the fact that nucleotides are the only data that directly represent genotype. All other molecular data represent phenotype, just as do morphology and anatomy. Thus, efforts to improve sequence alignment for phylogenetic purposes should involve a more refined use of the homology concept at a molecular level. To this end, we present examples of molecular-data levels at which homology might be considered, and arrange them in a hierarchy. The concept that we propose has many levels, which link directly to the developmental and morphological components of homology. Of note, there is no simple relationship between gene homology and nucleotide homology. We also propose terminology with which to better describe and discuss molecular homology at these levels. Our over-arching conceptual framework is then used to shed light on the multitude of automated procedures that have been created for multiple-sequence alignment. Sequence alignment needs to be based on aligning homologous nucleotides, without necessary reference to homology at any other level of the hierarchy. In particular, inference of nucleotide homology involves deriving a plausible scenario for molecular change among the set of sequences. Our clarifications should allow the development of a procedure that specifically addresses homology, which is required when performing alignment for phylogenetic purposes, but which does not yet exist.
