scholarly journals Universal Architectural Concepts Underlying Protein Folding Patterns

2021 ◽  
Vol 7 ◽  
Author(s):  
Arun S. Konagurthu ◽  
Ramanan Subramanian ◽  
Lloyd Allison ◽  
David Abramson ◽  
Peter J. Stuckey ◽  
...  

What is the architectural “basis set” of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures—called concepts—typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.

2018 ◽  
Author(s):  
Arthur M. Lesk ◽  
Ramanan Subramanian ◽  
Lloyd Allison ◽  
David Abramson ◽  
Peter J. Stuckey ◽  
...  

ABSTRACTWhat is the architectural ‘basis set’ of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a comprehensive dictionary of 1,493 substructural concepts. Each concept represents a topologically-conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the world-wide protein data bank and completely inventoried all concept instances. This yields an unprecedented source of biological insights. These include: correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click) provides access to and navigation of the entire dictionary of concepts, and all associated information.


2018 ◽  
Author(s):  
Jianfu Zhou ◽  
Alexandra E. Panaitiu ◽  
Gevorg Grigoryan

AbstractThe ability to routinely design functional proteins, in a targeted manner, would have enormous implications for biomedical research and therapeutic development. Computational protein design (CPD) offers the potential to fulfill this need, and though recent years have brought considerable progress in the field, major limitations remain. Current state-of-the-art approaches to CPD aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a robust general solution to CPD has yet to be found. Here we propose a fundamentally novel design framework—one based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of inter-atomic interactions. Specifically, we systematically decompose the target structure to be designed into structural building blocks we call TERMs (tertiary motifs) and use rapid structure search against the Protein Data Bank (PDB) to identify sequence patterns associated with each TERM from known protein structures that contain it. These results are then combined to produce a sequence-level pseudo-energy model that can score any sequence for compatibility with the target structure. This model can then be used to extract the optimal-scoring sequence via combinatorial optimization or otherwise sample the sequence space predicted to be well compatible with folding to the target. Here we carry out extensive computational analyses, showing that our method, which we dub dTERMen (design with TERM energies): 1) produces native-like sequences given native crystallographic or NMR backbones, 2) produces sequence-structure compatibility scores that correlate with thermodynamic stability, and 3) is able to predict experimental success of designed sequences generated with other methods, and 4) designs sequences that are found to fold to the desired target by structure prediction more frequently than sequences designed with an atomistic method. As an experimental validation of dTERMen, we perform a total surface redesign of Red Fluorescent Protein mCherry, marking a total of 64 residues as variable. The single sequence identified as optimal by dTERMen harbors 48 mutations relative to mCherry, but nevertheless folds, is monomeric in solution, exhibits similar stability to chemical denaturation as mCherry, and even preserves the fluorescence property. Our results strongly argue that the PDB is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. This is highly significant, given that the structural database will only continue to grow, and signals the possibility of a whole host of novel data-driven CPD methods. Because such methods are likely to have orthogonal strengths relative to existing techniques, they could represent an important step towards removing remaining barriers to robust CPD.


2022 ◽  
Vol 79 (1) ◽  
Author(s):  
Tamás Hegedűs ◽  
Markus Geisler ◽  
Gergely László Lukács ◽  
Bianka Farkas

AbstractTransmembrane (TM) proteins are major drug targets, but their structure determination, a prerequisite for rational drug design, remains challenging. Recently, the DeepMind’s AlphaFold2 machine learning method greatly expanded the structural coverage of sequences with high accuracy. Since the employed algorithm did not take specific properties of TM proteins into account, the reliability of the generated TM structures should be assessed. Therefore, we quantitatively investigated the quality of structures at genome scales, at the level of ABC protein superfamily folds and for specific membrane proteins (e.g. dimer modeling and stability in molecular dynamics simulations). We tested template-free structure prediction with a challenging TM CASP14 target and several TM protein structures published after AlphaFold2 training. Our results suggest that AlphaFold2 performs well in the case of TM proteins and its neural network is not overfitted. We conclude that cautious applications of AlphaFold2 structural models will advance TM protein-associated studies at an unexpected level.


2014 ◽  
Vol 70 (a1) ◽  
pp. C491-C491
Author(s):  
Jürgen Haas ◽  
Alessandro Barbato ◽  
Tobias Schmidt ◽  
Steven Roth ◽  
Andrew Waterhouse ◽  
...  

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences compared to a small number of experimentally known structures, today, some form of structural information – either experimental or computational – is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress of the last decades, and template based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure prediction in comparison to experimental reference structures allows benchmarking the state-of-the-art in structure prediction and identifying areas which need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank PDB. CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements to allow for automated continuous structure comparison without human intervention.


2021 ◽  
Vol 14 (12) ◽  
pp. 1277
Author(s):  
Brennan Overhoff ◽  
Zackary Falls ◽  
William Mangione ◽  
Ram Samudrala

Computational approaches have accelerated novel therapeutic discovery in recent decades. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multitarget therapeutic discovery, repurposing, and design aims to improve their efficacy and safety by employing a holistic approach that computes interaction signatures between every drug/compound and a large library of non-redundant protein structures corresponding to the human proteome fold space. These signatures are compared and analyzed to determine if a given drug/compound is efficacious and safe for a given indication/disease. In this study, we used a deep learning-based autoencoder to first reduce the dimensionality of CANDO-computed drug–proteome interaction signatures. We then employed a reduced conditional variational autoencoder to generate novel drug-like compounds when given a target encoded “objective” signature. Using this approach, we designed compounds to recreate the interaction signatures for twenty approved and experimental drugs and showed that 16/20 designed compounds were predicted to be significantly (p-value ≤ 0.05) more behaviorally similar relative to all corresponding controls, and 20/20 were predicted to be more behaviorally similar relative to a random control. We further observed that redesigns of objectives developed via rational drug design performed significantly better than those derived from natural sources (p-value ≤ 0.05), suggesting that the model learned an abstraction of rational drug design. We also show that the designed compounds are structurally diverse and synthetically feasible when compared to their respective objective drugs despite consistently high predicted behavioral similarity. Finally, we generated new designs that enhanced thirteen drugs/compounds associated with non-small cell lung cancer and anti-aging properties using their predicted proteomic interaction signatures. his study represents a significant step forward in automating holistic therapeutic design with machine learning, enabling the rapid generation of novel, effective, and safe drug leads for any indication.


2019 ◽  
Vol 117 (2) ◽  
pp. 1059-1068 ◽  
Author(s):  
Jianfu Zhou ◽  
Alexandra E. Panaitiu ◽  
Gevorg Grigoryan

Current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found. Here, we propose a design framework—one based on identifying and applying patterns of sequence–structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions. We carry out extensive computational analyses and an experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.


2018 ◽  
Vol 20 (6) ◽  
pp. 2167-2184 ◽  
Author(s):  
Misagh Naderi ◽  
Jeffrey Mitchell Lemoine ◽  
Rajiv Gandhi Govindaraj ◽  
Omar Zade Kana ◽  
Wei Pan Feinstein ◽  
...  

Abstract Interactions between proteins and small molecules are critical for biological functions. These interactions often occur in small cavities within protein structures, known as ligand-binding pockets. Understanding the physicochemical qualities of binding pockets is essential to improve not only our basic knowledge of biological systems, but also drug development procedures. In order to quantify similarities among pockets in terms of their geometries and chemical properties, either bound ligands can be compared to one another or binding sites can be matched directly. Both perspectives routinely take advantage of computational methods including various techniques to represent and compare small molecules as well as local protein structures. In this review, we survey 12 tools widely used to match pockets. These methods are divided into five categories based on the algorithm implemented to construct binding-site alignments. In addition to the comprehensive analysis of their algorithms, test sets and the performance of each method are described. We also discuss general pharmacological applications of computational pocket matching in drug repurposing, polypharmacology and side effects. Reflecting on the importance of these techniques in drug discovery, in the end, we elaborate on the development of more accurate meta-predictors, the incorporation of protein flexibility and the integration of powerful artificial intelligence technologies such as deep learning.


2018 ◽  
pp. 73-89
Author(s):  
Paul M. Selzer ◽  
Richard J. Marhöfer ◽  
Oliver Koch

2021 ◽  
Author(s):  
Brennan Overhoff ◽  
Zackary Falls ◽  
William Mangione ◽  
Ram Samudrala

AbstractComputational approaches have accelerated novel therapeutic discovery in recent decades. The Computational Analysis of Novel Drug Opportunities (CANDO) platform for shotgun multi-target therapeutic discovery, repurposing, and design aims to improve their efficacy and safety by employing a holistic approach by computing interaction signatures between every drug/compound and a large library of non-redundant protein structures corresponding to the human proteome fold space. These signatures are compared and analyzed to determine if a given drug/compound is efficacious and safe for a given indication/disease. In this study, we used a deep learning based autoencoder to first reduce the dimensionality of CANDO computed drug-proteome interaction signatures. We then employed a reduced conditional variational autoencoder to generate novel drug-like compounds when given a target encoded “objective” signature. Using this model, we designed compounds to recreate the interaction signatures for twenty approved and experimental drugs and showed that 16/20 designed compounds are predicted to be significantly (p-value ≤ .05) more behaviorally similar relative to all corresponding controls, and 20/20 are predicted to be more behaviorally similar relative to a random control. We further observed that redesigns of objectives developed via rational drug design perform significantly better than those derived from natural sources (p-value ≤.05), suggesting that the model has learned an abstraction of rational drug design. We also show that designed compounds are structurally diverse and synthetically feasible when compared to their respective objective drugs despite consistently high predicted behavioral similarity. Finally, we generated new designs that enhance thirteen drugs/compounds associated with non-small cell lung cancer and anti-aging properties using their predicted proteomic interaction signatures. This work represents a significant step forward in automating holistic therapeutic design with machine learning, and subsequently offers a reduction in the time needed to generate novel, effective, and safe drug leads for any indication.


Sign in / Sign up

Export Citation Format

Share Document