DeepCLIP: predicting the effect of mutations on protein–RNA binding with deep learning

2020 ◽  
Author(s):  
Alexander Gulliver Bjørnholt Grønning ◽  
Thomas Koed Doktor ◽  
Simon Jonas Larsen ◽  
Ulrika Simone Spangsberg Petersen ◽  
Lise Lolle Holm ◽  
...  

Abstract Nucleotide variants can cause functional changes by altering protein–RNA binding in various ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modeling of protein–RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs and binding is typically also dependent on the genomic context, making this task particularly challenging. Here, we present DeepCLIP, the first method for context-aware modeling and prediction of protein binding to RNA using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modeling RNA–protein binding. Importantly, we demonstrate that DeepCLIP predictions correlate with the functional outcomes of nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP is freely available as a stand-alone application and as a webtool at http://deepclip.compbio.sdu.dk.
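Sequence-only binding models are commonly fed one-hot encoded nucleotides, and the effect of a variant can be summarized as the difference between the model's binding score for the variant sequence and for the reference. The sketch below illustrates that idea under stated assumptions: a hypothetical position weight matrix (`pwm`) stands in for a trained network, and `one_hot`, `toy_score`, and `variant_delta` are illustrative names, not DeepCLIP's code or API.

```python
import numpy as np

ALPHABET = "ACGU"  # column order of the one-hot encoding

def one_hot(seq: str) -> np.ndarray:
    """Encode an RNA sequence as a (len(seq), 4) one-hot matrix; T is read as U."""
    seq = seq.upper().replace("T", "U")
    mat = np.zeros((len(seq), len(ALPHABET)), dtype=np.float32)
    for i, base in enumerate(seq):
        mat[i, ALPHABET.index(base)] = 1.0
    return mat

def toy_score(x: np.ndarray, pwm: np.ndarray) -> float:
    """Hypothetical stand-in for a trained model: best PWM match over all windows.

    Assumes the sequence is at least as long as the motif."""
    k = pwm.shape[0]
    return max(float((x[i:i + k] * pwm).sum()) for i in range(x.shape[0] - k + 1))

def variant_delta(ref: str, pos: int, alt: str, pwm: np.ndarray) -> float:
    """score(variant) - score(reference); negative values suggest weakened binding."""
    var = ref[:pos] + alt + ref[pos + 1:]
    return toy_score(one_hot(var), pwm) - toy_score(one_hot(ref), pwm)

# Hypothetical 3-nt motif that rewards "UGU" (columns: A, C, G, U)
pwm = np.array([[0, 0, 0, 1],   # U
                [0, 0, 1, 0],   # G
                [0, 0, 0, 1]],  # U
               dtype=np.float32)

print(variant_delta("AAUGUAA", 3, "C", pwm))  # -1.0: the G>C change breaks the UGU match
```

A learned model would replace `toy_score`, but the reference-versus-variant comparison stays the same.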

2019 ◽  
Author(s):  
Alexander Gulliver Bjørnholt Grønning ◽  
Thomas Koed Doktor ◽  
Simon Jonas Larsen ◽  
Ulrika Simone Spangsberg Petersen ◽  
Lise Lolle Holm ◽  
...  

Abstract Nucleotide variants can cause functional changes by altering protein-RNA binding in various ways that are not easy to predict. This can affect processes such as splicing, nuclear shuttling, and stability of the transcript. Therefore, correct modeling of protein-RNA binding is critical when predicting the effects of sequence variations. Many RNA-binding proteins recognize a diverse set of motifs and binding is typically also dependent on the genomic context, making this task particularly challenging. Here, we present DeepCLIP, the first method for context-aware modeling and prediction of protein binding to nucleic acids using exclusively sequence data as input. We show that DeepCLIP outperforms existing methods for modeling RNA-protein binding. Importantly, we demonstrate that DeepCLIP is able to reliably predict the functional effects of contextually dependent nucleotide variants in independent wet lab experiments. Furthermore, we show how DeepCLIP binding profiles can be used in the design of therapeutically relevant antisense oligonucleotides, and to uncover possible position-dependent regulation in a tissue-specific manner. DeepCLIP can be freely used at http://deepclip.compbio.sdu.dk.

Highlights
- We have designed DeepCLIP as a simple neural network that requires only CLIP binding sites as input. The architecture and parameter settings of DeepCLIP make it an efficient and robust classifier, so high-performing models are easy to train and recreate.
- Using an extensive benchmark dataset, we demonstrate that DeepCLIP outperforms existing tools in classification. Furthermore, DeepCLIP provides direct information about the neural network's decision process through visualization of binding motifs and a binding profile that directly indicates the sequence elements contributing to the classification.
- To show that DeepCLIP models generalize to different datasets, we have demonstrated that predictions correlate with in vivo and in vitro experiments using quantitative binding assays and minigenes.
- Identifying the binding sites for regulatory RNA-binding proteins is fundamental for efficient design of (therapeutic) antisense oligonucleotides. Employing a reported disease-associated mutation, we demonstrate that DeepCLIP can be used to design therapeutic antisense oligonucleotides that block regions important for binding of regulatory proteins and correct aberrant splicing.
- Using DeepCLIP binding profiles, we uncovered a possible position-dependent mechanism behind the reported tissue specificity of a group of TDP-43-repressed pseudoexons.
- We have made DeepCLIP available as an online tool for training and applying protein-RNA binding deep learning models and for predicting the potential effects of clinically detected sequence variations (http://deepclip.compbio.sdu.dk/). We also provide DeepCLIP as a configurable stand-alone program (http://www.github.com/deepclip).


2020 ◽  
Author(s):  
Qingzhen Hou ◽  
Bas Stringer ◽  
Katharina Waury ◽  
Henriette Capel ◽  
Reza Haydarlou ◽  
...  

Abstract Motivation: Antibodies play an important role in clinical research and biotechnology, with their specificity determined by the interaction with the antigen's epitope region, a special type of protein-protein interaction (PPI) interface. The ubiquitous availability of sequence data allows us to predict epitopes from sequence in order to focus time-consuming wet-lab experiments on the most promising epitope regions. Here, we extend our previously developed sequence-based predictors for homodimer and heterodimer PPI interfaces to predict epitope residues that have the potential to bind an antibody. Results: We collected and curated a high-quality epitope dataset from the SAbDab database. Our generic PPI heterodimer predictor obtained an AUC-ROC of 0.666 when evaluated on the epitope test set. We then trained a random forest model specifically on the epitope dataset, reaching an AUC of 0.694. Further training on the combined heterodimer and epitope datasets improves our final predictor to an AUC of 0.703 on the epitope test set. This is better than BepiPred-2.0, the best state-of-the-art sequence-based epitope predictor. On one solved antibody-antigen structure of the COVID-19 virus spike receptor-binding domain, our predictor reaches an AUC of 0.778. We added the SeRenDIP-CE Conformational Epitope predictors to our webserver, which is simple to use and requires only a single antigen sequence as input; this will help make the method immediately applicable in a wide range of biomedical and biomolecular research. Availability: Webserver, source code and datasets are available at www.ibi.vu.nl/programs/serendipwww/
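The AUC-ROC values quoted above summarize how well per-residue scores rank epitope residues above non-epitope residues. A minimal sketch of that metric follows, using its equivalence to the Mann-Whitney statistic (the probability that a randomly chosen positive outscores a randomly chosen negative). The labels and scores are made-up illustration data, not the SAbDab test set, and this is not the SeRenDIP-CE evaluation code.

```python
from typing import Sequence

def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as P(score of a random positive > score of a random negative).

    Ties count as half a win (Mann-Whitney U formulation). O(n_pos * n_neg),
    which is fine for a sketch; rank-based versions scale better."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical per-residue labels (1 = epitope residue) and predictor scores
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.6, 0.1, 0.4]
print(round(auc_roc(labels, scores), 3))  # 0.833
```

An AUC of 0.5 corresponds to random ranking, which is why values around 0.7 indicate a useful but imperfect predictor.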


2018 ◽  
Author(s):  
Peter K. Koo ◽  
Praveen Anand ◽  
Steffan B. Paul ◽  
Sean R. Eddy

Abstract To infer the sequence and RNA structure specificities of RNA-binding proteins (RBPs) from experiments that enrich for bound sequences, we introduce a convolutional residual network which we call ResidualBind. ResidualBind significantly outperforms previous methods on experimental data from many RBP families. We interrogate ResidualBind to identify what features it has learned from high-affinity sequences with saliency analysis along with 1st-order and 2nd-order in silico mutagenesis. We show that in addition to sequence motifs, ResidualBind learns a model that includes the number of motifs, their spacing, and both positive and negative effects of RNA structure context. Strikingly, ResidualBind learns RNA structure context, including detailed base-pairing relationships, directly from sequence data, which we confirm on synthetic data. ResidualBind is a powerful, flexible, and interpretable model that can uncover cis-recognition preferences across a broad spectrum of RBPs.
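First-order in silico mutagenesis, as mentioned above, scores every single-nucleotide substitution of a sequence and records the change relative to the wild type. The sketch below shows the procedure with a hypothetical `toy_score` function (a count of the motif "UGU") standing in for a trained network such as ResidualBind; the function names are illustrative assumptions, not ResidualBind's API.

```python
from typing import Callable, Dict, List

BASES = "ACGU"

def in_silico_mutagenesis(seq: str, score: Callable[[str], float]) -> List[Dict[str, float]]:
    """First-order in silico mutagenesis.

    For every position, substitute each base and record
    score(mutant) - score(wild type). The wild-type base gets 0.0.
    Returns one dict (base -> effect) per position."""
    wt = score(seq)
    effects = []
    for i, ref in enumerate(seq):
        row = {}
        for b in BASES:
            mutant = seq[:i] + b + seq[i + 1:]
            row[b] = 0.0 if b == ref else score(mutant) - wt
        effects.append(row)
    return effects

def toy_score(seq: str) -> float:
    """Hypothetical model: number of (possibly overlapping) 'UGU' occurrences."""
    return float(sum(seq.startswith("UGU", i) for i in range(len(seq))))

effects = in_silico_mutagenesis("AUGUA", toy_score)
# Most damaging substitution per position: the motif positions stand out
print([min(row.values()) for row in effects])  # [0.0, -1.0, -1.0, -1.0, 0.0]
```

A 2nd-order analysis extends the same loop to pairs of positions, which is one way pairwise dependencies such as base pairing can become visible.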


2009 ◽  
Vol 20 (8) ◽  
pp. 2265-2275 ◽  
Author(s):  
Zhifa Shen ◽  
Nicolas Paquin ◽  
Amélie Forget ◽  
Pascal Chartrand

The transport and localization of mRNAs result in the asymmetric synthesis of specific proteins. In yeast, the nucleocytoplasmic shuttling protein She2p binds the ASH1 mRNA and targets it for localization at the bud tip by recruiting the She3p–Myo4p complex. Although the cytoplasmic role of She2p in mRNA localization is well characterized, its nuclear function is still unclear. Here, we show that She2p contains a nonclassical nuclear localization signal (NLS) that is essential for its nuclear import via the importin α Srp1p. Exclusion of She2p from the nucleus by mutagenesis of its NLS leads to defective ASH1 mRNA localization and Ash1p sorting. Interestingly, these phenotypes mimic knockouts of LOC1 and PUF6, which encode nuclear RNA-binding proteins that bind the ASH1 mRNA and control its translation. We find that She2p interacts with both Loc1p and Puf6p and that excluding She2p from the nucleus decreases this interaction. Absence of nuclear She2p disrupts the binding of Loc1p and Puf6p to the ASH1 mRNA, suggesting that nuclear import of She2p is necessary to recruit both factors to the ASH1 transcript. This study reveals that a direct coupling between localization and translation regulation factors in the nucleus is required for proper cytoplasmic localization of mRNAs.


BMC Cancer ◽  
2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Fengxia Chen ◽  
Qingqing Wang ◽  
Yunfeng Zhou

Abstract Background: RNA-binding proteins (RBPs) play crucial and multifaceted roles in post-transcriptional regulation. While RBP dysregulation is involved in tumorigenesis and progression, little is known about the role of RBPs in bladder cancer (BLCA) prognosis. This study aimed to establish a prognostic model based on prognosis-related RBPs to predict the survival of BLCA patients. Methods: We downloaded BLCA RNA sequencing data from The Cancer Genome Atlas (TCGA) database and identified RBPs differentially expressed between tumour and normal tissues. Then, functional enrichment analysis of these differentially expressed RBPs was conducted. Independent prognosis-associated RBPs were identified by univariable and multivariable Cox regression analyses to construct a risk score model. Subsequently, Kaplan–Meier and receiver operating characteristic curves were plotted to assess the performance of this prognostic model. Finally, a nomogram was established, and its prognostic value and the expression of the hub RBPs were validated. Results: A total of 385 differentially expressed RBPs were identified, including 218 upregulated and 167 downregulated RBPs. Eight independent prognosis-associated RBPs (EFTUD2, GEMIN7, OAS1, APOBEC3H, TRIM71, DARS2, YTHDC1, and RBMS3) were then used to construct a prognostic prediction model. An in-depth analysis showed lower overall survival (OS) in the high-risk subgroup than in the low-risk subgroup defined by the prognostic model. The areas under the time-dependent receiver operating characteristic (ROC) curves were 0.795 and 0.669 for the TCGA training and test datasets, respectively, showing moderate predictive discrimination of the prognostic model. A nomogram was established, which showed favourable predictive value for the prognosis of BLCA.
Conclusions: We developed and validated a prognostic model for BLCA that might facilitate the development of new biomarkers for the prognostic assessment of BLCA patients.
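The Kaplan–Meier curves used to compare the high- and low-risk subgroups are product-limit estimates of survival: at each time with an observed death, survival is multiplied by the fraction of at-risk patients who survive that time. A minimal sketch of the estimator follows; the follow-up times and event flags are invented illustration data, and the log-rank test typically used to compare two such curves is not shown.

```python
from typing import List, Tuple

def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier (product-limit) survival estimate.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if censored
    Returns (time, S(t)) at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Deaths and censored patients both leave the risk set after time t
        n_at_risk -= removed
    return curve

# Hypothetical cohort: five patients, two of them censored
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 3)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```

In a risk-group analysis, the estimator is run separately on the high-risk and low-risk patients and the two curves are plotted together.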


Cells ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 2910
Author(s):  
Ewa A. Grzybowska ◽  
Maciej Wakula

Protein binding to the non-coding regions of mRNAs is relatively well characterized and its functionality has been described in many examples. New results obtained by high-throughput methods indicate that binding to the coding sequence (CDS) by RNA-binding proteins is also quite common, but its functions are more obscure. As described in this review, CDS binding has a role in the regulation of mRNA stability, but it also has a more intriguing role in the regulation of translational efficiency. Global approaches that suggest the significance of CDS binding, along with specific examples of CDS-binding RBPs and their modes of action, are outlined here, pointing to the existence of a relatively less-known regulatory network controlling mRNA stability and translation on yet another level.


2019 ◽  
Author(s):  
Ammar S. Naqvi ◽  
Mukta Asnani ◽  
Kathryn L. Black ◽  
Katharina E. Hayer ◽  
Deanne Taylor ◽  
...  

Abstract Circular RNAs (circRNAs) represent a novel class of non-coding RNAs that are emerging as potentially important regulators of gene expression. circRNAs are typically generated from host gene transcripts through a non-canonical back-splicing mechanism, whose regulation is still not well understood. To explore the regulation of circRNAs in cancer, we generated sequence data from RNase R-resistant transcripts in human p493-6 B-lymphoid cells and identified thousands of novel as well as previously identified circRNAs. Approximately 40% of expressed genes generated a circRNA, with half of them generating multiple isoforms, suggesting the involvement of alternative back-splicing and regulatory RNA-binding proteins (RBPs). We observed that genes generating circRNAs with back-spliced exonic junctions were enriched for RBP recognition motifs, including those of multiple splicing factors, most notably SRSF3, a splicing factor known to promote exon inclusion. To test the role of SRSF3 in circRNA production, we performed traditional RNA-seq in p493-6 B-lymphoid cells with and without SRSF3 knockdown, and identified 926 mRNA transcripts whose canonical splicing was affected by SRSF3. We found that a subset (205) of these SRSF3 targets served as host transcripts for circRNAs, suggesting that SRSF3 may regulate exon circularization. Since this splicing factor is deregulated in hematologic malignancies, we hypothesize that SRSF3-dependent circRNAs, similar to their mRNA counterparts, might contribute to the pathogenesis of lymphomas and leukemias.
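What distinguishes a back-spliced junction from a canonical one is the order of the splice donor and acceptor along the transcribed strand: canonical splicing joins a donor to a downstream acceptor, whereas back-splicing joins a donor to an upstream acceptor and closes the RNA into a circle. The sketch below captures only that geometric test; the `Junction` type and its field names are hypothetical, and real circRNA detection pipelines add alignment, filtering, and read-support thresholds that are not shown here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Junction:
    """Hypothetical split-read junction on one chromosome and strand.

    donor    -- genomic position of the 5' splice site side of the junction
    acceptor -- genomic position of the 3' splice site side of the junction
    """
    chrom: str
    strand: str  # "+" or "-"
    donor: int
    acceptor: int

def is_back_splice(j: Junction) -> bool:
    """True if the acceptor lies upstream of the donor in transcription order.

    On the + strand transcription runs toward higher coordinates, so a
    back-splice has acceptor < donor; on the - strand the comparison flips."""
    if j.strand == "+":
        return j.acceptor < j.donor
    if j.strand == "-":
        return j.acceptor > j.donor
    raise ValueError(f"unknown strand: {j.strand!r}")

print(is_back_splice(Junction("chr1", "+", donor=1000, acceptor=500)))  # True
print(is_back_splice(Junction("chr1", "+", donor=500, acceptor=1000)))  # False
```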


2011 ◽  
Author(s):  
Marco Fondi

Bioinformatics, the interdisciplinary field that blends computer science and biostatistics with the biological and biomedical sciences, is expected to gain a central role in the near future. Indeed, it has already affected several fields of biology, providing crucial hints for the understanding of biological systems and allowing more accurate design of wet-lab experiments. In this work, the analysis of sequence data has been used in different fields, such as evolution (e.g. the assembly and evolution of metabolism), infection control (e.g. the horizontal flow of antibiotic resistance), and ecology (e.g. bacterial bioremediation).


2021 ◽  
Vol 20 ◽  
pp. 153303382110049
Author(s):  
Ming Wang ◽  
Feng Jiang ◽  
Ke Wei ◽  
Jimei Wang ◽  
Guoping Zhou ◽  
...  

Background: Dysregulation of RNA-binding proteins (RBPs) has been identified in multiple malignant tumors and correlates with tumor occurrence and progression. However, the function of RBPs in hepatocellular carcinoma (HCC) is not well understood. Methods: HCC RNA sequencing data were extracted from The Cancer Genome Atlas (TCGA) database, and RBPs differentially expressed between normal and cancerous tissue were identified. The study systematically explored the expression and predictive value of these RBPs with a series of bioinformatic analyses. Results: A total of 330 differentially expressed RBPs were identified, including 208 up-regulated and 122 down-regulated RBPs. Four RBPs (MRPL54, EZH2, PPARGC1A, EIF2AK4) were identified as prognosis-related hub genes and used to construct a prediction model. Further analysis showed that the model-based high-risk subgroup had poorer overall survival (OS) than the low-risk subgroup. The area under the time-dependent receiver operating characteristic (ROC) curve of the prognostic model was 0.814 in the TCGA training group and 0.729 in the validation group, indicating a strong prognostic model. We also created a predictive nomogram and a web-based calculator (https://dxyjiang.shinyapps.io/RBPpredict/) based on the four RBPs; internal validation in the TCGA cohort showed a beneficial predictive ability for HCC. Conclusions: Our results provide new insights into HCC pathogenesis. The 4-RBP gene signature showed reliable prognostic ability for HCC, with possible applications in therapeutic decision making and personalized therapy.
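In risk-score models built from multivariable Cox regression, a patient's risk score is usually the linear predictor: the sum of each gene's coefficient multiplied by its expression value. Patients are then split into high- and low-risk subgroups. The sketch below assumes a median cutoff (the abstract does not state which cutoff was used); the gene names RBP1 to RBP4 and all coefficients and expression values are hypothetical placeholders, not the published 4-RBP model.

```python
from statistics import median
from typing import Dict, List

def risk_scores(expr: List[Dict[str, float]], coef: Dict[str, float]) -> List[float]:
    """Cox linear predictor per patient: sum of coefficient x expression."""
    return [sum(coef[g] * sample[g] for g in coef) for sample in expr]

def split_by_median(scores: List[float]) -> List[str]:
    """Label scores above the cohort median 'high' risk, all others 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

# Hypothetical Cox coefficients (positive = higher expression, higher hazard)
coef = {"RBP1": 0.5, "RBP2": 0.3, "RBP3": -0.4, "RBP4": 0.2}

# Hypothetical normalized expression for four patients
expr = [
    {"RBP1": 2, "RBP2": 1, "RBP3": 0, "RBP4": 1},
    {"RBP1": 1, "RBP2": 0, "RBP3": 2, "RBP4": 0},
    {"RBP1": 0, "RBP2": 2, "RBP3": 1, "RBP4": 1},
    {"RBP1": 3, "RBP2": 1, "RBP3": 1, "RBP4": 0},
]

scores = risk_scores(expr, coef)
print(split_by_median(scores))  # ['high', 'low', 'low', 'high']
```

The resulting groups are what Kaplan–Meier curves and time-dependent ROC analyses are then computed on.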

