Modeling Enhancer-Promoter Interactions with Attention-Based Neural Networks

2017
Author(s): Weiguang Mao, Dennis Kostka, Maria Chikina

Abstract. Background: Gene regulatory sequences play critical roles in ensuring tightly controlled RNA expression patterns that are essential in a large variety of biological processes. Specifically, enhancer sequences drive expression of their target genes, and the availability of genome-wide maps of enhancer-promoter interactions has opened up the possibility of using machine learning approaches to extract and interpret features that define these interactions in different biological contexts. Methods: Inspired by machine translation models, we develop an attention-based neural network model, EPIANN, to predict enhancer-promoter interactions based on DNA sequences. Code and data are available at https://github.com/wgmao/EPIANN. Results: Our approach accurately predicts enhancer-promoter interactions across six cell lines. In addition, our method generates pairwise attention scores at the sequence level, which specify how short regions in the enhancer and promoter pair up to drive the interaction prediction. This allows us to identify over-represented transcription factor (TF) binding sites and TF-pair interactions in the context of enhancer function.
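The idea of pairwise attention scores between short enhancer and promoter regions can be illustrated with a toy sketch. This is not the EPIANN implementation: the function names (`one_hot`, `window_embed`, `pairwise_attention`), the fixed one-hot window embedding, the dot-product score, and the softmax over all window pairs are assumptions made for illustration only, whereas the published model learns its representations from data.

```python
import math

# Hypothetical helper: one-hot encode a DNA string over the alphabet A, C, G, T.
def one_hot(seq):
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    return [[1.0 if index[base] == i else 0.0 for i in range(4)] for base in seq]

# Hypothetical stand-in for a learned embedding: each window of `width` bases
# is represented by the concatenation of its one-hot vectors.
def window_embed(seq, width):
    enc = one_hot(seq)
    return [sum(enc[i:i + width], []) for i in range(len(seq) - width + 1)]

def pairwise_attention(enhancer, promoter, width=3):
    """Return a matrix w where w[i][j] is the attention weight between
    enhancer window i and promoter window j. Weights sum to 1 over all pairs."""
    e = window_embed(enhancer, width)
    p = window_embed(promoter, width)
    # Dot product of two concatenated one-hot windows counts matching positions.
    scores = [[sum(a * b for a, b in zip(ev, pv)) for pv in p] for ev in e]
    # Numerically stable softmax over every (enhancer, promoter) window pair.
    top = max(max(row) for row in scores)
    total = sum(math.exp(s - top) for row in scores for s in row)
    return [[math.exp(s - top) / total for s in row] for row in scores]

if __name__ == "__main__":
    weights = pairwise_attention("ACGTAC", "TTACGT", width=3)
    for row in weights:
        print(" ".join(f"{w:.3f}" for w in row))
```

In this sketch, window pairs that share more bases receive larger weights, which mimics how a pairwise attention matrix can point to the short regions that drive a prediction. A trained model would replace the fixed similarity with learned parameters.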

Genome, 2020, pp. 1-23
Author(s): Ian C. Tobias, Luis E. Abatti, Sakthi D. Moorthy, Shanelle Mullany, Tiegh Taylor, ...

Enhancers are cis-regulatory sequences located distally to target genes. These sequences consolidate developmental and environmental cues to coordinate gene expression in a tissue-specific manner. Enhancer function and tissue specificity depend on the expressed set of transcription factors, which recognize binding sites and recruit cofactors that regulate local chromatin organization and gene transcription. Unlike other genomic elements, enhancers are challenging to identify because they function independently of orientation, are often distant from their promoters, have poorly defined boundaries, and display no reading frame. In addition, there are no defined genetic or epigenetic features that are unambiguously associated with enhancer activity. Over recent years there have been developments in both empirical assays and computational methods for enhancer prediction. We review genome-wide tools, CRISPR advancements, and high-throughput screening approaches that have improved our ability to both observe and manipulate enhancers in vitro at the level of primary genetic sequences, chromatin states, and spatial interactions. We also highlight contemporary animal models and their importance to enhancer validation. Together, these experimental systems and techniques complement one another and broaden our understanding of enhancer function in development, evolution, and disease.


2019
Author(s): Joanna Mitchelmore, Nastasiya Grinberg, Chris Wallace, Mikhail Spivakov

Abstract. Identifying DNA cis-regulatory modules (CRMs) that control the expression of specific genes is crucial for deciphering the logic of transcriptional control. Natural genetic variation can point to the possible gene regulatory function of specific sequences through their allelic associations with gene expression. However, comprehensive identification of causal regulatory sequences in brute-force association testing without incorporating prior knowledge is challenging due to limited statistical power and effects of linkage disequilibrium. Sequence variants affecting transcription factor (TF) binding at CRMs have a strong potential to influence gene regulatory function, which provides a motivation for prioritising such variants in association testing. Here, we generate an atlas of CRMs showing predicted allelic variation in TF binding affinity in human lymphoblastoid cell lines (LCLs) and test their association with the expression of their putative target genes inferred from Promoter Capture Hi-C and immediate linear proximity. We reveal over 1300 CRM TF-binding variants associated with target gene expression, the majority of them undetected with standard association testing. A large proportion of CRMs showing associations with the expression of genes they contact in 3D localise to the promoter regions of other genes, supporting the notion of ‘epromoters’: dual-action CRMs with promoter and distal enhancer activity.


2019, Vol 20 (21), pp. 5419
Author(s): Gao-Feng Zhou, Li-Ping Zhang, Bi-Xian Li, Ou Sheng, Qing-Jiang Wei, ...

Long non-coding RNAs (lncRNAs) play important roles in plant growth and stress responses. As a dominant abiotic stress factor in soil, boron (B) deficiency stress has impacted the growth and development of citrus in the red soil region of southern China. In the present work, we performed a genome-wide identification and characterization of lncRNAs in response to B deficiency stress in the leaves of trifoliate orange (Poncirus trifoliata), an important rootstock of citrus. A total of 2101 unique lncRNAs and 24,534 mRNAs were predicted. Quantitative real-time polymerase chain reaction (qRT-PCR) experiments were performed for a total of 16 random mRNAs and lncRNAs to validate their existence and expression patterns. Expression profiling of the leaves of trifoliate orange under B deficiency stress identified 729 up-regulated and 721 down-regulated lncRNAs, and 8419 up-regulated and 8395 down-regulated mRNAs. Further analysis showed that a total of 84 differentially expressed lncRNAs (DELs) were up-regulated and 31 were down-regulated, where the number of up-regulated DELs was 2.71-fold that of down-regulated. A similar trend was also observed in differentially expressed mRNAs (DEMs, 4.21-fold). Functional annotation of these DEMs was performed using Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses, and the results demonstrated an enrichment of the categories of the biosynthesis of secondary metabolites (including phenylpropanoid biosynthesis/lignin biosynthesis), plant hormone signal transduction and the calcium signaling pathway. LncRNA target gene enrichment identified several target genes that were involved in plant hormones, and the expression of lncRNAs and their target genes was significantly influenced. Therefore, our results suggest that lncRNAs can regulate the metabolism and signal transduction of plant hormones, which play an important role in the responses of citrus plants to B deficiency stress. 
Co-expression network analysis indicated that 468 significantly differentially expressed genes may be potential targets of 90 lncRNAs, and a total of 838 matched lncRNA-mRNA pairs were identified. In summary, our data provide a rich resource of candidate lncRNAs and mRNAs, as well as their related pathways, thereby improving our understanding of the role of lncRNAs in the response to B deficiency stress and in the symptom formation caused by B deficiency in the leaves of trifoliate orange.


2013, Vol 368 (1632), pp. 20130022
Author(s): Noboru Jo Sakabe, Marcelo A. Nobrega

The complex expression patterns observed for many genes are often regulated by distal transcription enhancers. Changes in the nucleotide sequences of enhancers may therefore lead to changes in gene expression, representing a central mechanism by which organisms evolve. With the development of the experimental technique of chromatin immunoprecipitation (ChIP), in which discrete regions of the genome bound by specific proteins can be identified, it is now possible to identify transcription factor binding events (putative cis-regulatory elements) in entire genomes. Comparing protein–DNA binding maps allows us, for the first time, to attempt to identify regulatory differences and infer global patterns of change in gene expression across species. Here, we review studies that used genome-wide ChIP to study the evolution of enhancers. The trend is one of high divergence of cis-regulatory elements between species, possibly compensated by extensive creation and loss of regulatory elements and rewiring of their target genes. We speculate on the meaning of the differences observed and argue that, although ChIP experiments identify the biochemical event of protein–DNA interaction, they cannot determine whether the event results in a biological function; more studies are therefore required to establish the effect of divergence of binding events on species-specific gene expression.


2014, Vol 35 (5), pp. 770-777
Author(s): Sharon Schlesinger, Stephen P. Goff

Retroviruses have evolved complex transcriptional enhancers and promoters that allow their replication in a wide range of tissue and cell types. Embryonic stem (ES) cells, however, characteristically suppress transcription of proviruses formed after infection by exogenous retroviruses and also of most members of the vast array of endogenous retroviruses in the genome. These cells have unusual profiles of transcribed genes and are poised to make rapid changes in those profiles upon induction of differentiation. Many of the transcription factors in ES cells control both host and retroviral genes coordinately, such that retroviral expression patterns can serve as markers of ES cell pluripotency. This overlap is not coincidental; retrovirus-derived regulatory sequences are often used to control cellular genes important for pluripotency. These sequences specify the temporal control and perhaps “noisy” control of cellular genes that direct proper cell gene expression in primitive cells and their differentiating progeny. The evidence suggests that the viral elements have been domesticated for host needs, reflecting the wide-ranging exploitation of any and all available DNA sequences in assembling regulatory networks.


2018
Author(s): R Spektor, ND Tippens, CA Mimoso, PD Soloway

Abstract. Chromatin features are characterized by genome-wide assays for nucleosome location, protein binding sites, 3-dimensional interactions, and modifications to histones and DNA. For example, Assay for Transposase Accessible Chromatin sequencing (ATAC-seq) identifies nucleosome-depleted (open) chromatin, which harbors potentially active gene regulatory sequences; and bisulfite sequencing (BS-seq) quantifies DNA methylation. When two distinct chromatin features like these are assayed separately in populations of cells, it is impossible to determine, with certainty, where the features are coincident in the genome by simply overlaying datasets. Here we describe methyl-ATAC-seq (mATAC-seq), which implements modifications to ATAC-seq, including subjecting the output to BS-seq. Merging these assays into a single protocol identifies the locations of open chromatin, and reveals, unambiguously, the DNA methylation state of the underlying DNA. Such combinatorial methods eliminate the need to perform assays independently and infer where features are coincident.


2020
Author(s): Kathleen Greenham, Ryan C. Sartor, Stevan Zorich, Ping Lou, Todd C. Mockler, ...

Abstract. An important challenge of crop improvement strategies is assigning function to paralogs in polyploid crops. Gene expression is one method for determining the activity of paralogs; however, the majority of transcript abundance data represents a static point that does not consider the spatial and temporal dynamics of the transcriptome. Studies in Arabidopsis have estimated up to 90% of the transcriptome to be under diel or circadian control depending on the condition. As a result, time-of-day effects on the transcriptome have major implications for how we characterize gene activity. In this study, we aimed to resolve the circadian transcriptome in the polyploid crop Brassica rapa and explore the fate of multicopy orthologs of Arabidopsis circadian-regulated genes. We performed a high-resolution time course study with 2 h sampling density to capture the genes under circadian control. Strikingly, more than two-thirds of expressed genes exhibited rhythmicity indicative of circadian regulation. To compare the expression patterns of paralogous genes, we developed a program in R called DiPALM (Differential Pattern Analysis by Linear Models) that analyzes time course data to identify transcripts with significant pattern differences. Using DiPALM, we identified genome-wide divergence of expression patterns among retained paralogs. Cross-comparison with a previously generated diel drought experiment in B. rapa revealed evidence for differential drought response for these diverging paralog pairs. Using gene regulatory network models, we compared transcription factor targets between B. rapa and Arabidopsis circadian networks to reveal additional evidence for divergence in expression between B. rapa paralogs that may be driven in part by variation in conserved non-coding sequences. These findings provide new insight into the rapid expansion and divergence of the transcriptional network in a polyploid crop and offer a new method for assessing paralog activity at the transcript level.
Significance. The circadian regulation of the transcriptome leads to time-of-day changes in gene expression that coordinate environmental conditions with physiological responses. Brassica rapa, a morphologically diverse crop species, has undergone whole genome triplication since diverging from Arabidopsis, resulting in an expansion of gene copy number. To examine how this expansion has influenced the circadian transcriptome, we developed a new method for comparing gene expression patterns. This method facilitated the discovery of genome-wide expansion of expression patterns for genes present in multiple copies and divergence in temporal abiotic stress response. We find support for conserved sequences outside the gene body contributing to these expression pattern differences and ultimately generating new connections in the gene regulatory network.


2020
Author(s): Maud Fagny, Marieke Lydia Kuijjer, Maike Stam, Johann Joets, Olivier Turc, ...

Abstract. Enhancers are important regulators of gene expression during numerous crucial processes, including tissue differentiation across development. In plants, their recent molecular characterization revealed their capacity to activate the expression of several target genes through the binding of transcription factors. Nevertheless, identifying these target genes at a genome-wide level remains a challenge, in particular in species with large genomes, where enhancers and target genes can be hundreds of kilobases apart. Therefore, the contribution of enhancers to regulatory networks is still poorly understood in plants. In this study, we investigate the enhancer-driven regulatory network of two maize tissues at different stages: leaves at seedling stage and husks (bracts) at flowering. Using a systems biology approach, we integrate genomic, epigenomic and transcriptomic data to model the regulatory relationship between transcription factors and their potential target genes. We identify regulatory modules specific to husk and V2-IST, and show that they are involved in distinct functions related to the biology of each tissue. We find evidence for enhancers exhibiting binding sites for two distinct transcription factor families (DOF and AP2/ERF) that drive the tissue-specificity of gene expression in seedling immature leaf and husk. Analysis of the corresponding enhancer sequences reveals that two different transposable element families (TIR transposon Mutator and MITE Pif/Harbinger) have shaped the regulatory network in each tissue, and that MITEs have provided new transcription factor binding sites that are involved in husk tissue-specificity.
Significance. Enhancers play a major role in regulating tissue-specific gene expression in higher eukaryotes, including angiosperms. While molecular characterization of enhancers has improved over the past years, identifying their target genes at the genome-wide scale remains challenging. Here, we integrate genomic, epigenomic and transcriptomic data to decipher the tissue-specific gene regulatory network controlled by enhancers at two different stages of maize leaf development. Using a systems biology approach, we identify transcription factor families regulating tissue-specific gene expression in husk and seedling leaves, and characterize the enhancers likely to be involved. We show that a large proportion of maize enhancers is derived from transposable elements, which can provide novel transcription factor binding sites crucial to the regulation of tissue-specific biological functions.


BMC Genomics, 2022, Vol 23 (1)
Author(s): Haitao Xing, Yuan Li, Yun Ren, Ying Zhao, Xiaoli Wu, ...

Abstract. Background: MicroRNAs (miRNAs) are endogenous, non-coding small functional RNAs that govern the post-transcriptional regulatory system of gene expression and control the growth and development of plants. Ginger is an herb that is well-known for its flavor and medicinal properties. The genes involved in ginger rhizome development and secondary metabolism have been discovered, but the genome-wide identification of miRNAs and their overall expression profiles and targets during ginger rhizome development are largely unknown. In this study, we used BGISEQ-500 technology to perform genome-wide identification of miRNAs from the leaf, stem, root, flower, and rhizome of ginger during three development stages. Results: In total, 104 novel miRNAs and 160 conserved miRNAs in 28 miRNA families were identified. A total of 181 putative target genes for novel miRNAs and 2772 putative target genes for conserved miRNAs were predicted. Transcription factors were the most abundant target genes of miRNAs, and 17, 9, 8, 4, 13, 8, 3 conserved miRNAs and 5, 7, 4, 5, 5, 15, 9 novel miRNAs showed significant tissue-specific expression patterns in leaf, stem, root, flower, and rhizome. Additionally, 53 miRNAs were regarded as rhizome development-associated miRNAs, which mostly participate in metabolism, signal transduction, transport, and catabolism, suggesting that these miRNAs and their target genes play important roles in the rhizome development of ginger. Twelve candidate miRNA target genes were selected and validated using qRT-PCR. The qRT-PCR analysis showed that the expression of the 12 candidate target genes followed a pattern opposite to that of their miRNAs. The rhizome development system of ginger was observed to be governed by miR156, miR319, miR171a_2, miR164, and miR529, which modulated the expression of the SPL, MYB, GRF, SCL, and NAC genes, respectively.
Conclusion: This is a deep genome-wide investigation of miRNAs and an identification of the miRNAs involved in rhizome development in ginger. We identified 52 rhizome-related miRNAs and 392 target genes, which provides an important basis for understanding the molecular mechanisms of the miRNA target genes that mediate rhizome development in ginger.


2020
Author(s): Ryan Clarke, Alexander R. Terry, Hannah Pennington, Matthew S. MacDougall, Maureen Regan, ...

Summary. Genetic manipulation of mammalian cells is instrumental to modern biomedical research but is currently limited by poor capabilities for sequentially controlling multiple manipulations in cells. Currently, either highly multiplexed manipulations can be delivered to populations of cells all at one time, or gene regulatory sequences can be engineered to conditionally activate a few manipulations within individual cells. Here, we provide proof-of-principle for a new system enabling multiple genetic manipulations to be executed as a preprogrammed cascade of events. The system leverages the programmability of the S. pyogenes Cas9 RNA-guided nuclease and is based on flexible arrangements of individual modules of activity. The basic module consists of an inactive single guide RNA (sgRNA)-like component that is converted to an active state through the effects of another sgRNA. Modules can be arranged to bring about an algorithmic program of genetic manipulations without the need for engineering cell type-specific promoters or gene regulatory sequences. With the expanding diversity of available tools that utilize spCas9 to edit, repress or activate genes, this sgRNA-based system provides multiple levels for interfacing with host cell biology. In addition, the ability of the system to progress through multiple modules from episomal plasmid DNA makes it suitable for applications sensitive to the presence of heterologous genomic DNA sequences and broadly applicable to biomedical research and mammalian cell engineering.

