Understanding how cis-regulatory function is encoded in DNA sequence using massively parallel reporter assays and designed sequences

Genomics ◽  
2015 ◽  
Vol 106 (3) ◽  
pp. 165-170 ◽  
Author(s):  
Michael A. White

2019 ◽  
Vol 40 (9) ◽  
pp. 1299-1313 ◽  
Author(s):  
Anat Kreimer ◽  
Zhongxia Yan ◽  
Nadav Ahituv ◽  
Nir Yosef

2017 ◽  
Author(s):  
Cynthia A. Kalita ◽  
Gregory A. Moyerbrailean ◽  
Christopher Brown ◽  
Xiaoquan Wen ◽  
Francesca Luca ◽  
...  

ABSTRACT

Motivation: The majority of the human genome is composed of non-coding regions containing regulatory elements such as enhancers, which are crucial for controlling gene expression. Many variants associated with complex traits are in these regions and may disrupt gene regulatory sequences. Consequently, it is important not only to identify true enhancers but also to test whether a variant within an enhancer affects gene regulation. Recently, allele-specific analysis in high-throughput reporter assays, such as massively parallel reporter assays (MPRA), has been used to functionally validate non-coding variants. However, we are still missing high-quality and robust data analysis tools for these datasets.

Results: We have further developed our method for allele-specific analysis, QuASAR (quantitative allele-specific analysis of reads), to analyze allele-specific signals in barcoded read count data from MPRA. Using this approach, we can take into account the uncertainty in the original plasmid proportions, over-dispersion, and sequencing errors. The provided allelic skew estimate and its standard error also simplify meta-analysis of replicate experiments. Additionally, we show that a beta-binomial distribution better models the variability present in the allelic imbalance of these synthetic reporters and results in a test that is statistically well calibrated under the null. Applying this approach to the MPRA data by Tewhey et al. (2016), we found 602 SNPs with significant (FDR 10%) allele-specific regulatory function in LCLs. We also show that we can combine MPRA with QuASAR estimates to validate existing experimental and computational annotations of regulatory variants. Our study shows that with appropriate data analysis tools, we can improve the power to detect allelic effects in high-throughput reporter assays.

Availability: http://github.com/piquelab/QuASAR/tree/master/[email protected];[email protected]
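The idea of testing allelic imbalance with a beta-binomial model can be illustrated with a minimal sketch. This is not the QuASAR implementation: the function names, the mean/concentration parameterization, the grid search for the alternative, and the chi-square approximation for the p-value are all assumptions made for illustration. The null hypothesis here is that the reference-allele proportion in RNA equals the proportion observed in the plasmid DNA library.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, computed via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(k, n, mean, concentration):
    """Log-probability of k reference reads out of n under a beta-binomial.

    mean: expected reference proportion, strictly between 0 and 1.
    concentration: hypothetical over-dispersion parameter (> 0);
    larger values bring the model closer to a plain binomial.
    """
    a = mean * concentration
    b = (1.0 - mean) * concentration
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def allelic_lrt(ref_rna, alt_rna, dna_ref_prop, concentration):
    """Likelihood-ratio test of RNA allelic proportion against the DNA proportion.

    Returns (statistic, approximate p-value). The alternative is fitted by a
    crude grid search over the mean; a real tool would use a proper optimizer
    and would also estimate the dispersion from the data.
    """
    n = ref_rna + alt_rna
    null_ll = betabinom_logpmf(ref_rna, n, dna_ref_prop, concentration)
    grid_ll = max(
        betabinom_logpmf(ref_rna, n, i / 1000.0, concentration)
        for i in range(1, 1000)
    )
    best_ll = max(null_ll, grid_ll)  # guarantees a non-negative statistic
    stat = 2.0 * (best_ll - null_ll)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Made-up counts: 80 reference vs 20 alternate RNA reads, balanced plasmid input.
    stat, p = allelic_lrt(80, 20, dna_ref_prop=0.5, concentration=50.0)
    print(f"LRT statistic = {stat:.2f}, approximate p = {p:.3g}")
```

Passing the plasmid proportion as the null, rather than fixing it at 0.5, reflects the abstract's point that the original plasmid proportions are themselves uncertain and need not be balanced.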


PLoS ONE ◽  
2019 ◽  
Vol 14 (6) ◽  
pp. e0218073 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

2020 ◽  
Vol 44 (7) ◽  
pp. 785-794
Author(s):  
Dandi Qiao ◽  
Corwin M. Zigler ◽  
Michael H. Cho ◽  
Edwin K. Silverman ◽  
Xiaobo Zhou ◽  
...  

2019 ◽  
Vol 15 ◽  
pp. P628-P628
Author(s):  
Karen Nuytemans ◽  
Derek J. van Booven ◽  
Natalia K. Hofmann ◽  
Farid Rajabli ◽  
Anthony J. Griswold ◽  
...  

2018 ◽  
Author(s):  
Rajiv Movva ◽  
Peyton Greenside ◽  
Georgi K. Marinov ◽  
Surag Nair ◽  
Avanti Shrikumar ◽  
...  

Abstract: The relationship between noncoding DNA sequence and gene expression is not well understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ~500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.
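The Spearman ρ quoted above is a rank correlation between predicted and measured activity. The sketch below shows how such a value could be computed; it is a generic standard-library implementation, not code from MPRA-DragoNN, and the function names and example numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


if __name__ == "__main__":
    # Made-up predicted and measured activities for five constructs.
    predicted = [0.1, 0.4, 0.35, 0.8, 0.05]
    measured = [0.2, 0.3, 0.6, 0.9, 0.1]
    print(f"Spearman rho = {spearman(predicted, measured):.3f}")
```

Because only ranks enter the calculation, the metric rewards a model for ordering constructs correctly even when its predicted values are on a different scale from the assay readout.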


2019 ◽  
Author(s):  
Daniel Esposito ◽  
Jochen Weile ◽  
Jay Shendure ◽  
Lea M Starita ◽  
Anthony T Papenfuss ◽  
...  

Abstract: Multiplex Assays of Variant Effect (MAVEs), such as deep mutational scans and massively parallel reporter assays, test thousands of sequence variants in a single experiment. Despite the importance of MAVE data for basic and clinical research, there is no standard resource for their discovery and distribution. Here we present MaveDB, a public repository for large-scale measurements of sequence variant impact, designed for interoperability with applications to interpret these datasets. We also describe the first of these applications, MaveVis, which retrieves, visualizes, and contextualizes variant effect maps. Together, the database and applications will empower the community to mine these powerful datasets.


2021 ◽  
Author(s):  
Anat Kreimer ◽  
Tal Ashuach ◽  
Fumitaka Inoue ◽  
Alex Khodaverdian ◽  
Nir Yosef ◽  
...  

Abstract: Gene regulatory elements play a key role in orchestrating gene expression during cellular differentiation, but what determines their function over time remains largely unknown. Here, we performed perturbation-based massively parallel reporter assays at seven early time points of neural differentiation to systematically characterize how regulatory elements and motifs within them guide cellular differentiation. By perturbing over 2,000 putative DNA binding motifs in active regulatory regions, we delineated four categories of functional elements and observed that activity direction is mostly determined by the sequence itself, while the magnitude of effect depends on the cellular environment. We also found that fine-tuning of transcription rates is often achieved by the combined activity of adjacent activating and repressing elements. Our work provides a blueprint for the sequence components needed to induce different transcriptional patterns in general and specifically during neural differentiation.


2017 ◽  
Author(s):  
Joe Paggi ◽  
Andrew Lamb ◽  
Kevin Tian ◽  
Irving Hsu ◽  
Pierre-Louis Cedoz ◽  
...  

Abstract: Massively parallel reporter assays (MPRAs) are a method to probe the effects of short sequences on transcriptional regulatory activity. In an MPRA, short sequences are extracted from suspected regulatory regions, inserted into reporter plasmids, and transfected into cell types of interest, and the transcriptional activity of each reporter is then assayed. Recently, Ernst et al. presented MPRA data covering 15750 putative regulatory regions. Using these sequence-expression readouts, we trained a multitask convolutional neural network that predicts expression levels across four combinations of cell types and promoters. The model allows importance scores to be assigned to each base through in silico mutagenesis, and the resulting importance scores correlated well with regions enriched for conservation and transcription factor binding.
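In silico mutagenesis, as mentioned above, scores each base by how much the model's prediction changes when that base is substituted. The sketch below illustrates the procedure under loud assumptions: `toy_activity` is a hypothetical stand-in for a trained network (it simply counts occurrences of an arbitrary motif), and averaging the drop over the three substitutions is one of several possible scoring conventions, not necessarily the one the authors used.

```python
BASES = "ACGT"


def toy_activity(seq, motif="GATA"):
    """Hypothetical stand-in for a trained model: counts motif occurrences."""
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def in_silico_mutagenesis(seq, predict):
    """Return one importance score per base of seq.

    For each position, every alternative base is substituted in turn and the
    drop in predicted activity (reference minus mutant) is averaged.
    Positive scores mark bases whose mutation lowers the prediction.
    """
    ref_score = predict(seq)
    importance = []
    for pos, ref_base in enumerate(seq):
        drops = []
        for alt in BASES:
            if alt == ref_base:
                continue
            mutant = seq[:pos] + alt + seq[pos + 1:]
            drops.append(ref_score - predict(mutant))
        importance.append(sum(drops) / len(drops))
    return importance


if __name__ == "__main__":
    sequence = "TTGATATT"
    for base, score in zip(sequence, in_silico_mutagenesis(sequence, toy_activity)):
        print(base, score)
```

With a real model, `predict` would wrap a forward pass on the one-hot encoded sequence; the loop structure stays the same, at a cost of three extra predictions per base.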

