MPRAnator: a web-based tool for the design of Massively Parallel Reporter Assay experiments.

2015 ◽  
Author(s):  
Ilias Georgakopoulos-Soares ◽  
Naman Jain ◽  
Jesse Gray ◽  
Martin Hemberg

DNA regulatory elements contain short motifs to which transcription factors (TFs) can bind to modulate gene expression. Although the broad principles of TF regulation are well understood, the rules that dictate how combinatorial TF binding translates into transcriptional activity remain largely unknown. With rapid advances in DNA synthesis and sequencing technologies and the continuing decline in their costs, high-throughput experiments can now investigate the regulatory role of thousands of oligonucleotide sequences simultaneously. Nevertheless, designing high-throughput reporter assay experiments such as Massively Parallel Reporter Assays (MPRAs) and similar methods remains challenging. We introduce MPRAnator, a set of tools that facilitate the rapid design of MPRA experiments. With MPRA Motif design, a set of variables provides fine control over how motifs are placed into sequences, thereby allowing the user to investigate the rules that govern TF occupancy. MPRA SNP design can be used to investigate the functional effects of single SNPs or combinations of SNPs in regulatory sequences. Finally, the Transmutation tool supports the design of negative controls by scrambling, reversing, or complementing the input sequences or motifs, or by introducing multiple random mutations into them.
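The Transmutation operations listed above, together with the enumeration of SNP combinations that MPRA SNP design targets, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MPRAnator's code: the function names (`reverse`, `complement`, `reverse_complement`, `scramble`, `mutate`, `snp_combinations`) are hypothetical, sequences are assumed to be uppercase DNA, and SNPs are assumed to be given as (0-based position, alternative base) pairs.

```python
import random
from itertools import combinations

# Watson-Crick complement table for uppercase DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse(seq: str) -> str:
    """Reverse the base order without complementing."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Complement each base without reversing the order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Opposite strand, read 5' to 3'."""
    return complement(seq)[::-1]


def scramble(seq: str, rng: random.Random) -> str:
    """Shuffle bases: keeps nucleotide composition, destroys motif order."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def mutate(seq: str, n_mutations: int, rng: random.Random) -> str:
    """Substitute a different base at n_mutations distinct random positions."""
    if not 0 <= n_mutations <= len(seq):
        raise ValueError("n_mutations must be between 0 and len(seq)")
    bases = list(seq)
    for pos in rng.sample(range(len(seq)), n_mutations):
        bases[pos] = rng.choice([b for b in "ACGT" if b != bases[pos]])
    return "".join(bases)


def snp_combinations(seq: str, snps, max_order: int = 2):
    """Yield (applied_snps, variant_seq) for every combination of 1..max_order SNPs.

    snps is a list of (0-based position, alternative base) pairs (hypothetical format).
    """
    for order in range(1, max_order + 1):
        for combo in combinations(snps, order):
            bases = list(seq)
            for pos, alt in combo:
                bases[pos] = alt
            yield combo, "".join(bases)


rng = random.Random(0)  # fixed seed so the generated controls are reproducible
motif = "TGACTCA"  # example input; any uppercase DNA string works
print(reverse(motif), complement(motif), reverse_complement(motif))
print(scramble(motif, rng), mutate(motif, 2, rng))
for snps, variant in snp_combinations("ACGTACGT", [(1, "T"), (5, "A")]):
    print(snps, variant)
```

Scrambling preserves base composition, so it controls for GC content while destroying motif order; random point mutations instead keep most of the sequence intact, which makes them the gentler of the two controls.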

2017 ◽  
Author(s):  
Cynthia A. Kalita ◽  
Gregory A. Moyerbrailean ◽  
Christopher Brown ◽  
Xiaoquan Wen ◽  
Francesca Luca ◽  
...  

Motivation: The majority of the human genome is composed of non-coding regions containing regulatory elements such as enhancers, which are crucial for controlling gene expression. Many variants associated with complex traits lie in these regions and may disrupt gene regulatory sequences. Consequently, it is important not only to identify true enhancers but also to test whether a variant within an enhancer affects gene regulation. Recently, allele-specific analysis of high-throughput reporter assays, such as massively parallel reporter assays (MPRAs), has been used to functionally validate non-coding variants. However, high-quality and robust data analysis tools for these datasets are still missing.
Results: We have further developed QuASAR (quantitative allele-specific analysis of reads), our method for allele-specific analysis, to analyze allele-specific signals in barcoded read-count data from MPRAs. This approach accounts for uncertainty in the original plasmid proportions, over-dispersion, and sequencing errors. The allelic skew estimate it provides, together with its standard error, also simplifies meta-analysis of replicate experiments. Additionally, we show that a beta-binomial distribution better models the variability in the allelic imbalance of these synthetic reporters and yields a test that is statistically well calibrated under the null. Applying this approach to the MPRA data of Tewhey et al. (2016), we found 602 SNPs with significant (FDR 10%) allele-specific regulatory function in lymphoblastoid cell lines (LCLs). We also show that MPRA results combined with QuASAR estimates can be used to validate existing experimental and computational annotations of regulatory variants. Our study shows that, with appropriate data analysis tools, we can improve the power to detect allelic effects in high-throughput reporter assays.
Availability: http://github.com/piquelab/QuASAR/tree/master/
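The core statistical idea described above (modelling allelic imbalance in MPRA read counts with a beta-binomial rather than a binomial distribution, and testing the RNA allele proportion against the plasmid proportion) can be sketched as a likelihood-ratio test. This is a simplified illustration, not QuASAR's implementation: it treats over-dispersion as a fixed, known precision `m`, assumes the plasmid reference proportion `q` is known exactly, ignores sequencing error, and replaces a proper optimizer with a grid search. All function names are hypothetical.

```python
import math


def log_beta(a: float, b: float) -> float:
    """log B(a, b) via log-gamma, stable for large read counts."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_loglik(k: int, n: int, p: float, m: float) -> float:
    """Log-probability of k reference reads out of n under a beta-binomial
    with mean p and precision m (alpha = p * m, beta = (1 - p) * m)."""
    a, b = p * m, (1.0 - p) * m
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def allelic_skew_test(counts, q: float, m: float = 100.0, grid: int = 999):
    """Likelihood-ratio test of H0: rho == q.

    counts: list of (ref_reads, alt_reads), one pair per replicate RNA library.
    q: reference-allele proportion in the plasmid (DNA) library, assumed known.
    m: beta-binomial precision, assumed fixed (smaller m = more over-dispersion).
    Returns (rho_hat, skew, p_value); skew is the log odds ratio of rho_hat vs q.
    """
    def total_loglik(p: float) -> float:
        return sum(betabinom_loglik(r, r + a, p, m) for r, a in counts)

    # Grid search over rho in (0, 1): a simple stand-in for a numerical optimizer.
    candidates = [(i + 1) / (grid + 1) for i in range(grid)]
    rho_hat = max(candidates, key=total_loglik)
    lr_stat = max(0.0, 2.0 * (total_loglik(rho_hat) - total_loglik(q)))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr_stat / 2.0))
    skew = math.log(rho_hat / (1.0 - rho_hat)) - math.log(q / (1.0 - q))
    return rho_hat, skew, p_value


print(allelic_skew_test([(90, 10), (85, 15), (88, 12)], q=0.5))
```

As `m` grows, the beta-binomial approaches the binomial; smaller values widen the null distribution, which is what keeps the test from flagging counts that are over-dispersed but balanced.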


Author(s):  
David A Siegel ◽  
Olivier Le Tonqueze ◽  
Anne Biton ◽  
Noah Zaitlen ◽  
David J Erle

AU-rich elements (AREs) are 3′ UTR cis-regulatory elements that regulate mRNA stability. Consensus ARE motifs have been determined, but little is known about how differences among 3′ UTR sequences that conform to these motifs affect their function. Here we use functional annotation of sequences from 3′ UTRs (fast-UTR), a massively parallel reporter assay (MPRA), to investigate the effects of 41,288 3′ UTR sequence fragments from 4,653 transcripts on gene expression and mRNA stability in Jurkat and Beas2B cells. Our analyses demonstrate that the length of an ARE and its registration (the first and last nucleotides of the repeating ARE motif) have significant effects on gene expression and stability. Based on this finding, we propose an improved ARE classification and accompanying methods to categorize and predict the effect of AREs on gene expression and stability. Finally, to investigate the advantages of our general experimental design, we examine other motifs, including constitutive decay elements (CDEs), and show that the length of the CDE stem-loop has a significant impact on steady-state expression and mRNA stability. We conclude that fast-UTR, in conjunction with our analytical approach, can produce improved yet simple sequence-based rules for predicting the activity of human 3′ UTRs.
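To make "length" and "registration" concrete, the sketch below finds the longest stretch of a 3′ UTR sequence that is consistent with a periodic repeat and reports its length and its first and last nucleotides. It assumes, for illustration only, that the repeating ARE motif is modelled as the period-4 unit `AUUU` (so that AUUUA pentamers overlap, as in AUUUAUUUA); the exact ARE definition used with fast-UTR may differ, and `longest_periodic_run` is a hypothetical helper.

```python
def longest_periodic_run(seq: str, unit: str = "AUUU"):
    """Longest substring of seq consistent with an endless repeat of `unit`.

    Returns (start, end_exclusive, length, first_nt, last_nt), or None if no
    position matches. The first and last nucleotides give the run's registration.
    """
    k = len(unit)
    best = None
    # A run starting at a given unit index aligns to exactly one phase, so
    # scanning every phase and keeping maximal matching stretches covers all runs.
    for phase in range(k):
        run_start = None
        for i in range(len(seq) + 1):  # one extra step closes a run at the end
            matches = i < len(seq) and seq[i] == unit[(i + phase) % k]
            if matches and run_start is None:
                run_start = i
            elif not matches and run_start is not None:
                length = i - run_start
                if best is None or length > best[2]:
                    best = (run_start, i, length, seq[run_start], seq[i - 1])
                run_start = None
    return best


for utr in ["GGAUUUAUUUAUUUACC", "CUUUAUUUAUG"]:
    print(utr, longest_periodic_run(utr))
```

Here `GGAUUUAUUUAUUUACC` yields a 13-nt run that starts and ends on A, whereas `CUUUAUUUAUG` yields a 9-nt run that starts and ends on U: two AREs that differ in both length and registration.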


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 939 ◽  
Author(s):  
David Santiago-Algarra ◽  
Lan T.M. Dao ◽  
Lydie Pradel ◽  
Alexandre España ◽  
Salvatore Spicuglia

The regulation of gene transcription in higher eukaryotes is accomplished through the involvement of transcription start site (TSS)-proximal (promoters) and -distal (enhancers) regulatory elements. It is now widely acknowledged that enhancer elements play an essential role during development and cell differentiation, while genetic alterations in these elements are a major cause of human disease. Many strategies have been developed to identify and characterize enhancers. Here, we discuss recent advances in high-throughput approaches to assess enhancer activity, from the well-established massively parallel reporter assays to the recent clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9-based technologies. We highlight how these approaches contribute toward a better understanding of enhancer function, eventually leading to the discovery of new types of regulatory sequences, and how the alteration of enhancers can affect transcriptional regulation.


2016 ◽  
Author(s):  
Fumitaka Inoue ◽  
Martin Kircher ◽  
Beth Martin ◽  
Gregory M. Cooper ◽  
Daniela M. Witten ◽  
...  

Candidate enhancers can be identified on the basis of chromatin modifications, the binding of chromatin modifiers and transcription factors and cofactors, or chromatin accessibility. However, validating such candidates as bona fide enhancers requires functional characterization, typically achieved through reporter assays that test whether a sequence can drive expression of a transcriptional reporter via a minimal promoter. A longstanding concern is that reporter assays are mainly implemented on episomes, which are thought to lack physiological chromatin. However, the magnitude and determinants of differences in cis-regulation for regulatory sequences residing in episomes versus chromosomes remain almost completely unknown. To address this question in a systematic manner, we developed and applied a novel lentivirus-based massively parallel reporter assay (lentiMPRA) to directly compare the functional activities of 2,236 candidate liver enhancers in an episomal versus a chromosomally integrated context. We find that the activities of chromosomally integrated sequences are substantially different from the activities of the identical sequences assayed on episomes, and furthermore are correlated with different subsets of ENCODE annotations. The results of chromosomally based reporter assays are also more reproducible and more strongly predictable by both ENCODE annotations and sequence-based models. With a linear model that combines chromatin annotations and sequence information, we achieve a Pearson’s R² of 0.347 for predicting the results of chromosomally integrated reporter assays. This level of prediction is better than with either chromatin annotations or sequence information alone and also outperforms predictive models of episomal assays. Our results have broad implications for how cis-regulatory elements are identified, prioritized and functionally validated.
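Prediction performance here is reported as Pearson's R², the squared correlation between predicted and observed activities. A minimal sketch (function names hypothetical) contrasts it with the coefficient of determination, 1 − SS_res/SS_tot, to show why the two can differ on held-out predictions: Pearson's R² ignores any constant offset or scale error in the predictions.

```python
def pearson_r2(observed, predicted):
    """Squared Pearson correlation: cov^2 / (var_obs * var_pred), up to a common 1/n."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (sxx * syy)  # undefined if either input is constant


def coefficient_of_determination(observed, predicted):
    """1 - SS_res / SS_tot: penalizes offset and scale errors in the predictions."""
    mo = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mo) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but shifted by +1
print(pearson_r2(observed, predicted))
print(coefficient_of_determination(observed, predicted))
```

In this toy example Pearson's R² is 1 while the coefficient of determination is only 0.2. For an ordinary least-squares fit with an intercept, evaluated on its own training data, the two quantities coincide; they diverge mainly on held-out data.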


2021 ◽  
Author(s):  
Siqi Zhao ◽  
Clarice Hong ◽  
David M Granas ◽  
Barak A. Cohen

We developed a single-cell massively parallel reporter assay (scMPRA) to measure the activity of libraries of cis-regulatory sequences (CRSs) across multiple cell-types simultaneously. As a proof of concept, we assayed a library of core promoters in a mixture of HEK293 and K562 cells and showed that scMPRA is a reproducible, highly parallel, single-cell reporter gene assay. Our results show that housekeeping promoters and CpG island promoters have lower activity in K562 cells relative to HEK293, which likely reflects developmental differences between the cell lines. Within K562 cells, scMPRA identified a subset of developmental promoters that are upregulated in the CD34+/CD38− sub-state, confirming this state as more "stem-like." Finally, we deconvolved the intrinsic and extrinsic components of promoter cell-to-cell variability and found that developmental promoters have a higher proportion of extrinsic noise compared to housekeeping promoters, which may reflect the responsiveness of developmental promoters to the cellular environment. We anticipate scMPRA will be widely applicable for studying the role of CRSs across diverse cell types.
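One standard way to split cell-to-cell variability into intrinsic and extrinsic parts is the dual-reporter decomposition of Elowitz and colleagues (2002), in which two identical reporters are measured in the same cells: differences between them within a cell are intrinsic noise, and their covariance across cells is extrinsic noise. The sketch below implements those formulas; it is an illustrative assumption and may not match the estimator used in scMPRA, and `noise_decomposition` is a hypothetical name.

```python
from statistics import fmean


def noise_decomposition(c, y):
    """Dual-reporter noise decomposition (squared coefficients of variation).

    c, y: expression of two identical reporters, paired by cell.
    Returns (intrinsic, extrinsic, total); total == intrinsic + extrinsic.
    """
    norm = fmean(c) * fmean(y)
    # Intrinsic: differences between the two reporters within the same cell.
    intrinsic = fmean([(ci - yi) ** 2 for ci, yi in zip(c, y)]) / (2 * norm)
    # Extrinsic: covariance of the two reporters across cells.
    extrinsic = (fmean([ci * yi for ci, yi in zip(c, y)]) - norm) / norm
    # Total: mean second moment of the two reporters minus the product of means.
    total = (fmean([(ci ** 2 + yi ** 2) / 2 for ci, yi in zip(c, y)]) - norm) / norm
    return intrinsic, extrinsic, total


reporter_a = [10.0, 12.0, 8.0, 15.0, 9.0]  # toy per-cell expression values
reporter_b = [11.0, 12.0, 7.0, 14.0, 10.0]
intrinsic, extrinsic, total = noise_decomposition(reporter_a, reporter_b)
print(intrinsic, extrinsic, total, extrinsic / total)
```

Because total = intrinsic + extrinsic holds exactly, the fraction extrinsic/total is a convenient single summary for comparing promoter classes, such as the developmental and housekeeping promoters discussed above.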


Author(s):  
Diego Calderon ◽  
Andria Ellis ◽  
Riza M. Daza ◽  
Beth Martin ◽  
Jacob M. Tome ◽  
...  

Gene regulation occurs through trans-acting factors (e.g., transcription factors) acting on cis-regulatory elements (e.g., enhancers). Massively parallel reporter assays (MPRAs) functionally survey large numbers of cis-regulatory elements for regulatory potential, but do not identify the trans-acting factors that mediate any observed effects. Here we describe transMPRA, a reporter assay that efficiently combines multiplex CRISPR-mediated perturbation and MPRAs to identify trans-acting factors that modulate the regulatory activity of specific enhancers.


2016 ◽  
Author(s):  
Avanthi Raghavan ◽  
Xiao Wang ◽  
Peter Rogov ◽  
Li Wang ◽  
Xiaolan Zhang ◽  
...  

Genome-wide association studies have identified a number of novel genetic loci linked to serum cholesterol and triglyceride levels. The causal DNA variants at these loci and the mechanisms by which they influence phenotype and disease risk remain largely unexplored. Expression quantitative trait locus analyses of patient liver and fat biopsies indicate that many lipid-associated variants influence gene expression in a cis-regulatory manner. However, linkage disequilibrium among neighboring SNPs at a locus implicated by a genome-wide association study makes it challenging to pinpoint the actual variant underlying an association signal. We used a methodological framework for causal variant discovery that involves high-throughput identification of putative disease-causal loci through a functional reporter-based screen, the massively parallel reporter assay, followed by validation of prioritized variants in genome-edited human pluripotent stem cell models generated with CRISPR-Cas9. We complemented the stem cell models with CRISPR interference experiments in vitro and in knock-in mice in vivo. We provide evidence validating two high-priority SNPs, rs2277862 and rs10889356, as causal for lipid-associated expression quantitative trait loci. We also highlight the challenges inherent in modeling common genetic variation with these experimental approaches.
Author Summary: Genome-wide association studies have identified numerous loci linked to a variety of clinical phenotypes, but it remains a challenge to identify and validate the causal DNA variants at these loci. We describe the use of a high-throughput technique, the massively parallel reporter assay, to analyze thousands of candidate causal DNA variants for their potential effects on gene expression. We use a combination of genome editing in human pluripotent stem cells, "CRISPR interference" experiments in other cultured human cell lines, and genetically modified mice to analyze the two highest-priority candidate DNA variants to emerge from the massively parallel reporter assay, and we confirm their relevance to the expression of nearby genes. These findings highlight a methodological framework with which to identify and functionally validate causal DNA variants.


Author(s):  
Stella C. Yuan ◽  
Eric Malekos ◽  
Melissa T. R. Hawkins

The use of museum specimens held in natural history repositories for population and conservation genetic research is increasing in tandem with the use of massively parallel sequencing technologies. Short tandem repeats (STRs), or microsatellite loci, are commonly used genetic markers in wildlife and population genetic studies. However, they have traditionally suffered from a host of issues, including length homoplasy, high costs, low throughput, and difficulties in reproducibility across laboratories. Massively parallel sequencing technologies can address these problems, but DNA derived from museum specimens suffers from significant fragmentation and exogenous DNA contamination. Combating these issues requires extra stringency in the lab and during data analysis, yet no high-throughput sequencing studies have evaluated microsatellite allelic dropout in DNA extracted from museum specimens. In this study, we use high-throughput sequencing to evaluate genotyping errors across PCR replicates for previously characterized microsatellites in DNA extracted from mammalian museum skins. We found it useful to classify samples by DNA concentration, which determined the rate at which genotypes were accurately recovered. Longer microsatellites performed worse in all museum specimens. Allelic dropout rates across loci depended on sample quantity, with high-concentration museum specimens performing as well as, and recovering quality metrics nearly as high as, the frozen tissue sample. Based on our results, we provide a set of best practices for quality assurance and for incorporating reliable genotypes from museum specimens.
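Allelic dropout is typically scored by comparing replicate genotype calls against a reference genotype (for instance, a consensus or frozen-tissue call): at a heterozygous locus, a replicate that shows only one of the two true alleles counts as a dropout. The sketch below computes that per-replicate rate. The data layout (allele sizes as integer tuples), the choice to skip failed amplifications, and the function name are assumptions for illustration, not the study's pipeline.

```python
def allelic_dropout_rate(reference, replicates):
    """Fraction of replicate calls at heterozygous reference loci that appear
    homozygous for one of the two true alleles.

    reference: dict locus -> (allele1, allele2) reference genotype.
    replicates: list of dicts locus -> (allele1, allele2), one per PCR replicate.
    """
    het_calls = dropouts = 0
    for locus, (a1, a2) in reference.items():
        if a1 == a2:
            continue  # dropout is only detectable at heterozygous loci
        for rep in replicates:
            call = rep.get(locus)
            if call is None:
                continue  # failed amplification: not scored as dropout here
            het_calls += 1
            if call[0] == call[1] and call[0] in (a1, a2):
                dropouts += 1
    return dropouts / het_calls if het_calls else float("nan")


reference = {"L1": (150, 154), "L2": (200, 200), "L3": (98, 102)}
replicates = [
    {"L1": (150, 154), "L2": (200, 200), "L3": (98, 98)},  # L3 lost allele 102
    {"L1": (154, 154), "L3": (98, 102)},  # L1 lost allele 150; L2 failed
]
print(allelic_dropout_rate(reference, replicates))
```

Computing this rate separately for samples binned by DNA concentration would mirror the classification described above.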


2021 ◽  
Author(s):  
Nicolai von Kuegelgen ◽  
Samantha Mendonsa ◽  
Sayaka Dantsuji ◽  
Maya Ron ◽  
Marieluise Kirchner ◽  
...  

Cells adopt highly polarized shapes and form distinct subcellular compartments largely due to the localization of many mRNAs to specific areas, where they are translated into proteins with local functions. This mRNA localization is mediated by specific cis-regulatory elements in mRNAs, commonly called "zipcodes." Their recognition by RNA-binding proteins (RBPs) leads to the integration of the mRNAs into macromolecular complexes and their localization. Although hundreds of mRNAs are localized, only a few zipcodes have been characterized. Here, we describe a novel neuronal zipcode identification protocol (N-zip) that can identify zipcodes across hundreds of 3′ UTRs. This approach combines a method for separating the principal subcellular compartments of neurons (cell bodies and neurites) with a massively parallel reporter assay. Our analysis identifies the let-7 binding site and the (AU)n motif as de novo zipcodes in mouse primary cortical neurons and suggests a strategy for detecting many more.


Circulation ◽  
2015 ◽  
Vol 132 (suppl_3) ◽  
Author(s):  
Nathan R Tucker ◽  
Jiangchuan Ye ◽  
Honghuang Lin ◽  
Michael A McLellan ◽  
Emelia J Benjamin ◽  
...  

Introduction: Genome-wide association studies have identified 14 independent loci for atrial fibrillation (AF). The 4q25 locus upstream of the left-right asymmetry gene PITX2 is, by far, the strongest association signal for AF. However, as with most GWAS loci, the functional variants are noncoding, presumed to be regulatory, and remain unknown. We therefore sought to rapidly identify the functional variants at an AF locus by combining high-throughput sequencing and massively parallel reporter assays.
Methods and Results: We sequenced a ~750 kb region encompassing the PITX2 locus in 462 individuals with early-onset AF from the MGH AF Study and 464 referents from the Framingham Heart Study. The SNP most significantly associated with AF in our sequenced sample was rs2129983, which lies 140 kb from PITX2 (OR = 2.43, P = 8.9 × 10⁻¹⁶). rs2129983 is approximately 1.7 kb from the most significantly associated SNP in a prior AF GWAS, rs6817105 (r² = 0.52). From the targeted sequencing analysis, we identified 262 SNVs with a MAF > 0.5% within a genomic region bounded by SNPs with an r² greater than 0.4 with the top variant. To identify functional variants, we then used a massively parallel reporter assay (MPRA) to measure enhancer activity at each SNP across the entire AF locus. In both HL-1 cells and C2C12 myoblasts, MPRA identified many distinct SNP regions with differential enhancer activity. Using AF-association status as a standard, we identified a series of variants (rs17042076, rs4469143) that show both differential activity in either of the cell lines tested and a high level of association. Mechanistically, these functional SNPs are predicted to alter transcription factor binding.
Conclusions: We have comprehensively identified the AF-associated variation at 4q25 and determined which of these variants are functional through differential enhancer activity. In addition to identifying the causative variation for AF at 4q25, we provide a generalizable pathway that could expedite the identification of causative genetic variants at other disease loci.
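Differential enhancer activity in an MPRA is commonly summarized as a depth-normalized log2 ratio of RNA to plasmid DNA barcode counts, computed separately for each allele and then differenced. The sketch below shows that calculation; the pseudocount, the summing of barcodes per allele, and the function names are illustrative assumptions, not the analysis used in this study.

```python
import math


def log2_activity(rna_counts, dna_counts, rna_depth, dna_depth, pseudocount=1.0):
    """Depth-normalized log2(RNA / DNA), summing over the barcodes of one allele."""
    rna = (sum(rna_counts) + pseudocount) / rna_depth
    dna = (sum(dna_counts) + pseudocount) / dna_depth
    return math.log2(rna / dna)


def allelic_effect(ref, alt, rna_depth, dna_depth):
    """ref / alt: (rna_counts, dna_counts) per-barcode lists for each allele.

    Positive values mean the alternative allele drives more reporter expression.
    """
    return (log2_activity(*alt, rna_depth, dna_depth)
            - log2_activity(*ref, rna_depth, dna_depth))


# Toy counts for two barcodes per allele; library depths are total read counts.
ref = ([49, 50], [49, 50])
alt = ([199, 200], [49, 50])
print(allelic_effect(ref, alt, rna_depth=1_000_000, dna_depth=1_000_000))
```

Normalizing by the DNA counts corrects for unequal representation of each allele in the plasmid pool, so the difference reflects regulatory activity rather than library composition.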

