FIVEx: an interactive eQTL browser across public datasets

Author(s):  
Alan Kwong ◽  
Andrew P Boughton ◽  
Mukai Wang ◽  
Peter VandeHaar ◽  
Michael Boehnke ◽  
...  

Abstract. Summary: Expression quantitative trait loci (eQTLs) characterize the associations between genetic variation and gene expression to provide insights into tissue-specific gene regulation. Interactive visualization of tissue-specific eQTLs or splice QTLs (sQTLs) can facilitate our understanding of functional variants relevant to disease-related traits. However, combining the multi-dimensional nature of eQTLs/sQTLs into a concise and informative visualization is challenging. Existing QTL visualization tools provide useful ways to summarize the unprecedented scale of transcriptomic data but are not necessarily tailored to answer questions about the functional interpretations of trait-associated variants or other variants of interest. We developed FIVEx, an interactive eQTL/sQTL browser with an intuitive interface tailored to the functional interpretation of associated variants. It features the ability to navigate seamlessly between different data views while providing relevant tissue- and locus-specific information to offer users a better understanding of population-scale multi-tissue transcriptomic profiles. Our implementation of the FIVEx browser on the EBI eQTL catalogue, encompassing 16 publicly available RNA-seq studies, provides important insights for understanding potential tissue-specific regulatory mechanisms underlying trait-associated signals. Availability and implementation: A FIVEx instance visualizing EBI eQTL catalogue data can be found at https://fivex.sph.umich.edu. Its source code is open source under an MIT license at https://github.com/statgen/fivex. Supplementary information: Supplementary data are available at Bioinformatics online.

2021 ◽  
Author(s):  
Alan Kwong ◽  
Andrew P. Boughton ◽  
Mukai Wang ◽  
Peter VandeHaar ◽  
Michael Boehnke ◽  
...  

Abstract. Summary: Expression quantitative trait loci (eQTLs) characterize the associations between genetic variation and gene expression to provide insights into tissue-specific gene regulation. Interactive visualization of tissue-specific eQTLs can facilitate our understanding of functional variants relevant to disease-related traits. However, combining the multi-dimensional nature of eQTLs into a concise and informative visualization is challenging. Existing eQTL visualization tools provide useful ways to summarize the unprecedented scale of transcriptomic data but are not necessarily tailored to answer questions about the functional interpretations of trait-associated variants or other variants of interest. We developed FIVEx, an interactive eQTL browser with an intuitive interface tailored to the functional interpretation of associated variants. It features the ability to navigate seamlessly between different data views while providing relevant tissue- and locus-specific information to offer users a better understanding of population-scale multi-tissue transcriptomic profiles. Our implementation of the FIVEx browser on the Genotype-Tissue Expression (GTEx) dataset provides important insights for understanding potential tissue-specific regulatory mechanisms underlying trait-associated signals. Availability and implementation: A FIVEx instance visualizing GTEx v8 data can be found at https://eqtl.pheweb.org. The FIVEx source code is open source under an MIT license at https://github.com/statgen/fivex.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Kai Kang ◽  
Caizhi Huang ◽  
Yuanyuan Li ◽  
David M. Umbach ◽  
Leping Li

Abstract. Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and a new function to aid cell type annotation. The R package would be of interest to the broader R community. Result: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating the CDSeq-estimated cell types using single-cell RNA sequencing (scRNA-seq) data. This function allows users to readily interpret and visualize the CDSeq-estimated cell types. It further allows users to annotate CDSeq-estimated cell types using marker genes. We carried out additional validations of the CDSeqR software using synthetic mixtures, real cell mixtures, and real bulk RNA-seq data from the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Conclusions: The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell–cell interactions in the tissue microenvironment. Bulk-level analyses neglect tissue heterogeneity, however, and hinder investigation of cell-type-specific expression.
The CDSeqR package may aid in silico dissection of bulk expression data, enabling researchers to recover cell-type-specific information.


2017 ◽  
Author(s):  
Bo Wang ◽  
Daniele Ramazzotti ◽  
Luca De Sano ◽  
Junjie Zhu ◽  
Emma Pierson ◽  
...  

Abstract. Motivation: We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR can be effectively used to perform tasks such as dimension reduction, clustering, and visualization of heterogeneous populations of cells. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable via better visualization. Availability and Implementation: SIMLR is available on GitHub in both R and MATLAB implementations. Furthermore, it is also available as an R package on [email protected] or [email protected]. Supplementary Information: Supplementary data are available at Bioinformatics online.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 1784
Author(s):  
Shraddha Pai ◽  
Michael J. Apostolides ◽  
Andrew Jung ◽  
Matthew A. Moss

A key challenge in the application of whole-genome sequencing (WGS) for clinical diagnostics and research is the high-throughput prioritization of functional variants in the non-coding genome. This challenge is compounded by context-specific genetic modulation of gene expression, and variant-gene mapping depends on the tissues and organ systems affected in a given disease; for instance, a disease affecting the gastrointestinal system would use maps specific to genome regulation in gut-related tissues. While there are large-scale atlases of genome regulation, such as GTEx and NIH Roadmap Epigenomics, the clinical genetics community lacks publicly available stand-alone software for high-throughput annotation of custom variant data with user-defined tissue-specific epigenetic maps and clinical genetic databases, to prioritize variants for a specific biomedical application. In this work, we provide a simple software pipeline, called SNPnotes, which takes as input variant calls for a patient and prioritizes them using information on clinical relevance from ClinVar, tissue-specific gene regulation from GTEx and disease associations from the NHGRI-EBI GWAS catalogue. This pipeline was developed as part of SVAI Research's "Undiagnosed-1" event for collaborative patient diagnosis. We applied this pipeline to WGS-based variant calls for an individual with a history of gastrointestinal symptoms, using 12 gut-specific eQTL maps and GWAS associations for metabolic diseases, for variant-gene mapping. Out of 6,248,584 SNPs, the pipeline identified 151 high-priority variants, overlapping 129 genes. These top SNPs all have known clinical pathogenicity, modulate gene expression in gut tissues and have genetic associations with metabolic disorders, and serve as starting points for hypotheses about mechanisms driving clinical symptoms. Simple software changes can be made to customize the pipeline for other tissue-specific applications. 
Future extensions could integrate maps of tissue-specific regulatory elements, higher-order chromatin loops, and mutations affecting splice variants.


Author(s):  
Adrienne R. Guarnieri ◽  
Sarah R. Anthony ◽  
Anamarie Gozdiff ◽  
Lisa C. Green ◽  
Salma M. Fleifil ◽  
...  

Adipose tissue homeostasis plays a central role in cardiovascular physiology, and the presence of thermogenically active brown adipose tissue (BAT) has recently been associated with cardiometabolic health. We have previously shown that adipose tissue-specific deletion of HuR (Adipo-HuR-/-) reduces BAT-mediated adaptive thermogenesis, and the goal of this work was to identify the cardiovascular impacts of Adipo-HuR-/-. We found that Adipo-HuR-/- mice exhibit a hypercontractile phenotype that is accompanied by increased left ventricle wall thickness and hypertrophic gene expression. Furthermore, hearts from Adipo-HuR-/- mice display increased fibrosis via picrosirius red staining and periostin expression. To identify underlying mechanisms, we applied both RNA-seq and weighted gene co-expression network analysis (WGCNA) across both cardiac and adipose tissue to define HuR-dependent changes in gene expression as well as significant relationships between adipose tissue gene expression and cardiac fibrosis. RNA-seq results demonstrated a significant increase in pro-inflammatory gene expression in both cardiac and subcutaneous white adipose tissue (scWAT) from Adipo-HuR-/- mice that is accompanied by an increase in serum levels of both TNF-α and IL-6. In addition to inflammation-related genes, WGCNA identified a significant enrichment in extracellular vesicle-mediated transport and exosome-associated genes in scWAT whose expression was most significantly associated with the degree of cardiac fibrosis observed in Adipo-HuR-/- mice, implicating these processes as a likely adipose-to-cardiac paracrine mechanism. These results are significant in that they demonstrate the spontaneous onset of cardiovascular pathology in an adipose tissue-specific gene deletion model and contribute to our understanding of how disruptions in adipose tissue homeostasis may mediate cardiovascular disease.


Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 614
Author(s):  
Xianfeng Chen ◽  
Zhifu Sun

Long non-coding RNAs (lncRNAs) are a large class of gene transcripts that do not code for proteins; however, their functions are largely unknown and many new lncRNAs are yet to be discovered. Taking advantage of our previously developed, super-fast, novel lncRNA discovery pipeline, UClncR, and rich resources of GTEx RNA-seq data, we performed systematic novel lincRNA discovery for over 8000 samples across 30 tissue types. We conducted novel detection for each major tissue type first and then consolidated the novel discoveries from all tissue types. These novel lincRNAs were profiled and analyzed along with known genes to identify tissue-specific genes in 30 major human tissue types. Thirteen sub-brain regions were also analyzed in a similar manner. Our analysis revealed thousands to tens of thousands of novel lincRNAs for each tissue type. These lincRNAs could define each tissue type’s identity and demonstrated their reliability and tissue-specific expression. Tissue-specific genes were identified for each major tissue type and sub-brain region. The tissue-specific genes clearly defined each respective tissue’s unique function and could be used to expand the interpretation of non-coding SNPs from genome-wide association studies (GWAS).


2016 ◽  
Author(s):  
T. A. Mansour ◽  
E. Y. Scott ◽  
C. J. Finno ◽  
R. R. Bellone ◽  
M. J. Mienaltowski ◽  
...  

Abstract. Background: Transcriptome interpretation relies on a good-quality reference transcriptome for accurate quantification of gene expression as well as functional analysis of genetic variants. The current annotation of the horse genome lacks the specificity and sensitivity necessary to assess gene expression, especially at the isoform level, and suffers from insufficient annotation of untranslated regions (UTR). We built an annotation pipeline for horse and used it to integrate 1.9 billion reads from multiple RNA-seq data sets into a new refined transcriptome. Results: This equine transcriptome integrates eight different tissues from 59 individuals and improves gene structure and isoform resolution while providing considerable tissue-specific information. We utilized four levels of transcript filtration in our pipeline, aimed at producing several transcriptome versions that are suitable for different downstream analyses. Our most refined transcriptome includes 36,876 genes and 76,125 isoforms, with 6474 candidate transcriptional loci novel to the equine transcriptome. Conclusions: We have employed a variety of descriptive statistics and figures that demonstrate the quality and content of the transcriptome. The equine transcriptomes that are provided by this pipeline show the best tissue-specific resolution of any equine transcriptome to date and can serve several types of downstream analyses.


2017 ◽  
Author(s):  
Rachel Kaletsky ◽  
Vicky Yao ◽  
April Williams ◽  
Alexi M. Runnels ◽  
Sean B. King ◽  
...  

Abstract. The biology and behavior of adults differ substantially from those of developing animals, and cell-specific information is critical for deciphering the biology of multicellular animals. Thus, adult tissue-specific transcriptomic data are critical for understanding molecular mechanisms that control their phenotypes. We used adult cell-specific isolation to identify the transcriptomes of C. elegans’ four major tissues (or “tissue-ome”), identifying ubiquitously expressed and tissue-specific “super-enriched” genes. These data newly reveal the hypodermis’ metabolic character, suggest potential worm-human tissue orthologies, and identify tissue-specific changes in the Insulin/IGF-1 signaling pathway. Tissue-specific alternative splicing analysis identified a large set of collagen isoforms and a neuron-specific CREB isoform. Finally, we developed a machine learning-based prediction tool for 70 sub-tissue cell types, which we used to predict cellular expression differences in IIS/FOXO signaling, stage-specific TGF-β activity, and basal vs. memory-induced CREB transcription. Together, these data provide a rich resource for understanding the biology governing multicellular adult animals.


2014 ◽  
Author(s):  
Emma Pierson ◽  
GTEx Consortium ◽  
Daphne Koller ◽  
Alexis Battle ◽  
Sara Mostafavi

To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. This data provides an opportunity for deriving shared and tissue-specific gene regulatory networks on the basis of co-expression between genes. However, a small number of samples are available for a majority of the tissues, and therefore statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue-specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool, available at mostafavilab.stat.ubc.ca/GNAT, which allows exploration of gene function and regulation in a tissue-specific manner.

