Single-cell deconvolution of 3,000 post-mortem brain samples for eQTL and GWAS dissection in mental disorders

2021 ◽  
Author(s):  
Yongjin Park ◽  
Liang He ◽  
Jose Davila-Velderrain ◽  
Lei Hou ◽  
Shahin Mohammadi ◽  
...  

Abstract: Thousands of genetic variants acting in multiple cell types underlie complex disorders, yet most gene expression studies profile only bulk tissues, making it hard to resolve where genetic and non-genetic contributors act. This is particularly important for psychiatric and neurodegenerative disorders that impact multiple brain cell types with highly distinct gene expression patterns and proportions. To address this challenge, we develop a new framework, SPLITR, that integrates single-nucleus and bulk RNA-seq data, enabling phenotype-aware deconvolution and correcting for systematic discrepancies between bulk and single-cell data. We deconvolved 3,387 post-mortem brain samples across 1,127 individuals and multiple brain regions. We find that cell proportions vary across brain regions, individuals, disease status, and genotype; for example, genetic variants in TMEM106B impact inhibitory neuron fraction. We also find 4,757 cell-type-specific eQTLs. Our results demonstrate the power of jointly analyzing bulk and single-cell RNA-seq to provide insights into cell-type-specific mechanisms for complex brain disorders.

2021 ◽  
Author(s):  
Kai Kang ◽  
Caizhi David Huang ◽  
Yuanyuan Li ◽  
David M. Umbach ◽  
Leping Li

Abstract
Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and with a new function to aid interpretation of deconvolution outcomes. The R package should be of interest to the broader R community.
Result: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating CDSeq-estimated cell types using publicly available single-cell RNA sequencing (scRNA-seq) data (single-cell data from 20 major organs are included in the R package). This function allows users to readily interpret and visualize the CDSeq-estimated cell types. We carried out additional validations of the CDSeqR software with in silico and in vitro mixtures and with real experimental data including RNA-seq data from the Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) project.
Conclusions: The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. However, bulk level analyses neglect tissue heterogeneity and hinder investigation in a cell-type-specific fashion. The CDSeqR package can be viewed as providing in silico single-cell dissection of bulk measurements. It enables researchers to gain cell-type-specific information from bulk RNA-seq data.


2019 ◽  
Author(s):  
Nigatu A. Adossa ◽  
Leif Schauser ◽  
Vivi G. Gregersen ◽  
Laura L. Elo

Abstract
Background: Recent advances in single-cell gene expression profiling technology have revolutionized the understanding of molecular processes underlying developmental cell and tissue differentiation, enabling the discovery of novel cell-types and molecular markers that characterize developmental trajectories. Common approaches for identifying marker genes are based on pairwise statistical testing for differential gene expression between cell-types in heterogeneous cell populations, which is challenging due to unequal sample sizes and variance between groups resulting in little statistical power and inflated type I errors.
Results: We developed an alternative feature extraction method, Marker gene Identification for Cell-type Identity (MICTI), that encodes the cell-type specific expression information to each gene in every single cell. This approach identifies features (genes) that are cell-type specific for a given cell-type in a heterogeneous cell population. To validate this approach, we used (i) simulated single cell RNA-seq data, (ii) human pancreatic islet single-cell RNA-seq data and (iii) a simulated mixture of human single-cell RNA-seq data related to immune cells, particularly B cells, CD4+ memory cells, CD8+ memory cells, dendritic cells, fibroblast cells, and lymphoblast cells. For all cases, we were able to identify established cell-type-specific markers.
Conclusions: Our approach represents a highly efficient and fast method as an alternative to differential expression analysis for molecular marker identification in heterogeneous single-cell RNA-seq data.


2017 ◽  
Author(s):  
Lingxue Zhu ◽  
Jing Lei ◽  
Bernie Devlin ◽  
Kathryn Roeder

Recent advances in technology have enabled the measurement of RNA levels for individual cells. Compared to traditional tissue-level bulk RNA-seq data, single cell sequencing yields valuable insights about gene expression profiles for different cell types, which is potentially critical for understanding many complex human diseases. However, developing quantitative tools for such data remains challenging because of high levels of technical noise, especially the “dropout” events. A “dropout” happens when the RNA for a gene fails to be amplified prior to sequencing, producing a “false” zero in the observed data. In this paper, we propose a Unified RNA-Sequencing Model (URSM) for both single cell and bulk RNA-seq data, formulated as a hierarchical model. URSM borrows strength from both data sources and carefully models the dropouts in single cell data, leading to a more accurate estimation of cell type specific gene expression profiles. In addition, URSM naturally provides inference on the dropout entries in single cell data that need to be imputed for downstream analyses, as well as the mixing proportions of different cell types in bulk samples. We adopt an empirical Bayes approach, where parameters are estimated using the EM algorithm and approximate inference is obtained by Gibbs sampling. Simulation results illustrate that URSM outperforms existing approaches both in correcting for dropouts in single cell data and in deconvolving bulk samples. We also demonstrate an application to gene expression data on fetal brains, where our model successfully imputes the dropout genes and reveals cell type specific expression patterns.
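To make the idea of a "false" zero concrete, the following is a minimal sketch under a simplified zero-inflated Poisson assumption, not URSM's hierarchical model: a count is zeroed by dropout with probability `pi`, and otherwise drawn from a Poisson distribution with mean `lam`. Bayes' rule then gives the probability that an observed zero is a dropout rather than a true zero. The function name and both parameters are illustrative assumptions.

```python
import math


def dropout_posterior(pi: float, lam: float) -> float:
    """P(dropout | observed count == 0) under a zero-inflated Poisson.

    pi  : assumed prior dropout probability for this gene in this cell
    lam : assumed Poisson mean of the true (pre-dropout) count
    """
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be in [0, 1]")
    if lam < 0.0:
        raise ValueError("lam must be non-negative")
    # Probability of a zero that is NOT a dropout: no dropout, Poisson draw is 0.
    true_zero = (1.0 - pi) * math.exp(-lam)
    total = pi + true_zero
    return pi / total if total > 0.0 else 0.0


if __name__ == "__main__":
    # The higher the expected expression, the more likely a zero is a dropout.
    for lam in (0.1, 1.0, 5.0):
        print(lam, round(dropout_posterior(0.3, lam), 3))
```

Under this toy model, a zero in a highly expressed gene is more likely to be a dropout than a zero in a weakly expressed one, which is why imputation targets such entries.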


2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

Abstract: Single-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and is more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
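The rejection step described above can be illustrated with a small sketch. It assumes a classifier has already produced per-type probabilities for a cell, and that a confidence threshold is available for each type. Here the thresholds are set by hand, whereas JIND learns them. The function name, the `default_threshold` parameter, and the example labels are hypothetical and are not part of the JIND package.

```python
from typing import Dict, Optional


def classify_with_rejection(
    probs: Dict[str, float],
    thresholds: Dict[str, float],
    default_threshold: float = 0.5,
) -> Optional[str]:
    """Return the most probable cell type, or None when the prediction
    falls below that type's own confidence threshold (cell is rejected)."""
    if not probs:
        return None
    best = max(probs, key=probs.get)
    # Each cell type has its own cutoff; fall back to a default if missing.
    cutoff = thresholds.get(best, default_threshold)
    return best if probs[best] >= cutoff else None


if __name__ == "__main__":
    cutoffs = {"B cell": 0.6, "T cell": 0.9}
    print(classify_with_rejection({"B cell": 0.7, "T cell": 0.3}, cutoffs))
    print(classify_with_rejection({"B cell": 0.2, "T cell": 0.8}, cutoffs))
```

A per-type cutoff lets types that are easily confused with a neighbour demand more confidence than well-separated types.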


2019 ◽  
Author(s):  
Matthew N. Bernstein ◽  
Zhongjie Ma ◽  
Michael Gleicher ◽  
Colin N. Dewey

Summary: Cell type annotation is a fundamental task in the analysis of single-cell RNA-sequencing data. In this work, we present CellO, a machine learning-based tool for annotating human RNA-seq data with the Cell Ontology. CellO enables accurate and standardized cell type classification by considering the rich hierarchical structure of known cell types, a source of prior knowledge that is not utilized by existing methods. Furthermore, CellO comes pre-trained on a novel, comprehensive dataset of human, healthy, untreated primary samples in the Sequence Read Archive, which to the best of our knowledge, is the most diverse curated collection of primary cell data to date. CellO’s comprehensive training set enables it to run out-of-the-box on diverse cell types and achieve superior or competitive performance when compared to existing state-of-the-art methods. Lastly, CellO’s linear models are easily interpreted, thereby enabling exploration of cell type-specific expression signatures across the ontology. To this end, we also present the CellO Viewer: a web application for exploring CellO’s models across the ontology.
Highlights:
- We present CellO, a tool for hierarchically classifying cell type from single-cell RNA-seq data against the graph-structured Cell Ontology
- CellO is pre-trained on a comprehensive dataset comprising nearly all bulk RNA-seq primary cell samples in the Sequence Read Archive
- CellO achieves superior or comparable performance with existing methods while featuring a more comprehensive pre-packaged training set
- CellO is built with easily interpretable models which we expose through a novel web application, the CellO Viewer, for exploring cell type-specific signatures across the Cell Ontology


eLife ◽  
2019 ◽  
Vol 8 ◽  
Author(s):  
Dylan Kotliar ◽  
Adrian Veres ◽  
M Aurel Nagy ◽  
Shervin Tabrizi ◽  
Eran Hodis ◽  
...  

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here, we benchmark and enhance the use of matrix factorization to solve this problem. We show with simulations that a method we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including their relative contributions in each cell. To illustrate the insights this approach enables, we apply it to published brain organoid and visual cortex scRNA-Seq datasets; cNMF refines cell types and identifies both expected (e.g. cell cycle and hypoxia) and novel activity programs, including programs that may underlie a neurosecretory phenotype and synaptogenesis.
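The consensus idea can be sketched as follows: run non-negative matrix factorization several times from different random starts, bring the resulting programs into correspondence, and take a per-gene median so that no single run dominates. This is a simplified stand-in rather than the cNMF implementation. It uses plain multiplicative-update NMF and a greedy cosine-similarity alignment to the first run, where a fuller treatment would cluster components and filter outliers. The function names, the seed list, and the iteration count are assumptions.

```python
import numpy as np


def nmf(X, k, seed, n_iter=200, eps=1e-9):
    """Plain NMF (X ~ W @ H) via multiplicative updates.

    X : (cells, genes) non-negative matrix
    Returns W (cells, k) usages and H (k, genes) programs.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k)) + eps
    H = rng.random((k, X.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def consensus_programs(X, k, seeds=(0, 1, 2, 3, 4)):
    """Median of unit-normalised programs across runs, aligned to the first run."""

    def unit(H):
        return H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)

    runs = [unit(nmf(X, k, s)[1]) for s in seeds]
    ref = runs[0]
    aligned = [ref]
    for H in runs[1:]:
        sim = ref @ H.T  # cosine similarity, since rows have unit norm
        order = np.full(k, -1)
        # Greedy one-to-one matching: repeatedly take the best remaining pair.
        for _ in range(k):
            i, j = np.unravel_index(np.argmax(sim), sim.shape)
            order[i] = j
            sim[i, :] = -np.inf
            sim[:, j] = -np.inf
        aligned.append(H[order])
    return np.median(np.stack(aligned), axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((30, 3)) @ rng.random((3, 12))  # toy data with 3 programs
    print(consensus_programs(X, 3).shape)
```

Taking the median across aligned runs is what makes the estimate robust to the run-to-run variability of a single factorization.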


2019 ◽  
Author(s):  
Alexandra Grubman ◽  
Gabriel Chew ◽  
John F. Ouyang ◽  
Guizhi Sun ◽  
Xin Yi Choo ◽  
...  

Abstract: Alzheimer’s disease (AD) is a heterogeneous disease that is largely dependent on the complex cellular microenvironment in the brain. This complexity impedes our understanding of how individual cell types contribute to disease progression and outcome. To characterize the molecular and functional cell diversity in the human AD brain we utilized single nuclei RNA-seq in AD and control patient brains in order to map the landscape of cellular heterogeneity in AD. We detail gene expression changes at the level of cells and cell subclusters, highlighting specific cellular contributions to global gene expression patterns between control and Alzheimer’s patient brains. We observed distinct cellular regulation of APOE which was repressed in oligodendrocyte progenitor cells (OPCs) and astrocyte AD subclusters, and highly enriched in a microglial AD subcluster. In addition, oligodendrocyte and microglia AD subclusters show discordant expression of APOE. Integration of transcription factor regulatory modules with downstream GWAS gene targets revealed subcluster-specific control of AD cell fate transitions. For example, this analysis uncovered that astrocyte diversity in AD was under the control of transcription factor EB (TFEB), a master regulator of lysosomal function, which initiated a regulatory cascade containing multiple AD GWAS genes. These results establish functional links between specific cellular sub-populations in AD, and provide new insights into the coordinated control of AD GWAS genes and their cell-type specific contribution to disease susceptibility. Finally, we created an interactive reference web resource which will help brain and AD researchers explore the molecular architecture of subtype and AD-specific cell identity, molecular and functional diversity at the single cell level.
Highlights:
- We generated the first human single cell transcriptome in AD patient brains
- Our study unveiled 9 clusters of cell-type specific and common gene expression patterns between control and AD brains, including clusters of genes that present properties of different cell types (i.e. astrocytes and oligodendrocytes)
- Our analyses also uncovered functionally specialized sub-cellular clusters: 5 microglial clusters, 8 astrocyte clusters, 6 neuronal clusters, 6 oligodendrocyte clusters, 4 OPC and 2 endothelial clusters, each enriched for specific ontological gene categories
- Our analyses found manifold AD GWAS genes specifically associated with one cell-type, and sets of AD GWAS genes co-ordinately and differentially regulated between different brain cell-types in AD sub-cellular clusters
- We mapped the regulatory landscape driving transcriptional changes in AD brain, and identified transcription factor networks which we predict to control cell fate transitions between control and AD sub-cellular clusters
- Finally, we provide an interactive web-resource that allows the user to further visualise and interrogate our dataset.
Data resource web interface: http://adsn.ddnetbio.com


2018 ◽  
Author(s):  
Xuran Wang ◽  
Jihwan Park ◽  
Katalin Susztak ◽  
Nancy R. Zhang ◽  
Mingyao Li

Abstract: We present MuSiC, a method that utilizes cell-type specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. When applied to pancreatic islet and whole kidney expression data in human, mouse, and rat, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables characterization of cellular heterogeneity of complex tissues for identification of disease mechanisms.
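The general principle of reference-based deconvolution can be sketched in a few lines: given a signature matrix of mean expression per cell type (for example, averaged from single-cell data), find non-negative proportions whose mixture best reproduces the bulk profile. The sketch below solves an unweighted non-negative least-squares problem by projected gradient descent and then normalises the result to sum to one. It does not reproduce MuSiC's own gene weighting, and the function name, iteration count, and toy numbers are assumptions.

```python
import numpy as np


def deconvolve(bulk, signature, n_iter=5000):
    """Estimate proportions p >= 0 with sum(p) = 1 such that bulk ~ signature @ p.

    bulk      : (genes,) bulk expression vector
    signature : (genes, cell_types) mean expression per cell type
    """
    S = np.asarray(signature, dtype=float)
    b = np.asarray(bulk, dtype=float)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(S.T @ S, 2)
    p = np.full(S.shape[1], 1.0 / S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ p - b)
        # Gradient step, then projection onto the non-negative orthant.
        p = np.maximum(p - step * grad, 0.0)
    total = p.sum()
    return p / total if total > 0 else p


if __name__ == "__main__":
    # Toy signature: 4 genes x 3 cell types, and a bulk sample mixed from them.
    S = np.array([[10, 0, 1], [0, 8, 1], [1, 1, 9], [5, 5, 0]], dtype=float)
    true_p = np.array([0.6, 0.3, 0.1])
    print(np.round(deconvolve(S @ true_p, S), 3))
```

When two cell types have nearly collinear signature columns, this unweighted fit becomes poorly conditioned, which is one reason methods in this area weight or select genes more carefully.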


2021 ◽  
Author(s):  
Hanbyeol Kim ◽  
Joongho Lee ◽  
Keunsoo Kang ◽  
Seokhyun Yoon

Abstract: Cell type identification is a key step in downstream analysis of single cell RNA-seq experiments. Indispensable information for this is gene expression, which is used to cluster cells, train the model and set rejection thresholds. The problem is that expression levels are subject to batch effects arising from different platforms and preprocessing. We present MarkerCount, which uses the number of markers expressed, regardless of their expression level, to initially identify cell types and then reassigns cell types on a cluster basis. MarkerCount works in both reference-based and marker-based modes, where the latter utilizes only existing lists of markers, while the former requires a pre-annotated dataset to train the model. The performance was evaluated and compared with existing identifiers, both marker-based and reference-based, that can be customized with publicly available datasets and marker DBs. The results show that MarkerCount provides stable performance compared with other reference-based and marker-based cell type identifiers.
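The two-stage idea described above (count detected markers, then reassign per cluster) can be sketched as follows. This is a minimal illustration, not the MarkerCount implementation. It scores each cell by the fraction of a type's markers that are detected at all, which is an assumption made here so that marker lists of different lengths stay comparable, and then applies a majority vote within each cluster. The function names and the marker lists in the example are illustrative.

```python
from collections import Counter
from typing import Dict, List, Set


def count_based_labels(
    cells: List[Dict[str, float]],
    markers: Dict[str, Set[str]],
) -> List[str]:
    """Label each cell by the type with the largest fraction of its markers
    detected (expression > 0), ignoring how strongly they are expressed."""
    labels = []
    for expr in cells:
        scores = {
            cell_type: sum(1 for g in genes if expr.get(g, 0) > 0) / len(genes)
            for cell_type, genes in markers.items()
            if genes
        }
        labels.append(max(scores, key=scores.get))
    return labels


def reassign_by_cluster(labels: List[str], clusters: List[int]) -> List[str]:
    """Replace each cell's label with the majority label of its cluster."""
    votes: Dict[int, Counter] = {}
    for label, cluster in zip(labels, clusters):
        votes.setdefault(cluster, Counter())[label] += 1
    majority = {c: v.most_common(1)[0][0] for c, v in votes.items()}
    return [majority[c] for c in clusters]


if __name__ == "__main__":
    markers = {"B": {"CD19", "MS4A1"}, "T": {"CD3D", "CD3E"}}
    cells = [{"CD19": 5, "MS4A1": 1}, {"CD19": 1}, {"CD3D": 2, "CD3E": 9}]
    first_pass = count_based_labels(cells, markers)
    print(reassign_by_cluster(first_pass, [0, 0, 1]))
```

Because only presence or absence of each marker is used, a uniform scaling of expression values between platforms leaves the first-pass labels unchanged.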


2021 ◽  
Author(s):  
Jianbo Li ◽  
Ligang Wang ◽  
Dawei Yu ◽  
Junfeng Hao ◽  
Longchao Zhang ◽  
...  

Thoracolumbar vertebra (TLV) and rib primordium (RP) development is a common evolutionary feature across vertebrates, although whole-organism analysis of TLV and RP gene expression dynamics has been lacking. Here we investigated the single-cell transcriptomic landscape of thoracic vertebra (TV), lumbar vertebra (LV), and RP cells from a pig embryo at 27 days post-fertilization (dpf) and identified six cell types with distinct gene-expression signatures. In-depth dissection of the gene-expression dynamics and RNA velocity revealed a coupled process of osteogenesis and angiogenesis during TLV and rib development. Further analysis of cell-type-specific and strand-specific expression uncovered extremely high levels of HOXA10 3'-UTR sequence specific to osteoblasts of LV cells, which may function as anti-HOXA10-antisense by counteracting the HOXA10-antisense effect to determine TLV transition. Thus, this work provides a valuable resource for understanding embryonic osteogenesis and angiogenesis underlying vertebrate TLV and RP development at cell-type-specific resolution, and offers a comprehensive view of the transcriptional profile of animal embryo development.

