Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq

eLife ◽

10.7554/elife.43803 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 37

Author(s):

Dylan Kotliar ◽

Adrian Veres ◽

M Aurel Nagy ◽

Shervin Tabrizi ◽

Eran Hodis ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Matrix Factorization ◽

Cell Types ◽

Environmental Cues ◽

Rna Seq ◽

Cell Type ◽

Type Identity ◽

Brain Organoid ◽

Non Negative Matrix Factorization

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here, we benchmark and enhance the use of matrix factorization to solve this problem. We show with simulations that a method we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including their relative contributions in each cell. To illustrate the insights this approach enables, we apply it to published brain organoid and visual cortex scRNA-Seq datasets; cNMF refines cell types and identifies both expected (e.g. cell cycle and hypoxia) and novel activity programs, including programs that may underlie a neurosecretory phenotype and synaptogenesis.
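The consensus idea in this abstract can be illustrated with a minimal sketch. It assumes that "consensus" means running NMF several times from different random starts, clustering the pooled components, and taking the per-cluster median as the final program. The function names `consensus_programs` and `program_usages`, the use of scikit-learn and SciPy, and every parameter value are illustrative assumptions, not the authors' released cNMF code.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.cluster import KMeans
from sklearn.decomposition import NMF


def consensus_programs(X, k, n_runs=10, seed=0):
    """Return k consensus programs (k x genes) from repeated NMF runs.

    Hypothetical helper: the clustering and median steps are assumptions
    about what a consensus over NMF replicates could look like.
    """
    spectra = []
    for run in range(n_runs):
        model = NMF(n_components=k, init="random",
                    random_state=seed + run, max_iter=500)
        model.fit(X)
        H = model.components_  # shape: k x genes
        # Scale each program to unit L2 norm so replicates are comparable.
        norms = np.linalg.norm(H, axis=1, keepdims=True)
        spectra.append(H / np.maximum(norms, 1e-12))
    pooled = np.vstack(spectra)  # shape: (n_runs * k) x genes
    # Group similar programs across replicates.
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(pooled)
    # The element-wise median within each cluster is the consensus program.
    return np.vstack([np.median(pooled[labels == c], axis=0) for c in range(k)])


def program_usages(X, programs):
    """Estimate each cell's relative usage of fixed programs (rows sum to 1)."""
    # One non-negative least-squares fit per cell with the programs held fixed.
    usages = np.vstack([nnls(programs.T, cell)[0] for cell in X])
    totals = usages.sum(axis=1, keepdims=True)
    return usages / np.maximum(totals, 1e-12)


# Toy data: 60 cells x 20 genes generated from 3 non-negative programs.
rng = np.random.default_rng(0)
X = rng.random((60, 3)) @ rng.random((3, 20))
programs = consensus_programs(X, k=3, n_runs=5)
usages = program_usages(X, programs)
print(programs.shape, usages.shape)
```

Holding the programs fixed while refitting the usages is one simple way to obtain the per-cell relative contributions the abstract mentions. A fuller analysis would also filter outlier replicates and choose k from stability diagnostics.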


Identifying Gene Expression Programs of Cell-type Identity and Cellular Activity with Single-Cell RNA-Seq

10.1101/310599 ◽

2018 ◽

Cited By ~ 7

Author(s):

Dylan Kotliar ◽

Adrian Veres ◽

M. Aurel Nagy ◽

Shervin Tabrizi ◽

Eran Hodis ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Matrix Factorization ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Relative Contribution ◽

Neuronal Synapses ◽

Type Identity ◽

Brain Organoid

Identifying gene expression programs underlying both cell-type identity and cellular activities (e.g. life-cycle processes, responses to environmental cues) is crucial for understanding the organization of cells and tissues. Although single-cell RNA-Seq (scRNA-Seq) can quantify transcripts in individual cells, each cell’s expression profile may be a mixture of both types of programs, making them difficult to disentangle. Here we illustrate and enhance the use of matrix factorization as a solution to this problem. We show with simulations that a method that we call consensus non-negative matrix factorization (cNMF) accurately infers identity and activity programs, including the relative contribution of programs in each cell. Applied to published brain organoid and visual cortex scRNA-Seq datasets, cNMF refines the hierarchy of cell types and identifies both expected (e.g. cell cycle and hypoxia) and intriguing novel activity programs. We propose that one of the novel programs may reflect a neurosecretory phenotype and a second may underlie the formation of neuronal synapses. We make cNMF available to the community and illustrate how this approach can provide key insights into gene expression variation within and between cell types.


MarkerCount: A stable, count-based cell type identifier for single cell RNA-Seq experiments

10.21203/rs.3.rs-418249/v1 ◽

2021 ◽

Author(s):

Hanbyeol Kim ◽

Joongho Lee ◽

Keunsoo Kang ◽

Seokhyun Yoon

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Batch Effect ◽

Expression Level ◽

Rna Seq ◽

Cell Type ◽

Stable Performance ◽

Downstream Analysis

Cell type identification is a key step in the downstream analysis of single-cell RNA-seq experiments. Gene expression is the indispensable information for this step: it is used to cluster cells, train the model, and set rejection thresholds. The problem is that expression levels are subject to batch effects arising from different platforms and preprocessing. We present MarkerCount, which uses the number of markers expressed, regardless of their expression level, to initially identify cell types and then reassigns cell types on a cluster basis. MarkerCount works in both reference-based and marker-based modes: the latter uses only existing lists of markers, while the former requires a pre-annotated dataset to train the model. Its performance was evaluated and compared with existing identifiers, both marker-based and reference-based, that can be customized with publicly available datasets and marker databases. The results show that MarkerCount provides stable performance compared with other reference-based and marker-based cell type identifiers.
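The count-based idea described in this abstract can be sketched as follows. This is a minimal illustration, not the MarkerCount implementation: the function names, the scoring by fraction of markers detected (so that marker lists of different lengths are comparable), and the majority vote within clusters are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def marker_count_assign(counts, genes, markers):
    """Label each cell by the type whose markers are most often detected.

    Hypothetical helper: only the presence of a marker (count > 0) is used,
    never its expression level.
    """
    gene_index = {g: i for i, g in enumerate(genes)}
    detected = np.asarray(counts) > 0
    types = sorted(markers)
    scores = np.zeros((detected.shape[0], len(types)))
    for j, cell_type in enumerate(types):
        cols = [gene_index[g] for g in markers[cell_type] if g in gene_index]
        if cols:
            # Fraction of this type's markers detected in each cell.
            scores[:, j] = detected[:, cols].mean(axis=1)
    labels = [types[j] for j in scores.argmax(axis=1)]
    return labels, scores


def reassign_by_cluster(labels, clusters):
    """Replace each cell's label with the majority label of its cluster."""
    majority = {}
    for cluster in set(clusters):
        members = [lab for lab, c in zip(labels, clusters) if c == cluster]
        majority[cluster] = Counter(members).most_common(1)[0][0]
    return [majority[c] for c in clusters]


# Toy example with made-up counts; gene symbols are used only as labels.
genes = ["CD3E", "CD19", "MS4A1", "CD2"]
markers = {"B": ["CD19", "MS4A1"], "T": ["CD3E", "CD2"]}
counts = np.array([[5, 0, 0, 1],
                   [0, 2, 9, 0],
                   [1, 0, 0, 0]])
labels, scores = marker_count_assign(counts, genes, markers)
print(labels)
print(reassign_by_cluster(labels, clusters=[0, 0, 0]))
```

Because only detection is used, rescaling the counts (for example by a platform-specific factor) leaves the labels unchanged, which is the property the abstract relies on for robustness to batch effects.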


Author response: Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq

10.7554/elife.43803.044 ◽

2019 ◽

Cited By ~ 1

Author(s):

Dylan Kotliar ◽

Adrian Veres ◽

M Aurel Nagy ◽

Shervin Tabrizi ◽

Eran Hodis ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Author Response ◽

Cellular Activity ◽

Rna Seq ◽

Cell Type ◽

Type Identity


Comprehensive characterization of tissue-specific chromatin accessibility in L2 Caenorhabditis elegans nematodes

10.1101/2020.09.15.299123 ◽

2020 ◽

Author(s):

Timothy J. Durham ◽

Riza M. Daza ◽

Louis Gevirtzman ◽

Darren A. Cusanovich ◽

William Stafford Noble ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Patterns ◽

Cell Types ◽

Chromatin Accessibility ◽

Gene Expression Patterns ◽

Rna Seq ◽

Cell Type ◽

Tissue Specific ◽

C Elegans

Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.


DSAVE: Detection of misclassified cells in single-cell RNA-Seq data

PLoS ONE ◽

10.1371/journal.pone.0243360 ◽

2020 ◽

Vol 15 (12) ◽

pp. e0243360

Author(s):

Johan Gustafsson ◽

Jonathan Robinson ◽

Juan S. Inda-Díaz ◽

Elias Björnson ◽

Rebecka Jörnsten ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Rna Seq ◽

Cell Type ◽

Log Likelihood ◽

Single Cell Rna Sequencing ◽

Cell Transcriptome ◽

Average Gene ◽

Single Cell Transcriptome

Single-cell RNA sequencing has become a valuable tool for investigating cell types in complex tissues, where clustering of cells enables the identification and comparison of cell populations. Although many studies have sought to develop and compare different clustering approaches, a deeper investigation into the properties of the resulting populations is lacking. Specifically, the presence of misclassified cells can influence downstream analyses, highlighting the need to assess subpopulation purity and to detect such cells. We developed DSAVE (Down-SAmpling based Variation Estimation), a method to evaluate the purity of single-cell transcriptome clusters and to identify misclassified cells. The method utilizes down-sampling to eliminate differences in sampling noise and uses a log-likelihood based metric to help identify misclassified cells. In addition, DSAVE estimates the number of cells needed in a population to achieve a stable average gene expression profile within a certain gene expression range. We show that DSAVE can be used to find potentially misclassified cells that are not detectable by similar tools and reveal the cause of their divergence from the other cells, such as differing cell state or cell type. With the growing use of single-cell RNA-seq, we foresee that DSAVE will be an increasingly useful tool for comparing and purifying subpopulations in single-cell RNA-Seq datasets.
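The two ingredients named in this abstract, down-sampling to a common depth and a log-likelihood score, can be sketched as below. This is an illustration under stated assumptions rather than the DSAVE algorithm: the helper names, the multinomial model, the pooled cluster profile, and the pseudocount are all choices made for the example.

```python
import numpy as np
from scipy.stats import multinomial


def downsample_counts(counts, target, rng):
    """Draw `target` UMIs without replacement from one cell's count vector."""
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() < target:
        raise ValueError("cell has fewer counts than the target depth")
    return rng.multivariate_hypergeometric(counts, target)


def cell_log_likelihoods(cluster_counts, target, rng, pseudocount=0.5):
    """Score each cell against the pooled profile of its cluster.

    Assumption: a multinomial log-likelihood under the cluster's mean
    profile; the metric used by DSAVE itself may be defined differently.
    """
    sampled = np.vstack([downsample_counts(c, target, rng) for c in cluster_counts])
    profile = sampled.sum(axis=0) + pseudocount
    profile = profile / profile.sum()
    # All cells now share the same depth, so their scores are comparable.
    return np.array([multinomial.logpmf(cell, n=target, p=profile) for cell in sampled])


# Toy cluster: three similar cells and one divergent cell (made-up counts).
rng = np.random.default_rng(0)
cluster = np.array([[50, 30, 20],
                    [48, 32, 20],
                    [52, 29, 19],
                    [5, 15, 80]])
scores = cell_log_likelihoods(cluster, target=90, rng=rng)
print(scores.argmin())  # index of the least likely cell
```

Equalizing depth before scoring matters because a raw likelihood depends strongly on the number of counts in a cell; after down-sampling, a low score points to a divergent profile rather than to a difference in sequencing depth.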


Single-cell deconvolution of 3,000 post-mortem brain samples for eQTL and GWAS dissection in mental disorders

10.1101/2021.01.21.426000 ◽

2021 ◽

Author(s):

Yongjin Park ◽

Liang He ◽

Jose Davila-Velderrain ◽

Lei Hou ◽

Shahin Mohammadi ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Genetic Variants ◽

Cell Types ◽

Brain Regions ◽

Post Mortem ◽

Rna Seq ◽

Cell Type ◽

Post Mortem Brain ◽

Cell Type Specific

Thousands of genetic variants acting in multiple cell types underlie complex disorders, yet most gene expression studies profile only bulk tissues, making it hard to resolve where genetic and non-genetic contributors act. This is particularly important for psychiatric and neurodegenerative disorders that impact multiple brain cell types with highly-distinct gene expression patterns and proportions. To address this challenge, we develop a new framework, SPLITR, that integrates single-nucleus and bulk RNA-seq data, enabling phenotype-aware deconvolution and correcting for systematic discrepancies between bulk and single-cell data. We deconvolved 3,387 post-mortem brain samples across 1,127 individuals and in multiple brain regions. We find that cell proportion varies across brain regions, individuals, disease status, and genotype, including genetic variants in TMEM106B that impact inhibitory neuron fraction and 4,757 cell-type-specific eQTLs. Our results demonstrate the power of jointly analyzing bulk and single-cell RNA-seq to provide insights into cell-type-specific mechanisms for complex brain disorders.
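For readers unfamiliar with deconvolution, the basic reference-based version of the problem can be sketched as follows. This is not SPLITR: it is plain non-negative least squares on a known signature matrix, with no phenotype awareness and no correction for discrepancies between bulk and single-cell data. The function name, the signature matrix, and all numbers are made up for the example.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(bulk, signatures):
    """Estimate cell-type proportions for one bulk sample.

    Hypothetical helper. `signatures` is genes x cell types (for example,
    mean single-cell profiles); `bulk` is the sample's expression vector.
    Models bulk as signatures @ weights with weights >= 0.
    """
    weights, _residual = nnls(signatures, bulk)
    total = weights.sum()
    if total == 0:
        raise ValueError("no signature explains the bulk profile")
    return weights / total  # normalize so the proportions sum to 1


# Toy example: 4 genes, 3 cell types, noise-free mixture.
signatures = np.array([[10.0, 1.0, 0.0],
                       [0.0, 8.0, 2.0],
                       [1.0, 0.0, 9.0],
                       [5.0, 5.0, 5.0]])
true_proportions = np.array([0.2, 0.5, 0.3])
bulk = signatures @ true_proportions
print(estimate_proportions(bulk, signatures))
```

In real data the bulk and single-cell measurements differ systematically in ways a plain least-squares fit ignores, which is the gap the abstract says the framework addresses.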


The landscape of mouse epididymal cells defined by the single-cell RNA-Seq

10.1101/2020.01.05.895052 ◽

2020 ◽

Author(s):

Jianwu Shi ◽

Mengmeng Sang ◽

Gangcai Xie ◽

Hao Chen

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Types ◽

Principal Cell ◽

Sperm Maturation ◽

Specific Cell ◽

Rna Seq ◽

Cell Type ◽

Gene Expression Signatures ◽

Mouse Epididymis

Spermatozoa acquire their fertilizing ability and forward motility during epididymal transit. Despite many attempts to elucidate the functions of different cell types in the epididymis, the composition of the epididymal tubule and its cell types is still largely unknown. Using single-cell RNA sequencing, we analyzed the cell composition and gene expression profiles of the adult epididymis, sampling the caput, corpus and cauda, for a total of 12,597 cells. This allowed us to elucidate the full range of gene expression changes across the epididymis and to derive region-specific gene expression signatures along it. A total of 7 cell populations were identified, covering all known constituent cells of the mouse epididymis as well as two novel cell types. Our analyses revealed segment-to-segment variation within the same cell type across the three parts of the epididymis and generated a reference dataset of epididymal cell gene expression. Focused analyses uncovered nine subtypes of principal cells. Two of these subtypes, c0.3 and c.6, were consistent with previous findings that they are mainly located in the caput of the mouse epididymis and play important roles during sperm maturation. We also showed unique gene expression signatures for each cell population and key pathways that may coordinate epididymal epithelial cell-sperm interactions. Overall, our single-cell RNA-seq datasets of the epididymis provide a comprehensive catalog of potential cell types and an information-rich resource for studies of epididymal composition, regulation of the epididymal microenvironment by specific cell types, and contraceptive development, as well as a gene expression roadmap to be emulated in efforts to regulate sperm maturation in the epididymis.


CDSeqR: fast complete deconvolution for gene expression data from bulk tissues

10.1101/2021.01.30.428954 ◽

2021 ◽

Author(s):

Kai Kang ◽

Caizhi David Huang ◽

Yuanyuan Li ◽

David M. Umbach ◽

Leping Li

Keyword(s):

Gene Expression ◽

Single Cell ◽

In Silico ◽

Cell Types ◽

R Package ◽

Biological Tissues ◽

Specific Cell ◽

Rna Seq ◽

Cell Type ◽

Cell Type Specific

Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and with a new function to aid interpretation of deconvolution outcomes. The R package would be of interest for the broader R community. Results: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating CDSeq-estimated cell types using publicly available single-cell RNA sequencing (scRNA-seq) data (single-cell data from 20 major organs are included in the R package). This function allows users to readily interpret and visualize the CDSeq-estimated cell types. We carried out additional validations of the CDSeqR software with in silico and in vitro mixtures and with real experimental data including RNA-seq data from the Cancer Genome Atlas (TCGA) and The Genotype-Tissue Expression (GTEx) project. Conclusions: The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. However, bulk level analyses neglect tissue heterogeneity and hinder investigation in a cell-type-specific fashion. The CDSeqR package can be viewed as providing in silico single-cell dissection of bulk measurements. It enables researchers to gain cell-type-specific information from bulk RNA-seq data.


Integrative single-cell and bulk RNA-seq analysis in human retina identified cell type-specific composition and gene expression changes for age-related macular degeneration

10.1101/768143 ◽

2019 ◽

Author(s):

Yafei Lyu ◽

Randy Zauhar ◽

Nico Dana ◽

Christianne E. Strang ◽

Kui Wang ◽

...

Keyword(s):

Gene Expression ◽

Macular Degeneration ◽

Single Cell ◽

Cell Types ◽

Age Related Macular Degeneration ◽

Peripheral Retina ◽

Rna Seq ◽

Cell Type ◽

Age Related ◽

The Impact

Age-related macular degeneration (AMD) preferentially affects distinct cell types and topographic regions in retina. To characterize the impact of AMD on gene expression changes across retinal cell types and regions, we generated both single-cell RNA-seq (scRNA-seq) and bulk RNA-seq data from macular and peripheral retina in postmortem human donors with and without AMD. The scRNA-seq data revealed 11 major cell types with many previously reported AMD risk genes showing substantial cell type and region specificity. Cell type proportional changes with advancing AMD stage were significant for Müller glia, rods, astrocytes, microglia and endothelium.


Decision letter: Identifying gene expression programs of cell-type identity and cellular activity with single-cell RNA-Seq

10.7554/elife.43803.043 ◽

2019 ◽

Author(s):

Elisabetta Mereu ◽

Berthold Göttgens

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cellular Activity ◽

Rna Seq ◽

Cell Type ◽

Type Identity
