Gene expression distribution deconvolution in single-cell RNA sequencing

Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene’s expression distribution across cells, thus allowing the assessment of the dispersion, nonzero fraction, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data are noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a reexamination of nine public datasets, we propose a simple technical noise model for scRNA-seq data with unique molecular identifiers (UMI). We develop deconvolution of single-cell expression distribution (DESCEND), a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and nonzero fraction. DESCEND can adjust for cell-level covariates such as cell size, cell cycle, and batch effects. DESCEND’s noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations, and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially expressed genes, identifying cell types, and selecting differentiation markers.
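
To illustrate why raw counts misrepresent the nonzero fraction and dispersion, here is a minimal simulation sketch. It is not the DESCEND estimator: the Poisson sampling model, the capture efficiency of 0.1, the zero-inflated gamma "true" distribution, and the simple moment correction are all assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells = 5000

# Hypothetical "true" expression: 70% of cells express the gene
# (gamma-distributed levels); the remaining 30% are truly zero.
true_nonzero = rng.random(n_cells) < 0.7
true_expr = np.where(true_nonzero,
                     rng.gamma(shape=2.0, scale=5.0, size=n_cells),
                     0.0)

# Assumed technical noise: each cell's UMI count is Poisson with mean
# (capture efficiency) x (true expression).
efficiency = 0.1
observed = rng.poisson(efficiency * true_expr)

# Nonzero fraction: low coverage adds sampling zeros to the true zeros.
true_nonzero_fraction = true_nonzero.mean()
observed_nonzero_fraction = (observed > 0).mean()


def cv2(x):
    """Squared coefficient of variation, a scale-free dispersion measure."""
    return x.var() / x.mean() ** 2


true_cv2 = cv2(true_expr)
naive_cv2 = cv2(observed.astype(float))

# Under the Poisson assumption, Var(Y) = E[Y] + Var(rate), so subtracting
# the mean from the observed variance removes the sampling noise.
corrected_cv2 = (observed.var() - observed.mean()) / observed.mean() ** 2

print(f"nonzero fraction: true={true_nonzero_fraction:.2f}, "
      f"observed={observed_nonzero_fraction:.2f}")
print(f"CV^2: true={true_cv2:.2f}, naive={naive_cv2:.2f}, "
      f"corrected={corrected_cv2:.2f}")
```

In this toy setting the observed nonzero fraction falls well below the true one and the naive dispersion is inflated, which is the kind of distortion a deconvolution method aims to undo.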

Gene Expression Distribution Deconvolution in Single Cell RNA Sequencing

10.1101/227033 ◽

2017 ◽

Cited By ~ 6

Author(s):

Jingshu Wang ◽

Mo Huang ◽

Eduardo Torre ◽

Hannah Dueck ◽

Sydney Shaffer ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Population Heterogeneity ◽

Noise Model ◽

Estimation Accuracy ◽

Marker Genes ◽

Batch Effects ◽

Public Data ◽

Single Cell Rna Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) enables the quantification of each gene’s expression distribution across cells, thus allowing the assessment of the dispersion, burstiness, and other aspects of its distribution beyond the mean. These statistical characterizations of the gene expression distribution are critical for understanding expression variation and for selecting marker genes for population heterogeneity. However, scRNA-seq data is noisy, with each cell typically sequenced at low coverage, thus making it difficult to infer properties of the gene expression distribution from raw counts. Based on a re-examination of 9 public data sets, we propose a simple technical noise model for scRNA-seq data with Unique Molecular Identifiers (UMI). We develop DESCEND, a method that deconvolves the true cross-cell gene expression distribution from observed scRNA-seq counts, leading to improved estimates of properties of the distribution such as dispersion and burstiness. DESCEND can adjust for cell-level covariates such as cell size, cell cycle and batch effects. DESCEND’s noise model and estimation accuracy are further evaluated through comparisons to RNA FISH data, through data splitting and simulations, and through its effectiveness in removing known batch effects. We demonstrate how DESCEND can clarify and improve downstream analyses such as finding differentially bursty genes, identifying cell types, and selecting differentiation markers.

Identification of silkworm hemocyte subsets and analysis of their response to BmNPV infection based on single-cell RNA sequencing

10.1101/2020.10.18.344127 ◽

2020 ◽

Author(s):

Min Feng ◽

Junming Xia ◽

Shigang Fei ◽

Xiong Wang ◽

Yaohong Zhou ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Current Knowledge ◽

Early Stage ◽

Expression Profiles ◽

Marker Genes ◽

Potential Marker ◽

Wide Range ◽

Single Cell Rna Sequencing

Abstract A wide range of hemocyte types exist in insects, but a full definition of the different subclasses is not yet established. The current knowledge of the classification of silkworm hemocytes mainly comes from morphology rather than specific markers, so our understanding of the detailed classification, hemocyte lineage and functions of silkworm hemocytes is very incomplete. Bombyx mori nucleopolyhedrovirus (BmNPV) is a representative member of the baculoviruses and a major pathogen that specifically infects silkworms and causes serious losses in the sericulture industry. Here, we performed single-cell RNA sequencing (scRNA-seq) of silkworm hemocytes in BmNPV-infected and mock-infected larvae to comprehensively identify silkworm hemocyte subsets and determine specific molecular and cellular characteristics in each hemocyte subset before and after viral infection. A total of 19 cell clusters and their potential marker genes were identified in silkworm hemocytes. Among these hemocyte clusters, clusters 0, 1, 2, 5 and 9 might be granulocytes (GR); clusters 14 and 17 were predicted as plasmatocytes (PL); cluster 18 was tentatively identified as spherulocytes (SP); and clusters 7 and 11 could possibly correspond to oenocytoids (OE). In addition, all of the hemocyte clusters were infected by BmNPV, and some infected cells carried a high viral load in silkworm larvae at 3 days post infection (dpi). Interestingly, BmNPV infection can cause severe and diverse changes in gene expression in hemocytes. Cells belonging to the infection group were mainly located at the early stage of the pseudotime trajectories. Furthermore, we found that BmNPV infection suppresses the immune response in the major hemocyte types. In summary, our scRNA-seq analysis revealed the diversity of silkworm hemocytes and provided a rich resource of gene expression profiles for a systems-level understanding of their functions in the uninfected condition and as a response to BmNPV.

Identification and Removal of Doublets with DoubletDecon

10.1101/2020.04.23.058156 ◽

2020 ◽

Author(s):

Erica A. K. DePasquale ◽

Daniel Schnell ◽

Kashish Chetal ◽

Nathan Salomonis

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Transitional Cell ◽

Cell Populations ◽

Marker Genes ◽

Unique Gene ◽

Link Type ◽

Single Cell Rna Sequencing

Summary Retention of multiplet captures in single-cell RNA-sequencing (scRNA-seq) data can hinder identification of discrete or transitional cell populations and associated marker genes. To overcome this challenge, we created DoubletDecon to identify and remove doublets, multiplets of two cells, by using a combination of deconvolution to identify putative doublets and analyses of unique gene expression. Here we provide the protocol for running DoubletDecon on scRNA-seq data. For complete details on the use of this protocol, please see DePasquale et al. (2019) (https://doi.org/10.1016/j.celrep.2019.09.082).

Integrated single‐cell RNA sequencing analyses suggest developmental paths of cancer‐associated fibroblasts with gene expression dynamics

Clinical and Translational Medicine ◽

10.1002/ctm2.487 ◽

2021 ◽

Vol 11 (7) ◽

Author(s):

Hee Chul Chung ◽

Eun Jeong Cho ◽

Hyeonjin Lee ◽

Won‐Kyung Kim ◽

Ji‐Hye Oh ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Cancer Associated Fibroblasts ◽

Single Cell Rna Sequencing ◽

Gene Expression Dynamics

SCALE: modeling allele-specific gene expression by single-cell RNA sequencing

Genome Biology ◽

10.1186/s13059-017-1200-8 ◽

2017 ◽

Vol 18 (1) ◽

Cited By ~ 32

Author(s):

Yuchao Jiang ◽

Nancy R. Zhang ◽

Mingyao Li

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Specific Gene ◽

Specific Gene Expression ◽

Scale Modeling ◽

Single Cell Rna Sequencing ◽

Allele Specific

Cryopreservation of microglia enables single-cell RNA sequencing with minimal effects on disease-related gene expression patterns

iScience ◽

10.1016/j.isci.2021.102357 ◽

2021 ◽

Vol 24 (4) ◽

pp. 102357

Author(s):

Brenda Morsey ◽

Meng Niu ◽

Shetty Ravi Dyavar ◽

Courtney V. Fletcher ◽

Benjamin G. Lamberty ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Patterns ◽

Gene Expression Patterns ◽

Related Gene ◽

Single Cell Rna Sequencing ◽

Disease Related Gene

Single-cell RNA sequencing of the mammalian pineal gland identifies two pinealocyte subtypes and cell type-specific daily patterns of gene expression

PLoS ONE ◽

10.1371/journal.pone.0205883 ◽

2018 ◽

Vol 13 (10) ◽

pp. e0205883 ◽

Cited By ~ 9

Author(s):

Joseph C. Mays ◽

Michael C. Kelly ◽

Steven L. Coon ◽

Lynne Holtzclaw ◽

Martin F. Rath ◽

...

Keyword(s):

Gene Expression ◽

Pineal Gland ◽

Single Cell ◽

Rna Sequencing ◽

Cell Type ◽

Single Cell Rna Sequencing ◽

Cell Type Specific ◽

Mammalian Pineal Gland ◽

Daily Patterns

Bioinformatics analysis of the gene expression profile of retinal pigmental epithelial cells based in single-cell RNA sequencing in myopic mice

Archives of Medical Science ◽

10.5114/aoms/131835 ◽

2021 ◽

Vol 17 (2) ◽

pp. 574-577

Author(s):

Ya Mo ◽

Mu-Lin He ◽

Jia-Zhen Yu ◽

Xue-Jun Xie

Keyword(s):

Gene Expression ◽

Epithelial Cells ◽

Single Cell ◽

Rna Sequencing ◽

Gene Expression Profile ◽

Expression Profile ◽

Bioinformatics Analysis ◽

Single Cell Rna Sequencing

Evaluation of single-cell classifiers for single-cell RNA sequencing data sets

Briefings in Bioinformatics ◽

10.1093/bib/bbz096 ◽

2019 ◽

Vol 21 (5) ◽

pp. 1581-1595 ◽

Cited By ~ 6

Author(s):

Xinlei Zhao ◽

Shuang Wu ◽

Nan Fang ◽

Xiao Sun ◽

Jue Fan

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Reference Data ◽

Predictive Accuracy ◽

Cell Types ◽

Superior Performance ◽

Marker Genes ◽

Data Sets ◽

Sequencing Data ◽

Single Cell Rna Sequencing

Abstract Single-cell RNA sequencing (scRNA-seq) has been rapidly developing and widely applied in biological and medical research. Identification of cell types in scRNA-seq data sets is an essential step before in-depth investigations of their functional and pathological roles. However, the conventional workflow based on clustering and marker genes is not scalable for an increasingly large number of scRNA-seq data sets due to complicated procedures and manual annotation. Therefore, a number of tools have been developed recently to predict cell types in new data sets using reference data sets. These methods have not been generally adopted due to a lack of tool benchmarking and user guidance. In this article, we performed a comprehensive and impartial evaluation of nine classification software tools specifically designed for scRNA-seq data sets. Results showed that Seurat based on random forest, SingleR based on correlation analysis and CaSTLe based on XGBoost performed better than others. A simple ensemble voting of all tools can improve the predictive accuracy. Under nonideal situations, such as small-sized and class-imbalanced reference data sets, tools based on cluster-level similarities have superior performance. However, even with the function of assigning ‘unassigned’ labels, it is still challenging to catch novel cell types by solely using any of the single-cell classifiers. This article provides a guideline for researchers to select and apply suitable classification tools in their analysis workflows and sheds some light on potential directions for future improvement of classification tools.
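
The ensemble voting mentioned in this abstract can be sketched as a per-cell majority vote over the labels predicted by several tools. This is only an illustrative sketch: the abstract does not specify the voting rule, so the strict-majority threshold, the fallback to an "unassigned" label, and the function name `majority_vote` are assumptions, and the example labels are made up.

```python
from collections import Counter


def majority_vote(predictions, unassigned="unassigned"):
    """Combine per-cell labels from several classifiers.

    predictions: list of label lists, one per tool, all of equal length.
    A cell receives a label only if more than half of the tools agree
    (an assumed rule); otherwise it is marked as unassigned.
    """
    n_cells = len(predictions[0])
    if any(len(p) != n_cells for p in predictions):
        raise ValueError("every tool must label the same number of cells")

    consensus = []
    for labels in zip(*predictions):  # labels for one cell across all tools
        top_label, top_count = Counter(labels).most_common(1)[0]
        if top_count * 2 > len(labels):
            consensus.append(top_label)
        else:
            consensus.append(unassigned)
    return consensus


# Made-up predictions from three hypothetical tools for three cells.
tool_a = ["T cell", "B cell", "NK cell"]
tool_b = ["T cell", "B cell", "B cell"]
tool_c = ["T cell", "NK cell", "unassigned"]

consensus = majority_vote([tool_a, tool_b, tool_c])
print(consensus)
```

Requiring a strict majority keeps the ensemble conservative: when the tools disagree, the cell stays unassigned instead of inheriting an arbitrary label.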

Microbial single-cell RNA sequencing by split-pool barcoding

Science ◽

10.1126/science.aba5257 ◽

2020 ◽

Vol 371 (6531) ◽

pp. eaba5257 ◽

Cited By ~ 2

Author(s):

Anna Kuchina ◽

Leandra M. Brettner ◽

Luana Paleologu ◽

Charles M. Roco ◽

Alexander B. Rosenberg ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

High Throughput ◽

Single Cell Analysis ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Growth Stages ◽

High Throughput Analysis ◽

Single Cell Rna Sequencing

Single-cell RNA sequencing (scRNA-seq) has become an essential tool for characterizing gene expression in eukaryotes, but current methods are incompatible with bacteria. Here, we introduce microSPLiT (microbial split-pool ligation transcriptomics), a high-throughput scRNA-seq method for Gram-negative and Gram-positive bacteria that can resolve heterogeneous transcriptional states. We applied microSPLiT to >25,000 Bacillus subtilis cells sampled at different growth stages, creating an atlas of changes in metabolism and lifestyle. We retrieved detailed gene expression profiles associated with known, but rare, states such as competence and prophage induction and also identified unexpected gene expression states, including the heterogeneous activation of a niche metabolic pathway in a subpopulation of cells. MicroSPLiT paves the way to high-throughput analysis of gene expression in bacterial communities that are otherwise not amenable to single-cell analysis, such as natural microbiota.
