Bias, robustness and scalability in differential expression analysis of single-cell RNA-seq data

10.1101/143289 ◽

2017 ◽

Cited By ~ 16

Author(s):

Charlotte Soneson ◽

Mark D. Robinson

Keyword(s):

Single Cell ◽

Differential Expression ◽

Statistical Methods ◽

Expression Analysis ◽

Method Development ◽

Differential Expression Analysis ◽

Data Sets ◽

RNA Seq ◽

Data Set ◽

Extensive Evaluation

Abstract:

Background: As single-cell RNA-seq (scRNA-seq) is becoming increasingly common, the amount of publicly available data grows rapidly, generating a useful resource for computational method development and extension of published results. Although processed data matrices are typically made available in public repositories, the procedure to obtain these varies widely between data sets, which may complicate reuse and cross-data set comparison. Moreover, while many statistical methods for performing differential expression analysis of scRNA-seq data are becoming available, their relative merits and the performance compared to methods developed for bulk RNA-seq data are not sufficiently well understood.

Results: We present conquer, a collection of consistently processed, analysis-ready public single-cell RNA-seq data sets. Each data set has count and transcripts per million (TPM) estimates for genes and transcripts, as well as quality control and exploratory analysis reports. We use a subset of the data sets available in conquer to perform an extensive evaluation of the performance and characteristics of statistical methods for differential gene expression analysis, evaluating a total of 30 statistical approaches on both experimental and simulated scRNA-seq data.

Conclusions: Considerable differences are found between the methods in terms of the number and characteristics of the genes that are called differentially expressed. Pre-filtering of lowly expressed genes can have important effects on the results, particularly for some of the methods originally developed for analysis of bulk RNA-seq data. Generally, however, methods developed for bulk RNA-seq analysis do not perform notably worse than those developed specifically for scRNA-seq.
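
The abstract above notes that pre-filtering of lowly expressed genes can change differential expression results. As a rough illustration of what such a filter might look like, the sketch below keeps genes detected in at least a given fraction of cells. The function name `prefilter_genes`, the thresholds, and the toy counts are all hypothetical; the abstract does not state which filter the authors applied.

```python
def prefilter_genes(counts, min_count=1, min_fraction=0.25):
    """Keep genes with count >= min_count in at least min_fraction of cells.

    counts maps a gene name to a list of per-cell counts.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    kept = {}
    for gene, values in counts.items():
        if not values:
            continue  # a gene with no cells cannot pass the filter
        n_expressing = sum(1 for v in values if v >= min_count)
        if n_expressing / len(values) >= min_fraction:
            kept[gene] = values
    return kept


# Toy data: three genes measured in four cells (made-up numbers).
toy_counts = {
    "geneA": [0, 0, 0, 5],
    "geneB": [0, 0, 0, 0],
    "geneC": [3, 2, 0, 1],
}

lenient = prefilter_genes(toy_counts, min_fraction=0.25)
strict = prefilter_genes(toy_counts, min_fraction=0.5)
print(sorted(lenient), sorted(strict))
```

Tightening `min_fraction` drops `geneA`, which is detected in only one of four cells, so the set of genes that reaches the statistical test depends directly on the filter.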

Two-phase differential expression analysis for single cell RNA-seq

Bioinformatics ◽

10.1093/bioinformatics/bty329 ◽

2018 ◽

Vol 34 (19) ◽

pp. 3340-3348 ◽

Cited By ~ 11

Author(s):

Zhijin Wu ◽

Yi Zhang ◽

Michael L Stitzel ◽

Hao Wu

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

RNA Seq ◽

Two Phase

genesorteR: Feature Ranking in Clustered Single Cell Data

10.1101/676379 ◽

2019 ◽

Cited By ~ 5

Author(s):

Mahmoud M Ibrahim ◽

Rafael Kramann

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Large Cell ◽

R Package ◽

Marker Genes ◽

Data Sets ◽

Cell Type ◽

Cell Data

Abstract:

Marker genes identified in single cell experiments are expected to be highly specific to a certain cell type and highly expressed in that cell type. Detecting a gene by differential expression analysis does not necessarily satisfy those two conditions and is typically computationally expensive for large cell numbers. Here we present genesorteR, an R package that ranks features in single cell data in a manner consistent with the expected definition of marker genes in experimental biology research. We benchmark genesorteR using various data sets and show that it is distinctly more accurate in large single cell data sets compared to other methods. genesorteR is orders of magnitude faster than current implementations of differential expression analysis methods, can operate on data containing millions of cells and is applicable to both single cell RNA-Seq and single cell ATAC-Seq data. genesorteR is available at https://github.com/mahmoudibrahim/genesorteR.

A discriminative learning approach to differential expression analysis for single-cell RNA-seq

Nature Methods ◽

10.1038/s41592-018-0303-9 ◽

2019 ◽

Vol 16 (2) ◽

pp. 163-166 ◽

Cited By ~ 26

Author(s):

Vasilis Ntranos ◽

Lynn Yi ◽

Páll Melsted ◽

Lior Pachter

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Discriminative Learning ◽

Learning Approach ◽

RNA Seq

SwarnSeq: An improved statistical approach for differential expression analysis of single-cell RNA-seq data

Genomics ◽

10.1016/j.ygeno.2021.02.014 ◽

2021 ◽

Vol 113 (3) ◽

pp. 1308-1324

Author(s):

Samarendra Das ◽

Shesh N. Rai

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Statistical Approach ◽

Differential Expression Analysis ◽

RNA Seq

Valid post-clustering differential analysis for single-cell RNA-Seq

10.1101/463265 ◽

2018 ◽

Cited By ~ 1

Author(s):

Jesse M. Zhang ◽

Govinda M. Kamath ◽

David N. Tse

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

State Of The Art ◽

Differential Expression Analysis ◽

Differential Analysis ◽

RNA Seq ◽

Analysis Framework ◽

Link Type ◽

False Discoveries

Summary:

Single-cell computational pipelines involve two critical steps: organizing cells (clustering) and identifying the markers driving this organization (differential expression analysis). State-of-the-art pipelines perform differential analysis after clustering on the same dataset. We observe that because clustering forces separation, reusing the same dataset generates artificially low p-values and hence false discoveries. We introduce a valid post-clustering differential analysis framework which corrects for this problem. We provide software at https://github.com/jessemzhang/tn_test.
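
The effect described in this summary, that clustering followed by testing on the same data inflates significance, can be seen in a small simulation. The sketch below is only an illustration under simplifying assumptions: a single gene, a median split as a stand-in for a clustering algorithm, and a hand-written Welch t-statistic. It does not implement the correction framework the authors provide.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


random.seed(0)

# One homogeneous population of 200 cells: no real subgroups exist for this gene.
expression = [random.gauss(0.0, 1.0) for _ in range(200)]

# Stand-in for clustering: split the cells at the median of the same values.
cut = statistics.median(expression)
cluster_low = [x for x in expression if x <= cut]
cluster_high = [x for x in expression if x > cut]
t_reused = welch_t(cluster_high, cluster_low)

# Control: labels assigned without looking at the expression values.
labels = [0, 1] * 100
random.shuffle(labels)
group0 = [x for x, lab in zip(expression, labels) if lab == 0]
group1 = [x for x, lab in zip(expression, labels) if lab == 1]
t_random = welch_t(group1, group0)

print(f"t after data-driven split: {t_reused:.1f}")
print(f"t after random split:      {t_random:.1f}")
```

Although every cell is drawn from the same distribution, the data-driven split yields a very large t-statistic, while the random split does not. That gap is the false-discovery risk the summary refers to.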

zingeR: unlocking RNA-seq tools for zero-inflation and single cell applications

10.1101/157982 ◽

2017 ◽

Cited By ~ 7

Author(s):

Koen Van den Berge ◽

Charlotte Soneson ◽

Michael I. Love ◽

Mark D. Robinson ◽

Lieven Clement

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Negative Binomial ◽

Differential Expression Analysis ◽

Negative Binomial Model ◽

Binomial Model ◽

RNA Seq ◽

Zero Inflation ◽

Zero Counts

Abstract:

Dropout in single cell RNA-seq (scRNA-seq) applications causes many transcripts to go undetected. It induces excess zero counts, which leads to power issues in differential expression (DE) analysis and has triggered the development of bespoke scRNA-seq DE tools that cope with zero-inflation. Recent evaluations, however, have shown that dedicated scRNA-seq tools provide no advantage compared to traditional bulk RNA-seq tools. We introduce zingeR, a zero-inflated negative binomial model that identifies excess zero counts and generates observation weights to unlock bulk RNA-seq pipelines for zero-inflation, boosting performance in scRNA-seq differential expression analysis.
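
To make the idea of observation weights from a zero-inflated negative binomial (ZINB) model more concrete, the sketch below computes, for a single observation, the posterior probability that it came from the negative binomial count component rather than the excess-zero component. This is the generic mixture-model calculation, offered as an assumption about what such weights could look like; the abstract does not describe how zingeR estimates its parameters, and the names `pi_zero`, `mu`, and `size` are hypothetical.

```python
def nb_prob_zero(mu, size):
    """P(Y = 0) under a negative binomial with mean mu and dispersion parameter size."""
    return (size / (size + mu)) ** size


def observation_weight(count, pi_zero, mu, size):
    """Posterior probability that an observation belongs to the NB count component.

    pi_zero is the mixing proportion of the excess-zero component (assumed known here).
    """
    if count > 0:
        # The excess-zero component can only produce zeros.
        return 1.0
    nb_part = (1.0 - pi_zero) * nb_prob_zero(mu, size)
    return nb_part / (pi_zero + nb_part)


# A zero in a lowly expressed gene is plausible under the NB component...
w_low = observation_weight(0, pi_zero=0.5, mu=0.1, size=1.0)
# ...while a zero in a highly expressed gene is more likely an excess zero.
w_high = observation_weight(0, pi_zero=0.5, mu=50.0, size=1.0)
print(round(w_low, 3), round(w_high, 3))
```

Zeros that are unlikely under the count model receive small weights, so a weighted bulk RNA-seq pipeline would down-weight them instead of treating them as evidence of low expression.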

Distribution-free complex hypothesis testing for single-cell RNA-seq differential expression analysis

10.1101/2021.05.21.445165 ◽

2021 ◽

Author(s):

Marine Gauthier ◽

Denis Agniel ◽

Rodolphe Thiébaut ◽

Boris P. Hejblum

Keyword(s):

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

State Of The Art ◽

Permutation Test ◽

Differential Expression Analysis ◽

Cumulative Distribution ◽

RNA Seq ◽

Distribution Free ◽

Art Methods

State-of-the-art methods for single-cell RNA-seq (scRNA-seq) Differential Expression Analysis (DEA) often rely on strong distributional assumptions that are difficult to verify in practice. Furthermore, while the increasing complexity of clinical and biological single-cell studies calls for greater tool versatility, the majority of existing methods only tackle the comparison between two conditions. We propose a novel, distribution-free, and flexible approach to DEA for single-cell RNA-seq data. This new method, called ccdf, tests the association of each gene expression with one or many variables of interest (that can be either continuous or discrete), while potentially adjusting for additional covariates. To test such complex hypotheses, ccdf uses a conditional independence test relying on the conditional cumulative distribution function, estimated through multiple regressions. We provide the asymptotic distribution of the ccdf test statistic as well as a permutation test (when the number of observed cells is not sufficiently large). ccdf substantially expands the possibilities for scRNA-seq DEA studies: it obtains good statistical performance in various simulation scenarios considering complex experimental designs (i.e. beyond the two-condition comparison), while retaining competitive performance with state-of-the-art methods in a two-condition benchmark.

Individual Level Differential Expression Analysis for Single Cell RNA-seq data

10.1101/2021.05.10.443350 ◽

2021 ◽

Author(s):

Mengqi Zhang ◽

Si Liu ◽

Zhen Miao ◽

Fang Han ◽

Raphael Gottardo ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Bulk Solution ◽

RNA Seq ◽

Cell Level ◽

Individual Level ◽

Level Data

Bulk RNA-seq data quantify the expression of a gene in an individual by one number (e.g., fragment count). In contrast, single cell RNA-seq (scRNA-seq) data provide much richer information: the distribution of gene expression across many cells. To assess differential expression across individuals using scRNA-seq data, a straightforward solution is to create "pseudo" bulk RNA-seq data by adding up the fragment counts of a gene across cells for each individual, and then apply methods designed for differential expression using bulk RNA-seq data. This pseudo-bulk solution reduces the distribution of gene expression across cells to a single number and thus loses a good amount of information. We propose to assess differential expression using the gene expression distribution measured by cell level data. We find denoising cell level data can substantially improve the power of this approach. We apply our method, named IDEAS (Individual level Differential Expression Analysis for scRNA-seq), to study the gene expression difference between autism subjects and controls. We find neurogranin-expressing neurons harbor a high proportion of differentially expressed genes, and ERBB signals in microglia are associated with autism.
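
The pseudo-bulk step described in this abstract, adding up a gene's fragment counts across the cells of each individual, can be sketched as follows. This illustrates only the baseline that the abstract contrasts with IDEAS, not the IDEAS method itself; the function name `pseudo_bulk`, the data layout, and the toy values are hypothetical.

```python
from collections import defaultdict


def pseudo_bulk(cell_counts, cell_to_individual):
    """Sum per-cell gene counts within each individual.

    cell_counts maps cell id -> {gene: count};
    cell_to_individual maps cell id -> individual id.
    Returns {individual: {gene: summed count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in cell_counts.items():
        individual = cell_to_individual[cell]
        for gene, count in gene_counts.items():
            totals[individual][gene] += count
    return {ind: dict(genes) for ind, genes in totals.items()}


# Toy data: four cells from two individuals (made-up numbers).
cell_counts = {
    "cell1": {"geneA": 2, "geneB": 0},
    "cell2": {"geneA": 0, "geneB": 7},
    "cell3": {"geneA": 1, "geneB": 1},
    "cell4": {"geneA": 4, "geneB": 0},
}
cell_to_individual = {
    "cell1": "ind1", "cell2": "ind1",
    "cell3": "ind2", "cell4": "ind2",
}

bulk = pseudo_bulk(cell_counts, cell_to_individual)
print(bulk)
```

Each individual ends up with one number per gene, which is why the abstract argues that this approach discards the cell-level distribution of expression.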

Best practices on the differential expression analysis of multi-species RNA-seq

Genome Biology ◽

10.1186/s13059-021-02337-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Matthew Chung ◽

Vincent M. Bruno ◽

David A. Rasko ◽

Christina A. Cuomo ◽

José F. Muñoz ◽

...

Keyword(s):

Best Practices ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Single Species ◽

RNA Seq ◽

Species Analysis ◽

Differential Gene ◽

Multiple Species ◽

Downstream Analysis

Abstract:

Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.

Accounting for technical noise in differential expression analysis of single-cell RNA sequencing data

Nucleic Acids Research ◽

10.1093/nar/gkx754 ◽

2017 ◽

Vol 45 (19) ◽

pp. 10978-10988 ◽

Cited By ~ 26

Author(s):

Cheng Jia ◽

Yu Hu ◽

Derek Kelly ◽

Junhyong Kim ◽

Mingyao Li ◽

...

Keyword(s):

Single Cell ◽

Differential Expression ◽

RNA Sequencing ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Sequencing Data ◽

Technical Noise ◽

Single Cell RNA Sequencing
