A framework for RNA quality correction in differential expression analysis

Abstract: RNA sequencing (RNA-seq) is a powerful approach for measuring gene expression levels in cells and tissues, but it relies on high-quality RNA. We demonstrate here that statistical adjustment employing existing quality measures largely fails to remove the effects of RNA degradation when RNA quality associates with the outcome of interest. Using RNA-seq data from a molecular degradation experiment of human brain tissue, we introduce the quality surrogate variable (qSVA) analysis framework for estimating and removing the confounding effect of RNA quality in differential expression analysis. We show this approach results in greatly improved replication rates (>3x) across two large independent postmortem human brain studies of schizophrenia. Finally, we explored public datasets to demonstrate potential RNA quality confounding when comparing expression levels of different brain regions and diagnostic groups beyond schizophrenia. Our approach can therefore improve the interpretation of differential expression analysis of transcriptomic data from the human brain.

qSVA framework for RNA quality correction in differential expression analysis

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1617384114 ◽

2017 ◽

Vol 114 (27) ◽

pp. 7130-7135 ◽

Cited By ~ 41

Author(s):

Andrew E. Jaffe ◽

Ran Tao ◽

Alexis L. Norris ◽

Marc Kealhofer ◽

Abhinav Nellore ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Brain Regions ◽

RNA Degradation ◽

RNA Seq ◽

Postmortem Human Brain ◽

Surrogate Variable Analysis ◽

Expression Levels ◽

RNA Quality

RNA sequencing (RNA-seq) is a powerful approach for measuring gene expression levels in cells and tissues, but it relies on high-quality RNA. We demonstrate here that statistical adjustment using existing quality measures largely fails to remove the effects of RNA degradation when RNA quality associates with the outcome of interest. Using RNA-seq data from molecular degradation experiments of human primary tissues, we introduce a method—quality surrogate variable analysis (qSVA)—as a framework for estimating and removing the confounding effect of RNA quality in differential expression analysis. We show that this approach results in greatly improved replication rates (>3×) across two large independent postmortem human brain studies of schizophrenia and also removes potential RNA quality biases in earlier published work that compared expression levels of different brain regions and other diagnostic groups. Our approach can therefore improve the interpretation of differential expression analysis of transcriptomic data from human tissue.

Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads

10.1101/016196 ◽

2015 ◽

Author(s):

Hung-I Harry Chen ◽

Yuanhang Liu ◽

Yi Zou ◽

Zhao Lai ◽

Devanand Sarkar ◽

...

Keyword(s):

Differential Expression ◽

RNA Sequencing ◽

Expression Analysis ◽

Background Noise ◽

Negative Binomial ◽

Differential Expression Analysis ◽

RNA Seq ◽

Sequencing Data ◽

Expression Levels ◽

Differential Expressed Genes

Background: RNA sequencing (RNA-seq) is a powerful tool for genome-wide expression profiling of biological samples, with the advantages of high throughput and high resolution. Many algorithms exist for quantifying expression levels and detecting differential gene expression, but none of them takes into account the misaligned reads that are mapped to non-exonic regions. We developed a novel algorithm, XBSeq, in which a statistical model was established based on the assumption that observed signals are the convolution of true expression signals and sequencing noise. The mapped reads in non-exonic regions are considered sequencing noise, which follows a Poisson distribution. Given measurable observed and noise signals from RNA-seq data, the true expression signals, assumed to be governed by the negative binomial distribution, can be delineated, allowing accurate detection of differentially expressed genes. Results: We implemented our novel XBSeq algorithm and evaluated it using a set of simulated expression datasets under different conditions, generated from a combination of negative binomial and Poisson distributions with parameters derived from real RNA-seq data. We compared the performance of our method with other commonly used differential expression analysis algorithms. We also evaluated the changes in true and false positive rates with variations in biological replicates, differential fold changes, and expression levels in non-exonic regions. We also tested the algorithm on a set of real RNA-seq data, for which the common and different detection results from different algorithms were reported. Conclusions: In this paper, we proposed XBSeq, a novel differential expression analysis algorithm for RNA-seq data that takes non-exonic mapped reads into consideration. When background noise is at baseline level, the performance of XBSeq and DESeq is mostly equivalent. However, our method surpasses DESeq and other algorithms as the number of non-exonic mapped reads increases. Only under very low read count conditions did XBSeq have a slightly higher false discovery rate, which may be improved by adjusting the background noise effect in this situation. Taken together, by considering non-exonic mapped reads, XBSeq can provide accurate expression measurement and thus detect differentially expressed genes even in noisy conditions.
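
The signal model described in this abstract, in which observed counts are the sum of a negative binomial true signal and Poisson noise, can be illustrated with a small simulation. The following is a minimal Python sketch under stated assumptions, not the XBSeq implementation: the function names, the parameter values, and the method-of-moments estimator are all hypothetical choices made for illustration, and the abstract does not say how XBSeq itself estimates its parameters. The sketch assumes the noise rate can be measured separately from non-exonic counts and that signal and noise are independent.

```python
import math
import random
import statistics


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    limit = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_negative_binomial(rng, mean, size):
    """Draw one negative binomial count as a gamma-Poisson mixture."""
    rate = rng.gammavariate(size, mean / size)  # shape=size, scale=mean/size
    return sample_poisson(rng, rate)


def simulate(rng, n, true_mean, true_size, noise_rate):
    """Simulate observed exonic counts and separate non-exonic background counts."""
    observed = [
        sample_negative_binomial(rng, true_mean, true_size)
        + sample_poisson(rng, noise_rate)  # noise mixed into the observed signal
        for _ in range(n)
    ]
    # Independent measurement of the noise level (illustrative assumption).
    background = [sample_poisson(rng, noise_rate) for _ in range(n)]
    return observed, background


def estimate_true_signal(observed, background):
    """Method-of-moments estimate of the true signal's mean and size.

    For independent variables, means and variances add, so subtracting the
    Poisson noise moments (mean == variance) leaves the true-signal moments.
    Assumes the resulting variance exceeds the mean (overdispersion).
    """
    noise_rate = statistics.fmean(background)
    mean_hat = statistics.fmean(observed) - noise_rate
    var_hat = statistics.variance(observed) - noise_rate
    # Negative binomial: variance = mean + mean**2 / size
    size_hat = mean_hat ** 2 / (var_hat - mean_hat)
    return mean_hat, size_hat


if __name__ == "__main__":
    rng = random.Random(0)
    observed, background = simulate(
        rng, 20000, true_mean=50.0, true_size=10.0, noise_rate=5.0
    )
    print(estimate_true_signal(observed, background))
```

With a large enough sample, the recovered mean and size should land close to the simulated values, which is the sense in which the true signal can be "delineated" from the observed and noise signals. A moment-based estimator is used here only because it is short; a likelihood-based fit would be a more typical choice for real data.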

Best practices on the differential expression analysis of multi-species RNA-seq

Genome Biology ◽

10.1186/s13059-021-02337-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Matthew Chung ◽

Vincent M. Bruno ◽

David A. Rasko ◽

Christina A. Cuomo ◽

José F. Muñoz ◽

...

Keyword(s):

Best Practices ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Single Species ◽

RNA Seq ◽

Species Analysis ◽

Differential Gene ◽

Multiple Species ◽

Downstream Analysis

Abstract: Advances in transcriptome sequencing allow for simultaneous interrogation of differentially expressed genes from multiple species originating from a single RNA sample, termed dual or multi-species transcriptomics. Compared to single-species differential expression analysis, the design of multi-species differential expression experiments must account for the relative abundances of each organism of interest within the sample, often requiring enrichment methods and yielding differences in total read counts across samples. The analysis of multi-species transcriptomics datasets requires modifications to the alignment, quantification, and downstream analysis steps compared to the single-species analysis pipelines. We describe best practices for multi-species transcriptomics and differential gene expression.

Characterization of the nuclear and cytosolic transcriptomes in human brain tissue reveals new insights into the subcellular distribution of RNA transcripts

Scientific Reports ◽

10.1038/s41598-021-83541-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ammar Zaghlool ◽

Adnan Niazi ◽

Åsa K. Björklund ◽

Jakub Orzechowski Westholm ◽

Adam Ameur ◽

...

Keyword(s):

Human Brain ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Adult Brain ◽

Sequencing Data ◽

Human Brain Tissue ◽

Protein Coding ◽

RNA Transcripts ◽

Nuclear RNA ◽

The Impact

Abstract: Transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization and its influence on our understanding of gene function and on the interpretation of gene expression signatures in cells. Here, we separated cytosolic and nuclear RNA from human fetal and adult brain samples and performed a comprehensive analysis of cytosolic and nuclear transcriptomes. There are significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. We show that transcripts encoding the nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared to the rest of protein-coding genes. Differential expression analysis between fetal and adult frontal cortex shows that results obtained from the cytosolic RNA differ from results using nuclear RNA, both at the level of transcript types and in the number of differentially expressed genes. Our data provide a resource for the subcellular localization of thousands of RNA transcripts in the human brain and highlight differences in using the cytosolic or the nuclear transcriptomes for expression analysis.

The long and the short of it: unlocking nanopore long-read RNA sequencing data with short-read differential expression analysis tools

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab028 ◽

2021 ◽

Vol 3 (2) ◽

Author(s):

Xueyi Dong ◽

Luyi Tian ◽

Quentin Gouil ◽

Hasaru Kariyawasam ◽

Shian Su ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Transcriptomic Analysis ◽

Statistical Testing ◽

RNA Seq ◽

Sequencing Data ◽

Short Read ◽

Sequencing Platform ◽

Long Read

Abstract: Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to high sequence error rates and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) and a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.

Survey of Methods Used for Differential Expression Analysis on RNA Seq Data

Learning and Analytics in Intelligent Systems - Biologically Inspired Techniques in Many-Criteria Decision Making ◽

10.1007/978-3-030-39033-4_21 ◽

2020 ◽

pp. 226-239

Author(s):

Reema Joshi ◽

Rosy Sarmah

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

RNA Seq

Alignment and mapping methodology influence transcript abundance estimation

10.1101/657874 ◽

2019 ◽

Cited By ~ 6

Author(s):

Avi Srivastava ◽

Laraib Malik ◽

Hirak Sarkar ◽

Mohsen Zakeri ◽

Fatemeh Almodaresi ◽

...

Keyword(s):

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Computational Cost ◽

Simulated Data ◽

Transcript Abundance ◽

Mapping Method ◽

RNA Seq ◽

Transcript Quantification ◽

Quantification Model

Background: The accuracy of transcript quantification using RNA-seq data depends on many factors, such as the choice of alignment or mapping method and the quantification model being adopted. While the choice of quantification model has been shown to be important, considerably less attention has been given to comparing the effect of various read alignment approaches on quantification accuracy. Results: We investigate the influence of mapping and alignment on the accuracy of transcript quantification in both simulated and experimental data, as well as the effect on subsequent differential expression analysis. We observe that, even when the quantification model itself is held fixed, the effect of choosing a different alignment methodology, or aligning reads using different parameters, on quantification estimates can sometimes be large, and can affect downstream differential expression analyses as well. These effects can go unnoticed when assessment is focused too heavily on simulated data, where the alignment task is often simpler than in experimentally-acquired samples. We also introduce a new alignment methodology, called selective alignment, to overcome the shortcomings of lightweight approaches without incurring the computational cost of traditional alignment. Conclusion: We observe that, on experimental datasets, the performance of lightweight mapping and alignment-based approaches varies significantly and highlight some of the underlying factors. We show this variation both in terms of quantification and downstream differential expression analysis. In all comparisons, we also show the improved performance of our proposed selective alignment method and suggest best practices for performing RNA-seq quantification.

Characterization of the nuclear and cytosolic transcriptomes in human brain tissue reveals new insights into the subcellular distribution of RNA transcripts

10.1101/2020.04.08.031419 ◽

2020 ◽

Author(s):

Ammar Zaghlool ◽

Adnan Niazi ◽

Åsa K. Björklund ◽

Jakub Orzechowski Westholm ◽

Adam Ameur ◽

...

Keyword(s):

Gene Expression ◽

Subcellular Localization ◽

Human Brain ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Adult Brain ◽

Sequencing Data ◽

Human Brain Tissue ◽

Protein Coding ◽

The Impact

Abstract: Transcriptome analysis has mainly relied on analyzing RNA sequencing data from whole cells, overlooking the impact of subcellular RNA localization and its influence on our understanding of gene function, and interpretation of gene expression signatures in cells. Here, we performed a comprehensive analysis of cytosolic and nuclear transcriptomes in human fetal and adult brain samples. We show significant differences in RNA expression for protein-coding and lncRNA genes between cytosol and nucleus. Transcripts displaying differential subcellular localization belong to particular functional categories and display tissue-specific localization patterns. We also show that transcripts encoding the nuclear-encoded mitochondrial proteins are significantly enriched in the cytosol compared to the rest of protein-coding genes. Further investigation of the use of the cytosolic or the nuclear transcriptome for differential gene expression analysis indicates important differences in results depending on the cellular compartment. These differences were manifested at the level of transcript types and the number of differentially expressed genes. Our data provide a resource of RNA subcellular localization in the human brain and highlight differences in using the cytosolic or the nuclear transcriptomes for differential expression analysis.
