scholarly journals Identification of cell-type-specific marker genes from co-expression patterns in tissue samples

Author(s):  
Yixuan Qiu ◽  
Jiebiao Wang ◽  
Jing Lei ◽  
Kathryn Roeder

Abstract Motivation Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results To capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list. Availability and implementation We implement this method as an R package markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information Supplementary data are available at Bioinformatics online.

2020 ◽  
Author(s):  
Yixuan Qiu ◽  
Jiebiao Wang ◽  
Jing Lei ◽  
Kathryn Roeder

AbstractMotivationMarker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type, however, are highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern.ResultsTo capitalize on this pattern, we develop a new algorithm to detect marker genes by combining published information about likely marker genes with bulk transcriptome data in the form of a semi-supervised algorithm. The algorithm then exploits the correlation structure of the bulk data to refine the published marker genes by adding or removing genes from the list.Availability and implementationWe implement this method as an R package markerpen, hosted on https://github.com/yixuan/[email protected]


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254194
Author(s):  
Hong-Tae Park ◽  
Woo Bin Park ◽  
Suji Kim ◽  
Jong-Sung Lim ◽  
Gyoungju Nah ◽  
...  

Mycobacterium avium subsp. paratuberculosis (MAP) is a causative agent of Johne’s disease, which is a chronic and debilitating disease in ruminants. MAP is also considered to be a possible cause of Crohn’s disease in humans. However, few studies have focused on the interactions between MAP and human macrophages to elucidate the pathogenesis of Crohn’s disease. We sought to determine the initial responses of human THP-1 cells against MAP infection using single-cell RNA-seq analysis. Clustering analysis showed that THP-1 cells were divided into seven different clusters in response to phorbol-12-myristate-13-acetate (PMA) treatment. The characteristics of each cluster were investigated by identifying cluster-specific marker genes. From the results, we found that classically differentiated cells express CD14, CD36, and TLR2, and that this cell type showed the most active responses against MAP infection. The responses included the expression of proinflammatory cytokines and chemokines such as CCL4, CCL3, IL1B, IL8, and CCL20. In addition, the Mreg cell type, a novel cell type differentiated from THP-1 cells, was discovered. Thus, it is suggested that different cell types arise even when the same cell line is treated under the same conditions. Overall, analyzing gene expression patterns via scRNA-seq classification allows a more detailed observation of the response to infection by each cell type.


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1522 ◽  
Author(s):  
Brendan T. Innes ◽  
Gary D. Bader

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1522 ◽  
Author(s):  
Brendan T. Innes ◽  
Gary D. Bader

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.


2020 ◽  
Author(s):  
Mohit Goyal ◽  
Guillermo Serrano ◽  
Ilan Shomorony ◽  
Mikel Hernaez ◽  
Idoia Ochoa

AbstractSingle-cell RNA-seq is a powerful tool in the study of the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell-types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation have been proposed based on supervised learning. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell-types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a framework for automated cell-type identification based on neural networks that directly learns a low-dimensional representation (latent code) in which cell-types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profile of unseen cells is mapped onto the previously learned latent space, hence avoiding the need of retraining the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable, being more than five or six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.


2021 ◽  
Author(s):  
Daniel Osorio ◽  
Marieke Lydia Kuijjer ◽  
James J. Cai

Motivation: Characterizing cells with rare molecular phenotypes is one of the promises of high throughput single-cell RNA sequencing (scRNA-seq) techniques. However, collecting enough cells with the desired molecular phenotype in a single experiment is challenging, requiring several samples preprocessing steps to filter and collect the desired cells experimentally before sequencing. Data integration of multiple public single-cell experiments stands as a solution for this problem, allowing the collection of enough cells exhibiting the desired molecular signatures. By increasing the sample size of the desired cell type, this approach enables a robust cell type transcriptome characterization. Results: Here, we introduce rPanglaoDB, an R package to download and merge the uniformly processed and annotated scRNA-seq data provided by the PanglaoDB database. To show the potential of rPanglaoDB for collecting rare cell types by integrating multiple public datasets, we present a biological application collecting and characterizing a set of 157 fibrocytes. Fibrocytes are a rare monocyte-derived cell type, that exhibits both the inflammatory features of macrophages and the tissue remodeling properties of fibroblasts. This constitutes the first fibrocytes' unbiased transcriptome profile report. We compared the transcriptomic profile of the fibrocytes against the fibroblasts collected from the same tissue samples and confirm their associated relationship with healing processes in tissue damage and infection through the activation of the prostaglandin biosynthesis and regulation pathway. Availability and Implementation: rPanglaoDB is implemented as an R package available through the CRAN repositories https://CRAN.R-project.org/package=rPanglaoDB.


2019 ◽  
Vol 35 (19) ◽  
pp. 3818-3820 ◽  
Author(s):  
Eugene Urrutia ◽  
Li Chen ◽  
Haibo Zhou ◽  
Yuchao Jiang

Abstract Summary Single-cell assay of transposase-accessible chromatin followed by sequencing (scATAC-seq) is an emerging new technology for the study of gene regulation with single-cell resolution. The data from scATAC-seq are unique—sparse, binary and highly variable even within the same cell type. As such, neither methods developed for bulk ATAC-seq nor single-cell RNA-seq data are appropriate. Here, we present Destin, a bioinformatic and statistical framework for comprehensive scATAC-seq data analysis. Destin performs cell-type clustering via weighted principle component analysis, weighting accessible chromatin regions by existing genomic annotations and publicly available regulomic datasets. The weights and additional tuning parameters are determined via model-based likelihood. We evaluated the performance of Destin using downsampled bulk ATAC-seq data of purified samples and scATAC-seq data from seven diverse experiments. Compared to existing methods, Destin was shown to outperform across all datasets and platforms. For demonstration, we further applied Destin to 2088 adult mouse forebrain cells and identified cell-type-specific association of previously reported schizophrenia GWAS loci. Availability and implementation Destin toolkit is freely available as an R package at https://github.com/urrutiag/destin. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Mahmoud M Ibrahim ◽  
Rafael Kramann

ABSTRACTMarker genes identified in single cell experiments are expected to be highly specific to a certain cell type and highly expressed in that cell type. Detecting a gene by differential expression analysis does not necessarily satisfy those two conditions and is typically computationally expensive for large cell numbers.Here we present genesorteR, an R package that ranks features in single cell data in a manner consistent with the expected definition of marker genes in experimental biology research. We benchmark genesorteR using various data sets and show that it is distinctly more accurate in large single cell data sets compared to other methods. genesorteR is orders of magnitude faster than current implementations of differential expression analysis methods, can operate on data containing millions of cells and is applicable to both single cell RNA-Seq and single cell ATAC-Seq data.genesorteR is available at https://github.com/mahmoudibrahim/genesorteR.


2021 ◽  
Author(s):  
A. Ali Heydari ◽  
Oscar A. Davalos ◽  
Lihong Zhao ◽  
Katrina K. Hoyer ◽  
Suzanne S. Sindi

MotivationSingle-cell RNA sequencing (scRNAseq) technologies allow for measurements of gene expression at a single-cell resolution. This provides researchers with a tremendous advantage for detecting heterogeneity, delineating cellular maps or identifying rare subpopulations. However, a critical complication remains: the low number of single-cell observations due to limitations by cost or rarity of subpopulation. This absence of suicient data may cause inaccuracy or irreproducibility of downstream analysis. In this work, we present ACTIVA (Automated Cell-Type-informed Introspective Variational Autoencoder): a novel framework for generating realistic synthetic data using a single-stream adversarial variational autoencoder conditioned with cell-type information. Data generation and augmentation with ACTIVA can enhance scRNAseq pipelines and analysis, such as benchmarking new algorithms, studying the accuracy of classifiers and detecting marker genes. ACTIVA will facilitate analysis of smaller datasets, potentially reducing the number of patients and animals necessary in initial studies.ResultsWe train and evaluate models on multiple public scRNAseq datasets. Under the same conditions, ACTIVA trains up to 17 times faster than the GAN-based state-of-the-art model, scGAN (2.2 hours compared to 39.5 hours on Brain Small) while performing better or comparable in our quantitative and qualitative evaluations. We show that augmenting rare-populations with ACTIVA can significantly increase the classification accuracy of the rare population (more than 45% improvement in our rarest test case).Availability of data and codeLinks to raw, pre- and post-processed data, source code and tutorials are available at https://github.com/SindiLab.Supplementary informationSupplementary material can be found as a separate file with the same pre-print submission.


Author(s):  
Tobias Tekath ◽  
Martin Dugas

Abstract Motivation Each year, the number of published bulk and single-cell RNA-seq data sets is growing exponentially. Studies analyzing such data are commonly looking at gene-level differences, while the collected RNA-seq data inherently represents reads of transcript isoform sequences. Utilizing transcriptomic quantifiers, RNA-seq reads can be attributed to specific isoforms, allowing for analysis of transcript-level differences. A differential transcript usage (DTU) analysis is testing for proportional differences in a gene’s transcript composition, and has been of rising interest for many research questions, such as analysis of differential splicing or cell type identification. Results We present the R package DTUrtle, the first DTU analysis workflow for both bulk and single-cell RNA-seq data sets, and the first package to conduct a ‘classical’ DTU analysis in a single-cell context. DTUrtle extends established statistical frameworks, offers various result aggregation and visualization options and a novel detection probability score for tagged-end data. It has been successfully applied to bulk and single-cell RNA-seq data of human and mouse, confirming and extending key results. Additionally, we present novel potential DTU applications like the identification of cell type specific transcript isoforms as biomarkers. Availability The R package DTUrtle is available at https://github.com/TobiTekath/DTUrtle with extensive vignettes and documentation at https://tobitekath.github.io/DTUrtle/. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document