Identifying cell types from single-cell data based on similarities and dissimilarities between cells

Yuanyuan Li; Ping Luo; Yi Lu; Fang-Xiang Wu

doi:10.1186/s12859-020-03873-z

BMC Bioinformatics ◽

10.1186/s12859-020-03873-z ◽

2021 ◽

Vol 22 (S3) ◽

Author(s):

Yuanyuan Li ◽

Ping Luo ◽

Yi Lu ◽

Fang-Xiang Wu

Keyword(s):

Gene Expression ◽

Single Cell ◽

Spectral Clustering ◽

Incidence Matrix ◽

Expression Patterns ◽

Cell Types ◽

Clustering Method ◽

Different Types ◽

Cell Data ◽

Spectral Clustering Method

Abstract: Background: With the development of single-cell sequencing technology, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, clustering cell types becomes more complex because of the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by means of various clustering analysis methods. Although some methods, such as spectral clustering, perform well in identifying cell types, they consider only the similarities between cells and ignore the influence of dissimilarities on clustering results. This may limit the performance of most conventional clustering algorithms in identifying clusters, so special methods need to be developed for high-dimensional sparse categorical data. Results: Inspired by the observation that cells of the same type have similar gene expression patterns, whereas cells of different types have dissimilar gene expression patterns, we improve the existing spectral clustering method for single-cell data so that it is based on both similarities and dissimilarities between cells. The method first measures the similarity and dissimilarity among cells, then constructs the incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and finally uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low-dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets.
Conclusions: In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
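
The three steps named in this abstract (measure similarity and dissimilarity, fuse them into one matrix, then embed and run K-means) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian-kernel similarity, the correlation-based dissimilarity, the element-wise fusion rule `S * (1 - D)`, the toy data, and all function names are assumptions made for this example. The embedding uses the leading eigenvectors of the normalized graph Laplacian, as in standard spectral clustering.

```python
import numpy as np


def fused_affinity(X, sigma=1.0):
    """Fuse a similarity and a dissimilarity matrix into one affinity matrix.

    Assumed choices (the paper's formulas may differ): Gaussian-kernel
    similarity, correlation-based dissimilarity, fusion by S * (1 - D).
    X has one row per cell and one column per gene.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))              # similarity in (0, 1]
    D = np.clip((1.0 - np.corrcoef(X)) / 2.0, 0.0, 1.0)    # dissimilarity in [0, 1]
    W = S * (1.0 - D)                                      # down-weight dissimilar pairs
    np.fill_diagonal(W, 0.0)
    return W


def spectral_embedding(W, k):
    """Rows of the k eigenvectors with smallest eigenvalues of the normalized Laplacian."""
    deg = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    L = np.eye(len(W)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                            # eigenvalues ascending
    U = vecs[:, :k]
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    return U / np.maximum(norms, 1e-12)                    # row-normalize


def kmeans(U, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [U[0]]
    for _ in range(1, k):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_cells(X, k, sigma=1.0):
    return kmeans(spectral_embedding(fused_affinity(X, sigma), k), k)


# Toy data (hypothetical): two cell types with opposite expression patterns.
rng = np.random.default_rng(0)
type_a = np.array([5.0, 5.0, 5.0, 0.0, 0.0, 0.0]) + rng.normal(0, 0.3, (10, 6))
type_b = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0]) + rng.normal(0, 0.3, (10, 6))
X = np.vstack([type_a, type_b])
labels = cluster_cells(X, k=2, sigma=2.0)
print(labels)
```

In this sketch the dissimilarity term only rescales the edge weights; other fusion rules, such as signed weights, are equally plausible readings of the abstract.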

Improved Spectral Clustering Method for Identifying Cell Types from Single-Cell Data

Intelligent Computing Theories and Application - Lecture Notes in Computer Science ◽

10.1007/978-3-030-26969-2_17 ◽

2019 ◽

pp. 177-189

Author(s):

Yuanyuan Li ◽

Ping Luo ◽

Yi Lu ◽

Fang-Xiang Wu

Keyword(s):

Single Cell ◽

Spectral Clustering ◽

Cell Types ◽

Clustering Method ◽

Cell Data ◽

Spectral Clustering Method

Comprehensive integration of single cell data

10.1101/460147 ◽

2018 ◽

Cited By ~ 74

Author(s):

Tim Stuart ◽

Andrew Butler ◽

Paul Hoffman ◽

Christoph Hafemeister ◽

Efthymia Papalexi ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Patterns ◽

Cell Types ◽

Chromatin Accessibility ◽

Substantial Improvement ◽

Single Cell Protein ◽

Cell Protein ◽

Spatial Positioning ◽

Cell Data

Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to “anchor” diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat

A single cell brain atlas in human Alzheimer’s disease

10.1101/628347 ◽

2019 ◽

Cited By ~ 4

Author(s):

Alexandra Grubman ◽

Gabriel Chew ◽

John F. Ouyang ◽

Guizhi Sun ◽

Xin Yi Choo ◽

...

Keyword(s):

Gene Expression ◽

Transcription Factor ◽

Single Cell ◽

Cell Fate ◽

Expression Patterns ◽

Cell Types ◽

Gene Expression Patterns ◽

Cell Type ◽

Web Resource ◽

Cell Type Specific

Abstract: Alzheimer’s disease (AD) is a heterogeneous disease that is largely dependent on the complex cellular microenvironment in the brain. This complexity impedes our understanding of how individual cell types contribute to disease progression and outcome. To characterize the molecular and functional cell diversity in the human AD brain, we utilized single nuclei RNA-seq in AD and control patient brains in order to map the landscape of cellular heterogeneity in AD. We detail gene expression changes at the level of cells and cell subclusters, highlighting specific cellular contributions to global gene expression patterns between control and Alzheimer’s patient brains. We observed distinct cellular regulation of APOE, which was repressed in oligodendrocyte progenitor cells (OPCs) and astrocyte AD subclusters, and highly enriched in a microglial AD subcluster. In addition, oligodendrocyte and microglia AD subclusters show discordant expression of APOE. Integration of transcription factor regulatory modules with downstream GWAS gene targets revealed subcluster-specific control of AD cell fate transitions. For example, this analysis uncovered that astrocyte diversity in AD was under the control of transcription factor EB (TFEB), a master regulator of lysosomal function, which initiated a regulatory cascade containing multiple AD GWAS genes. These results establish functional links between specific cellular sub-populations in AD, and provide new insights into the coordinated control of AD GWAS genes and their cell-type specific contribution to disease susceptibility.
Finally, we created an interactive reference web resource which will help brain and AD researchers explore the molecular architecture of subtype and AD-specific cell identity, molecular and functional diversity at the single cell level.

Highlights:

We generated the first human single cell transcriptome in AD patient brains.

Our study unveiled 9 clusters of cell-type specific and common gene expression patterns between control and AD brains, including clusters of genes that present properties of different cell types (i.e. astrocytes and oligodendrocytes).

Our analyses also uncovered functionally specialized sub-cellular clusters: 5 microglial clusters, 8 astrocyte clusters, 6 neuronal clusters, 6 oligodendrocyte clusters, 4 OPC and 2 endothelial clusters, each enriched for specific ontological gene categories.

Our analyses found manifold AD GWAS genes specifically associated with one cell-type, and sets of AD GWAS genes co-ordinately and differentially regulated between different brain cell-types in AD sub-cellular clusters.

We mapped the regulatory landscape driving transcriptional changes in AD brain, and identified transcription factor networks which we predict to control cell fate transitions between control and AD sub-cellular clusters.

Finally, we provide an interactive web-resource that allows the user to further visualise and interrogate our dataset.

Data resource web interface: http://adsn.ddnetbio.com

Semi-soft Clustering of Single Cell Data

10.1101/285056 ◽

2018 ◽

Author(s):

Lingxue Zhu ◽

Jing Lei ◽

Bernie Devlin ◽

Kathryn Roeder

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Pairwise Comparison ◽

Cell Types ◽

Intermediate Cell ◽

Soft Clustering ◽

Membership Matrix ◽

Cell Data

Abstract: Motivated by the dynamics of development, in which cells of recognizable types, or pure cell types, transition into other types over time, we propose a method of semi-soft clustering that can classify both pure and intermediate cell types from data on gene expression or protein abundance from individual cells. Called SOUP, for Semi-sOft clUstering with Pure cells, this novel algorithm reveals the clustering structure for both pure cells, which belong to one single cluster, and transitional cells with soft memberships. SOUP involves a two-step process: identify the set of pure cells and then estimate a membership matrix. To find pure cells, SOUP uses the special block structure that the K cell types form in a similarity matrix, devised by pairwise comparison of the gene expression profiles of individual cells. Once pure cells are identified, they provide the key information from which the membership matrix can be computed. SOUP is applicable to general clustering problems as well, as long as the unrestrictive modeling assumptions hold. The performance of SOUP is documented via extensive simulation studies. Using SOUP to analyze two single cell data sets from brain shows that it produces sensible and interpretable results.

Comprehensive characterization of tissue-specific chromatin accessibility in L2 Caenorhabditis elegans nematodes

10.1101/2020.09.15.299123 ◽

2020 ◽

Author(s):

Timothy J. Durham ◽

Riza M. Daza ◽

Louis Gevirtzman ◽

Darren A. Cusanovich ◽

William Stafford Noble ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Patterns ◽

Cell Types ◽

Chromatin Accessibility ◽

Gene Expression Patterns ◽

RNA Seq ◽

Cell Type ◽

Tissue Specific ◽

C Elegans

Abstract: Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.

Kidney Single-cell Transcriptomes Predict Spatial Corticomedullary Gene Expression and Tissue Osmolality Gradients

Journal of the American Society of Nephrology ◽

10.1681/asn.2020070930 ◽

2020 ◽

pp. ASN.2020070930

Author(s):

Christian Hinze ◽

Nikos Karaiskos ◽

Anastasiya Boltengagen ◽

Katharina Walentin ◽

Klea Redo ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Collecting Duct ◽

Expression Patterns ◽

Cell Types ◽

Spatial Position ◽

Sufficient Information ◽

Additional Information ◽

Spatial Reconstruction ◽

Tissue Osmolality

Background: Single-cell transcriptomes from dissociated tissues provide insights into cell types and their gene expression and may harbor additional information on spatial position and the local microenvironment. The kidney’s cells are embedded into a gradient of increasing tissue osmolality from the cortex to the medulla, which may alter their transcriptomes and provide cues for spatial reconstruction. Methods: Single-cell or single-nuclei mRNA sequencing of dissociated mouse kidneys and of dissected cortex, outer, and inner medulla, to represent the corticomedullary axis, was performed. Computational approaches predicted the spatial ordering of cells along the corticomedullary axis and quantitated expression levels of osmo-responsive genes. In situ hybridization validated computational predictions of spatial gene-expression patterns. The strategy was used to compare single-cell transcriptomes from wild-type mice to those of mice with a collecting duct–specific knockout of the transcription factor grainyhead-like 2 (Grhl2CD−/−), which display reduced renal medullary osmolality. Results: Single-cell transcriptomics from dissociated kidneys provided sufficient information to approximately reconstruct the spatial position of kidney tubule cells and to predict corticomedullary gene expression. Spatial gene expression in the kidney changes gradually and osmo-responsive genes follow the physiologic corticomedullary gradient of tissue osmolality. Single-nuclei transcriptomes from Grhl2CD−/− mice indicated a flattened expression gradient of osmo-responsive genes compared with control mice, consistent with their physiologic phenotype. Conclusions: Single-cell transcriptomics from dissociated kidneys facilitated the prediction of spatial gene expression along the corticomedullary axis and quantitation of osmotically regulated genes, allowing the prediction of a physiologic phenotype.

Decision tree models and cell fate choice

10.1101/2020.12.19.423629 ◽

2020 ◽

Author(s):

Ivan Croydon Veleslavov ◽

Michael P.H. Stumpf

Keyword(s):

Gene Expression ◽

Single Cell ◽

Cell Fate ◽

Cell Types ◽

Lineage Tree ◽

Tree Models ◽

Fate Decision ◽

Average Gene ◽

Lineage Trees ◽

Cell Data

Abstract: Single cell transcriptomics has laid bare the heterogeneity of apparently identical cells at the level of gene expression. For many cell-types we now know that there is variability in the abundance of many transcripts, and that average transcript abundance or average gene expression can be an unhelpful concept. A range of clustering and other classification methods have been proposed which use the signal in single cell data to classify cells, that is, to assign cell types to them based on their transcriptomic states. In many cases, however, we would like to have not just a classifier, but also a set of interpretable rules by which this classification occurs. Here we develop and demonstrate the interpretive power of one such approach, which sets out to establish a biologically interpretable classification scheme. In particular we are interested in capturing the chain of regulatory events that drive cell-fate decision making across a lineage tree or lineage sequence. We find that suitably defined decision trees can help to resolve gene regulatory programs involved in shaping lineage trees. Our approach combines predictive power with interpretability and can extract logical rules from single cell data.
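
The idea of extracting an interpretable rule from single cell data can be illustrated with the smallest possible decision tree: one split on one gene. The sketch below is only an illustration under stated assumptions, not the authors' method. The gene names, expression values, cell-type labels, and function names are all hypothetical, and the split criterion (weighted Gini impurity) is the usual CART choice rather than anything stated in the abstract.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(cells, labels, genes):
    """Return (gene, threshold, weighted_gini) for the best single-gene split.

    cells is a list of expression vectors, one value per entry of genes.
    Candidate thresholds are midpoints between consecutive observed values.
    """
    best = None
    n = len(cells)
    for g, gene in enumerate(genes):
        values = sorted(set(cell[g] for cell in cells))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for cell, y in zip(cells, labels) if cell[g] <= thr]
            right = [y for cell, y in zip(cells, labels) if cell[g] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (gene, thr, score)
    return best


# Hypothetical toy data: two genes measured in four cells.
genes = ["GeneA", "GeneB"]
cells = [[9.0, 1.0], [8.0, 3.0], [1.0, 2.0], [2.0, 4.0]]
labels = ["progenitor", "progenitor", "differentiated", "differentiated"]

gene, thr, score = best_split(cells, labels, genes)
print(f"rule: {gene} > {thr} separates the two labels (weighted Gini {score})")
```

A full tree would apply `best_split` recursively to each side of the split, and the path from root to leaf then reads as a chain of gene-level conditions.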

Single-Cell Analysis of the Gene Expression Effects of Developmental Lead (Pb) Exposure on the Mouse Hippocampus

Toxicological Sciences ◽

10.1093/toxsci/kfaa069 ◽

2020 ◽

Vol 176 (2) ◽

pp. 396-409

Author(s):

Kelly M Bakulski ◽

John F Dou ◽

Robert C Thompson ◽

Christopher Lee ◽

Lauren Y Middleton ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Single Cell Analysis ◽

Expression Patterns ◽

Cell Types ◽

Cell Cluster ◽

Marker Genes ◽

Cell Type ◽

Cell Clusters ◽

Pb Exposure

Abstract: Lead (Pb) exposure is ubiquitous with permanent neurodevelopmental effects. The hippocampus brain region is involved in learning and memory with heterogeneous cellular composition. The hippocampus cell type-specific responses to Pb are unknown. The objective of this study is to examine perinatal Pb treatment effects on adult hippocampus gene expression, at the level of individual cells. In mice perinatally exposed to control water or a human physiologically relevant level (32 ppm in maternal drinking water) of Pb, 2 weeks prior to mating through weaning, we tested for hippocampus gene expression and cellular differences at 5 months of age. We sequenced RNA from 5258 hippocampal cells to (1) test for treatment gene expression differences averaged across all cells, (2) compare cell cluster composition by treatment, and (3) test for treatment gene expression and pathway differences within cell clusters. Gene expression patterns revealed 12 hippocampus cell clusters, mapping to major expected cell types (eg, microglia, astrocytes, neurons, and oligodendrocytes). Perinatal Pb treatment was associated with 12.4% more oligodendrocytes (p = 4.4 × 10^-21) in adult mice. Across all cells, Pb treatment was associated with expression of cell cluster marker genes. Within cell clusters, Pb treatment (q < 0.05) caused differential gene expression in endothelial, microglial, pericyte, and astrocyte cells. Pb treatment upregulated protein folding pathways in microglia (p = 3.4 × 10^-9) and stress response in oligodendrocytes (p = 3.2 × 10^-5). Bulk tissue analysis may be influenced by changes in cell type composition, obscuring effects within vulnerable cell types. This study serves as a biological reference for future single-cell toxicant studies, to ultimately characterize molecular effects on cognition and behavior.

Finding cell-specific expression patterns in the early Ciona embryo with single-cell RNA-seq

10.1101/197699 ◽

2017 ◽

Author(s):

Garth R. Ilsley ◽

Ritsuko Suyama ◽

Takeshi Noda ◽

Nori Satoh ◽

Nicholas M. Luscombe

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Patterns ◽

Cell Types ◽

Specific Gene ◽

RNA Seq ◽

Cell Stage ◽

Specific Expression ◽

Temporal Gene Expression

Abstract: Single-cell RNA-seq has been established as a reliable and accessible technique enabling new types of analyses, such as identifying cell types and studying spatial and temporal gene expression variation and change at single-cell resolution. Recently, single-cell RNA-seq has been applied to developing embryos, which offers great potential for finding and characterising genes controlling the course of development along with their expression patterns. In this study, we applied single-cell RNA-seq to the 16-cell stage embryo of Ciona, a marine chordate, and performed a computational search for cell-specific gene expression patterns. We recovered many known expression patterns from our single-cell RNA-seq data and, despite extensive previous screens, succeeded in finding new cell-specific patterns, which we validated by in situ and single-cell qPCR.

scClustViz – Single-cell RNAseq cluster assessment and visualization

F1000Research ◽

10.12688/f1000research.16198.2 ◽

2019 ◽

Vol 7 ◽

pp. 1522 ◽

Cited By ~ 8

Author(s):

Brendan T. Innes ◽

Gary D. Bader

Keyword(s):

Gene Expression ◽

Single Cell ◽

Clustering Algorithms ◽

Expression Patterns ◽

Software Tool ◽

Cell Types ◽

Marker Genes ◽

Specific Marker ◽

Cell Type ◽

Single Experiment

Single-cell RNA sequencing (scRNAseq) represents a new kind of microscope that can measure the transcriptome profiles of thousands of individual cells from complex cellular mixtures, such as in a tissue, in a single experiment. This technology is particularly valuable for characterization of tissue heterogeneity because it can be used to identify and classify all cell types in a tissue. This is generally done by clustering the data, based on the assumption that cells of a particular type share similar transcriptomes, distinct from other cell types in the tissue. However, nearly all clustering algorithms have tunable parameters which affect the number of clusters they will identify in data. The R Shiny software tool described here, scClustViz, provides a simple interactive graphical user interface for exploring scRNAseq data and assessing the biological relevance of clustering results. Given that cell types are expected to have distinct gene expression patterns, scClustViz uses differential gene expression between clusters as a metric for assessing the fit of a clustering result to the data at multiple cluster resolution levels. This helps select a clustering parameter for further analysis. scClustViz also provides interactive visualisation of: cluster-specific distributions of technical factors, such as predicted cell cycle stage and other metadata; cluster-wise gene expression statistics to simplify annotation of cell types and identification of cell type specific marker genes; and gene expression distributions over all cells and cell types. scClustViz provides an interactive interface for visualisation, assessment, and biological interpretation of cell-type classifications in scRNAseq experiments that can be easily added to existing analysis pipelines, enabling customization by bioinformaticians while enabling biologists to explore their results without the need for computational expertise. It is available at https://baderlab.github.io/scClustViz/.
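
The idea of using differential gene expression between clusters to judge a clustering result can be sketched in a few lines. This is not scClustViz code (the tool is an R Shiny package); it is a simplified Python illustration in which a plain mean-difference threshold stands in for a proper statistical test, and the expression values, cluster labels, `min_diff` cutoff, and function names are all assumptions made for this example.

```python
from itertools import combinations
from statistics import mean


def n_de_genes(expr, labels, a, b, min_diff=1.0):
    """Count genes whose mean expression differs by more than min_diff
    between clusters a and b. expr has one row per cell, one column per gene."""
    cells_a = [row for row, lab in zip(expr, labels) if lab == a]
    cells_b = [row for row, lab in zip(expr, labels) if lab == b]
    n_genes = len(expr[0])
    return sum(
        abs(mean(c[g] for c in cells_a) - mean(c[g] for c in cells_b)) > min_diff
        for g in range(n_genes)
    )


def min_pairwise_de(expr, labels, min_diff=1.0):
    """Smallest number of differing genes over all pairs of clusters.

    A value of 0 means at least two clusters are not distinguishable,
    which hints that the clustering resolution is too fine.
    """
    clusters = sorted(set(labels))
    return min(
        n_de_genes(expr, labels, a, b, min_diff)
        for a, b in combinations(clusters, 2)
    )


# Hypothetical toy data: 6 cells x 3 genes, two real expression patterns.
expr = [
    [5.0, 0.0, 1.0],
    [5.2, 0.1, 1.1],
    [4.8, 0.0, 0.9],
    [0.0, 4.0, 1.0],
    [0.2, 4.1, 1.0],
    [0.0, 3.9, 1.1],
]
coarse = [0, 0, 0, 1, 1, 1]   # two clusters
fine = [0, 0, 2, 1, 1, 1]     # same data, one cluster split further

print("coarse:", min_pairwise_de(expr, coarse))
print("fine:", min_pairwise_de(expr, fine))
```

Comparing this number across clustering solutions at several resolutions gives a simple criterion for picking the finest solution in which every pair of clusters still differs in at least some genes.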
