scholarly journals Non-linear Archetypal Analysis of Single-cell RNA-seq Data by Deep Autoencoders

2021 ◽  
Author(s):  
Yuge Wang ◽  
Hongyu Zhao

Advances in single-cell RNA sequencing (scRNA-seq) have led to successes in discovering novel cell types and understanding cellular heterogeneity among complex cell populations through cluster analysis. However, cluster analysis is not able to reveal continuous spectrum of states and underlying gene expression programs (GEPs) shared across cell types. We introduce scAAnet, an autoencoder for single-cell non-linear archetypal analysis, to identify GEPs and infer the relative activity of each GEP across cells. We use a count distribution-based loss term to account for the sparsity and overdispersion of the raw count data and add an archetypal constraint to the loss function of scAAnet. We first show that scAAnet outperforms existing methods for archetypal analysis across different metrics through simulations. We then demonstrate the ability of scAAnet to extract biologically meaningful GEPs using publicly available scRNA-seq datasets including a pancreatic islet dataset, a lung idiopathic pulmonary fibrosis dataset and a prefrontal cortex dataset.

2020 ◽  
Author(s):  
Jixing Zhong ◽  
Gen Tang ◽  
Jiacheng Zhu ◽  
Xin Qiu ◽  
Weiying Wu ◽  
...  

AbstractParkinson’s disease (PD) is a neurodegenerative disease leading to the impairment of execution of movement. PD pathogenesis has been largely investigated, but either restricted in bulk level or at certain cell types, which failed to capture cellular heterogeneity and intrinsic interplays among distinct cell types. To overcome this, we applied single-nucleus RNA-seq and single cell ATAC-seq on cerebellum, midbrain and striatum of PD mouse and matched control. With 74,493 cells in total, we comprehensively depicted the dysfunctions under PD pathology covering proteostasis, neuroinflammation, calcium homeostasis and extracellular neurotransmitter homeostasis. Besides, by multi-omics approach, we identified putative biomarkers for early stage of PD, based on the relationships between transcriptomic and epigenetic profiles. We located certain cell types that primarily contribute to PD early pathology, narrowing the gap between genotypes and phenotypes. Taken together, our study provides a valuable resource to dissect the molecular mechanism of PD pathogenesis at single cell level, which could facilitate the development of novel methods regarding diagnosis, monitoring and practical therapies against PD at early stage.


2018 ◽  
Author(s):  
Xuran Wang ◽  
Jihwan Park ◽  
Katalin Susztak ◽  
Nancy R. Zhang ◽  
Mingyao Li

AbstractWe present MuSiC, a method that utilizes cell-type specific gene expression from single-cell RNA sequencing (RNA-seq) data to characterize cell type compositions from bulk RNA-seq data in complex tissues. When applied to pancreatic islet and whole kidney expression data in human, mouse, and rats, MuSiC outperformed existing methods, especially for tissues with closely related cell types. MuSiC enables characterization of cellular heterogeneity of complex tissues for identification of disease mechanisms.


2021 ◽  
Author(s):  
Lorenzo Martini ◽  
Roberta Bardini ◽  
Stefano Di Carlo

The mammalian cortex contains a great variety of neuronal cells. In particular, GABAergic interneurons, which play a major role in neuronal circuit function, exhibit an extraordinary diversity of cell types. In this regard, single-cell RNA-seq analysis is crucial to study cellular heterogeneity. To identify and analyze rare cell types, it is necessary to reliably label cells through known markers. In this way, all the related studies are dependent on the quality of the employed marker genes. Therefore, in this work, we investigate how a set of chosen inhibitory interneurons markers perform. The gene set consists of both immunohistochemistry-derived genes and single-cell RNA-seq taxonomy ones. We employed various human and mouse datasets of the brain cortex, consequently processed with the Monocle3 pipeline. We defined metrics based on the relations between unsupervised cluster results and the marker expression. Specifically, we calculated the specificity, the fraction of cells expressing, and some metrics derived from decision tree analysis like entropy gain and impurity reduction. The results highlighted the strong reliability of some markers but also the low quality of others. More interestingly, though, a correlation emerges between the general performances of the genes set and the experimental quality of the datasets. Therefore, the proposed method allows evaluating the quality of a dataset in relation to its reliability regarding the inhibitory interneurons cellular heterogeneity study.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Shahin Mohammadi ◽  
Jose Davila-Velderrain ◽  
Manolis Kellis

Abstract Dissecting the cellular heterogeneity embedded in single-cell transcriptomic data is challenging. Although many methods and approaches exist, identifying cell states and their underlying topology is still a major challenge. Here, we introduce the concept of multiresolution cell-state decomposition as a practical approach to simultaneously capture both fine- and coarse-grain patterns of variability. We implement this concept in ACTIONet, a comprehensive framework that combines archetypal analysis and manifold learning to provide a ready-to-use analytical approach for multiresolution single-cell state characterization. ACTIONet provides a robust, reproducible, and highly interpretable single-cell analysis platform that couples dominant pattern discovery with a corresponding structural representation of the cell state landscape. Using multiple synthetic and real data sets, we demonstrate ACTIONet’s superior performance relative to existing alternatives. We use ACTIONet to integrate and annotate cells across three human cortex data sets. Through integrative comparative analysis, we define a consensus vocabulary and a consistent set of gene signatures discriminating against the transcriptomic cell types and subtypes of the human prefrontal cortex.


2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Akram Vasighizaker ◽  
Saiteja Danda ◽  
Luis Rueda

AbstractIdentifying relevant disease modules such as target cell types is a significant step for studying diseases. High-throughput single-cell RNA-Seq (scRNA-seq) technologies have advanced in recent years, enabling researchers to investigate cells individually and understand their biological mechanisms. Computational techniques such as clustering, are the most suitable approach in scRNA-seq data analysis when the cell types have not been well-characterized. These techniques can be used to identify a group of genes that belong to a specific cell type based on their similar gene expression patterns. However, due to the sparsity and high-dimensionality of scRNA-seq data, classical clustering methods are not efficient. Therefore, the use of non-linear dimensionality reduction techniques to improve clustering results is crucial. We introduce a method that is used to identify representative clusters of different cell types by combining non-linear dimensionality reduction techniques and clustering algorithms. We assess the impact of different dimensionality reduction techniques combined with the clustering of thirteen publicly available scRNA-seq datasets of different tissues, sizes, and technologies. We further performed gene set enrichment analysis to evaluate the proposed method’s performance. As such, our results show that modified locally linear embedding combined with independent component analysis yields overall the best performance relative to the existing unsupervised methods across different datasets.


Author(s):  
Hananeh Aliee ◽  
Fabian Theis

AbstractTissues are complex systems of interacting cell types. Knowing cell-type proportions in a tissue is very important to identify which cells or cell types are targeted by a disease or perturbation. When measuring such responses using RNA-seq, bulk RNA-seq masks cellular heterogeneity. Hence, several computational methods have been proposed to infer cell-type proportions from bulk RNA samples. Their performance with noisy reference profiles highly depends on the set of genes undergoing deconvolution. These genes are often selected based on prior knowledge or a single-criterion test that might not be useful to dissect closely correlated cell types. In this work, we introduce AutoGeneS, a tool that automatically extracts informative genes and reveals the cellular heterogeneity of bulk RNA samples. AutoGeneS requires no prior knowledge about marker genes and selects genes by simultaneously optimizing multiple criteria: minimizing the correlation and maximizing the distance between cell types. It can be applied to reference profiles from various sources like single-cell experiments or sorted cell populations. Results from human samples of peripheral blood illustrate that AutoGeneS outperforms other methods. Our results also highlight the impact of our approach on analyzing bulk RNA samples with noisy single-cell reference profiles and closely correlated cell types. Ground truth cell proportions analyzed by flow cytometry confirmed the accuracy of the predictions of AutoGeneS in identifying cell-type proportions. AutoGeneS is available for use via a standalone Python package (https://github.com/theislab/AutoGeneS).


2019 ◽  
Author(s):  
Di Ran ◽  
Shanshan Zhang ◽  
Nicholas Lytal ◽  
Lingling An

AbstractSingle-cell RNA sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of “drop-out” events, which usually is due to extremely low mRNA input or the stochastic nature of gene expression. In this paper, we present a novel Single-Cell RNA-seq Drop-Out Correction (scDoc) method, imputing drop-out events by borrowing information for the same gene from highly similar cells. scDoc is the first method that involves drop-out information to account for cell-to-cell similarity estimation, which is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc can impute the drop-out events more accurately and robustly; specifically, it outperforms all available imputation methods in reference to data visualization, cell subpopulation identification, and differential expression detection in scRNA-seq data.


Author(s):  
Yunjin Li ◽  
Qiyue Xu ◽  
Duojiao Wu ◽  
Geng Chen

Single-cell RNA-seq (scRNA-seq) technologies are broadly applied to dissect the cellular heterogeneity and expression dynamics, providing unprecedented insights into single-cell biology. Most of the scRNA-seq studies mainly focused on the dissection of cell types/states, developmental trajectory, gene regulatory network, and alternative splicing. However, besides these routine analyses, many other valuable scRNA-seq investigations can be conducted. Here, we first review cell-to-cell communication exploration, RNA velocity inference, identification of large-scale copy number variations and single nucleotide changes, and chromatin accessibility prediction based on single-cell transcriptomics data. Next, we discuss the identification of novel genes/transcripts through transcriptome reconstruction approaches, as well as the profiling of long non-coding RNAs and circular RNAs. Additionally, we survey the integration of single-cell and bulk RNA-seq datasets for deconvoluting the cell composition of large-scale bulk samples and linking single-cell signatures to patient outcomes. These additional analyses could largely facilitate corresponding basic science and clinical applications.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ruizhu Huang ◽  
Charlotte Soneson ◽  
Pierre-Luc Germain ◽  
Thomas S.B. Schmidt ◽  
Christian Von Mering ◽  
...  

AbstracttreeclimbR is for analyzing hierarchical trees of entities, such as phylogenies or cell types, at different resolutions. It proposes multiple candidates that capture the latent signal and pinpoints branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single-cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection on entities across resolutions and gives additional flexibility to uncover biological associations.


Sign in / Sign up

Export Citation Format

Share Document