Poincaré Maps for Analyzing Complex Hierarchies in Single-Cell Data

2019 ◽  
Author(s):  
Anna Klimovskaia ◽  
David Lopez-Paz ◽  
Léon Bottou ◽  
Maximilian Nickel

Abstract

The need to understand cell developmental processes has spawned a plethora of computational methods for discovering hierarchies from scRNAseq data. However, existing techniques are based on Euclidean geometry, a suboptimal choice for modeling complex cell trajectories with multiple branches. To overcome this fundamental representation issue, we propose Poincaré maps, a method that harnesses the power of hyperbolic geometry for single-cell data analysis. Often understood as a continuous extension of trees, hyperbolic geometry enables the embedding of complex hierarchical data in only two dimensions while preserving the pairwise distances between points in the hierarchy. This enables direct exploratory analysis and the use of our embeddings in a wide variety of downstream data analysis tasks, such as visualization, clustering, lineage detection and pseudo-time inference. When compared to existing methods, which are unable to address all these important tasks using a single embedding, Poincaré maps produce state-of-the-art two-dimensional representations of cell trajectories on multiple scRNAseq datasets. More specifically, we demonstrate that Poincaré maps make it straightforward to formulate new hypotheses about biological processes that prior methods did not uncover.

Significance statement

The discovery of hierarchies in biological processes is central to developmental biology. We propose Poincaré maps, a new method based on hyperbolic geometry to discover continuous hierarchies from pairwise similarities. We demonstrate the efficacy of our method on multiple single-cell datasets on tasks such as visualization, clustering, lineage identification, and pseudo-time inference.
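The claim that two hyperbolic dimensions leave room for branching hierarchies can be made concrete with the standard distance function of the Poincaré disk model. The abstract does not state this formula, so the sketch below assumes the usual metric of that model; the function name and the sample points are illustrative only and are not taken from the authors' code.

```python
import math

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk.

    Assumed formula (standard Poincare disk metric):
    d(u, v) = arcosh(1 + 2*|u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit disk")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))

# Hypothetical points: the disk center and two points on the same ray.
origin = (0.0, 0.0)
near = (0.5, 0.0)
far = (0.9, 0.0)

d_near = poincare_distance(origin, near)  # Euclidean distance 0.5
d_far = poincare_distance(origin, far)    # Euclidean distance 0.9
print(d_near, d_far)
```

Distances grow without bound toward the rim of the disk, which is why points placed near the boundary can sit far apart from each other, much like leaves of a tree.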

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
M. Büttner ◽  
J. Ostner ◽  
C. L. Müller ◽  
F. J. Theis ◽  
B. Schubert

Abstract

Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and under other stimuli. scCODA demonstrated excellent detection performance, while reliably controlling for false discoveries, and identified experimentally verified cell type changes that were missed in original analyses.
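The difficulty that compositionality creates can be seen in a toy example: when only one cell type changes in absolute number, the proportions of all other types shift too. The sketch below is not scCODA and does not use its API; the cell types and counts are hypothetical.

```python
def proportions(counts):
    """Convert absolute cell counts per type into fractions of the sample."""
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}

# Hypothetical counts: only the T cell count differs between conditions.
control = {"T cells": 500, "B cells": 300, "Monocytes": 200}
treated = {**control, "T cells": 1000}

control_props = proportions(control)
treated_props = proportions(treated)

for cell_type in control:
    print(cell_type, round(control_props[cell_type], 3), round(treated_props[cell_type], 3))
```

Here the B cell and monocyte counts are identical in both conditions, yet their proportions fall, so testing each cell type's proportion independently would flag changes that did not occur.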


2020 ◽  
Author(s):  
M. Büttner ◽  
J. Ostner ◽  
C. L. Müller ◽  
F. J. Theis ◽  
B. Schubert

Abstract

Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and under other stimuli. scCODA demonstrated excellent detection performance and identified experimentally verified cell type changes that were missed in original analyses.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Anna Klimovskaia ◽  
David Lopez-Paz ◽  
Léon Bottou ◽  
Maximilian Nickel

2021 ◽  
pp. 338872
Author(s):  
Gerjen H. Tinnevelt ◽  
Kristiaan Wouters ◽  
Geert J. Postma ◽  
Rita Folcarelli ◽  
Jeroen J. Jansen

eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Prathitha Kar ◽  
Sriram Tiruvadi-Krishnan ◽  
Jaana Männik ◽  
Jaan Männik ◽  
Ariel Amir

Collection of high-throughput data has become prevalent in biology. Large datasets allow the use of statistical constructs such as binning and linear regression to quantify relationships between variables and to hypothesize underlying biological mechanisms based on them. We discuss several such examples in relation to single-cell data and cellular growth. In particular, we show instances where what appears to be ordinary use of these statistical methods leads to incorrect conclusions, such as growth being inferred as non-exponential when it is exponential, and vice versa. We propose that the data analysis and its interpretation should be done in the context of a generative model, if possible. In this way, the statistical methods can be validated either analytically or against synthetic data generated by the model, leading to a consistent method for inferring biological mechanisms from data. On applying the validated methods of data analysis to infer cellular growth from our experimental data, we find the growth of length in E. coli to be non-exponential. Our analysis shows that in the later stages of the cell cycle the growth rate is faster than exponential.
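The idea of validating a statistical method against synthetic data can be sketched in a few lines. The example below assumes a simple generative model (exponential length growth with multiplicative noise) and checks that ordinary least squares on log length recovers the known rate. The model, the parameter values and the helper name `fit_line` are assumptions for illustration; the sketch does not reproduce the authors' analysis or the pitfalls they describe.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(0)   # fixed seed so the run is repeatable
true_rate = 1.5          # hypothetical growth rate
birth_length = 2.0       # hypothetical length at time zero

# Generative model (assumed): L(t) = L0 * exp(rate * t + noise).
times = [i / 199 for i in range(200)]
lengths = [birth_length * math.exp(true_rate * t + rng.gauss(0.0, 0.05)) for t in times]

# Under this model, log length is linear in time with slope equal to the rate.
slope, intercept = fit_line(times, [math.log(length) for length in lengths])
print(slope, math.exp(intercept))
```

Because the true rate is known by construction, any systematic gap between `slope` and `true_rate` would point to a flaw in the estimation procedure rather than in the biology.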


2020 ◽  
Author(s):  
Giovana Ravizzoni Onzi ◽  
Juliano Luiz Faccioni ◽  
Alvaro G. Alvarado ◽  
Paula Andreghetto Bracco ◽  
Harley I. Kornblum ◽  
...  

Outliers are often ignored or even removed from data analysis. In cancer, however, single outlier cells can be of major importance, since they have uncommon characteristics that may confer the capacity to invade, metastasize, or resist therapy. Here we present the Single-Cell OUTlier analysis (SCOUT), a resource for single-cell data analysis focusing on outlier cells, and the SCOUT Selector (SCOUTS), an application to systematically apply SCOUT to a dataset over a wide range of biological markers. Using publicly available datasets of cancer samples obtained from mass cytometry and single-cell RNA-seq platforms, outlier cells for the expression of proteins or RNAs were identified and compared to their non-outlier counterparts among different samples. Our results show that analyzing single-cell data using SCOUT can uncover key information not easily observed in the analysis of the whole population.


2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Lucille Lopez-Delisle ◽  
Jean-Baptiste Delisle

Abstract

Background: The number of studies using single-cell RNA sequencing (scRNA-seq) is constantly growing. This powerful technique provides a sampling of the whole transcriptome of a cell. However, sparsity of the data can be a major hurdle when studying the distribution of the expression of a specific gene or the correlation between the expressions of two genes.

Results: We show that the main technical noise associated with these scRNA-seq experiments is due to the sampling, i.e., Poisson noise. We present a new tool named baredSC, for Bayesian Approach to Retrieve Expression Distribution of Single-Cell data, which infers the intrinsic expression distribution in scRNA-seq data using a Gaussian mixture model. baredSC can be used to obtain the distribution in one dimension for individual genes and in two dimensions for pairs of genes, in particular to estimate the correlation between the two genes' expressions. We apply baredSC to simulated scRNA-seq data and show that the algorithm is able to uncover the expression distribution used to simulate the data, even in multi-modal cases with very sparse data. We also apply baredSC to two real biological data sets. First, we use it to measure the anti-correlation between Hoxd13 and Hoxa11, two genes with known genetic interaction in embryonic limb. Then, we study the expression of Pitx1 in embryonic hindlimb, for which a trimodal distribution has been identified through flow cytometry. While other methods to analyze scRNA-seq are too sensitive to sampling noise, baredSC reveals this trimodal distribution.

Conclusion: baredSC is a powerful tool which aims at retrieving the expression distribution of a few genes of interest from scRNA-seq data.
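A small simulation shows how Poisson sampling alone produces sparse counts. In the sketch below every simulated cell has the same true expected count, yet most observed counts are zero. This is not baredSC and does not use its interface; the expected count, the number of cells and the helper `sample_poisson` (Knuth's multiplication method, adequate for small means) are assumptions for illustration.

```python
import math
import random

def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k = 0
    p = rng.random()
    # Multiply uniform draws until the product drops to the threshold.
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

rng = random.Random(0)   # fixed seed so the run is repeatable
expected_count = 0.5     # hypothetical: identical true expression in every cell
n_cells = 20000          # hypothetical number of cells

counts = [sample_poisson(rng, expected_count) for _ in range(n_cells)]
zero_fraction = counts.count(0) / n_cells
mean_count = sum(counts) / n_cells
print(zero_fraction, mean_count)
```

For a Poisson variable with mean 0.5, the probability of observing zero is exp(-0.5), about 61%, so a majority of zeros here reflects sampling and not a subpopulation of non-expressing cells.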


FEBS Journal ◽  
2018 ◽  
Vol 286 (8) ◽  
pp. 1451-1467 ◽  
Author(s):  
Helena Todorov ◽  
Yvan Saeys
