Mechanistic hierarchical population model identifies latent causes of cell-to-cell variability

2017
Author(s):
Carolin Loos
Katharina Moeller
Fabian Fröhlich
Tim Hucho
Jan Hasenauer

All biological systems exhibit cell-to-cell variability, and this variability often has functional implications. To gain a thorough understanding of biological processes, the latent causes and underlying mechanisms of this variability must be elucidated. Cell populations comprising multiple distinct subpopulations are commonplace in biology, yet no current methods allow the sources of variability between and within individual subpopulations to be identified. This limits the analysis of single-cell data, such as those provided by flow cytometry and microscopy. In this study, we present a data-driven modeling framework for the analysis of populations comprising heterogeneous subpopulations. Our approach combines mixture modeling with frameworks for distribution approximation, facilitating the integration of multiple single-cell datasets and the detection of causal differences between and within subpopulations. The computational efficiency of our framework allows hundreds of competing hypotheses to be compared, enabling studies of unprecedented depth. We demonstrated the ability of our method to capture multiple levels of heterogeneity in analyses of simulated data and of data from highly heterogeneous sensory neurons involved in pain initiation. Our approach identified the sources of cell-to-cell variability and revealed mechanisms that underlie the modulation of nerve growth factor-induced Erk1/2 signaling by extracellular scaffolds.

Author(s):
Xiangtao Li
Shaochuan Li
Lei Huang
Shixiong Zhang
Ka-chun Wong

Abstract Single-cell RNA sequencing (scRNA-seq) technologies have been developed extensively to probe gene expression profiles at single-cell resolution. Deep imputation methods have been proposed to address the related computational challenges (e.g. the gene sparsity in single-cell data). In particular, the neural architectures of those deep imputation models have been proven to be critical for performance. However, deep imputation architectures are difficult to design and tune for those without rich knowledge of deep neural networks and scRNA-seq. Therefore, the Surrogate-assisted Evolutionary Deep Imputation Model (SEDIM) is proposed to automatically design the architectures of deep neural networks for imputing gene expression levels in scRNA-seq data without any manual tuning. Moreover, the proposed SEDIM constructs an offline surrogate model, which accelerates the architectural search. Comprehensive studies show that SEDIM significantly improves the imputation and clustering performance compared with other benchmark methods. In addition, we extensively explore the performance of SEDIM in other contexts and platforms, including mass cytometry and metabolic profiling. Marker gene detection, gene ontology enrichment and pathological analysis are conducted to provide novel insights into cell-type identification and the underlying mechanisms. The source code is available at https://github.com/li-shaochuan/SEDIM.


2021
Vol 17 (12)
pp. e1009466
Author(s):
Stephen Zhang
Anton Afanassiev
Laura Greenstreet
Tetsuya Matsumoto
Geoffrey Schiebinger

Understanding how cells change their identity and behaviour in living systems is an important question in many fields of biology. The problem of inferring cell trajectories from single-cell measurements has been a major topic in the single-cell analysis community, with different methods developed for equilibrium and non-equilibrium systems (e.g. haematopoiesis vs. embryonic development). We show that optimal transport analysis, a technique originally designed for analysing time-courses, may also be applied to infer cellular trajectories from a single snapshot of a population in equilibrium. Therefore, optimal transport provides a unified approach to inferring trajectories that is applicable to both stationary and non-stationary systems. Our method, StationaryOT, is mathematically motivated in a natural way by the hypothesis of a Waddington epigenetic landscape. We implement StationaryOT as a software package and demonstrate its efficacy in applications to simulated data as well as single-cell data from Arabidopsis thaliana root development.


2019
Author(s):
Anna Klimovskaia
David Lopez-Paz
Léon Bottou
Maximilian Nickel

Abstract The need to understand cell developmental processes has spawned a plethora of computational methods for discovering hierarchies from scRNAseq data. However, existing techniques are based on Euclidean geometry, a suboptimal choice for modeling complex cell trajectories with multiple branches. To overcome this fundamental representation issue we propose Poincaré maps, a method that brings the power of hyperbolic geometry to single-cell data analysis. Often understood as a continuous extension of trees, hyperbolic geometry enables the embedding of complex hierarchical data in only two dimensions while preserving the pairwise distances between points in the hierarchy. This enables direct exploratory analysis and the use of our embeddings in a wide variety of downstream data analysis tasks, such as visualization, clustering, lineage detection and pseudo-time inference. When compared to existing methods, which cannot address all these important tasks using a single embedding, Poincaré maps produce state-of-the-art two-dimensional representations of cell trajectories on multiple scRNAseq datasets. More specifically, we demonstrate that Poincaré maps make it straightforward to formulate new hypotheses about biological processes that prior methods could not reveal.
Significance statement: The discovery of hierarchies in biological processes is central to developmental biology. We propose Poincaré maps, a new method based on hyperbolic geometry to discover continuous hierarchies from pairwise similarities. We demonstrate the efficacy of our method on multiple single-cell datasets on tasks such as visualization, clustering, lineage identification, and pseudo-time inference.
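To make the contrast with Euclidean geometry concrete, the following is a minimal sketch of the standard distance formula for the Poincaré disk model of hyperbolic space. It assumes, from the method's name only, that the disk model is the relevant one; the function name `poincare_distance` and the sample points are hypothetical and are not taken from the authors' software.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit disk")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 |u - v|^2 / ((1 - |u|^2) (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Illustrative points (assumed): equal Euclidean steps cost more near the boundary.
origin = (0.0, 0.0)
d_near = poincare_distance(origin, (0.5, 0.0))
d_far = poincare_distance(origin, (0.9, 0.0))
print(d_near, d_far)
```

Distances grow without bound toward the edge of the disk, which is why a tree with many branches can fit into two dimensions with room to spare.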


2017
Author(s):
Rosanna C. G. Smith
Ben D. MacArthur

Abstract
Purpose of Review: To outline how ideas from Information Theory may be used to analyze single cell data and better understand stem cell behaviour.
Recent findings: Recent technological breakthroughs in single cell profiling have made it possible to interrogate cell-to-cell variability in a multitude of contexts, including the role it plays in stem cell dynamics. Here we review how measures from information theory are being used to extract biological meaning from the complex, high-dimensional and noisy datasets that arise from single cell profiling experiments. We also discuss how concepts linking information theory and statistical mechanics are being used to provide insight into cellular identity, variability and dynamics.
Summary: We provide a brief introduction to some basic notions from information theory and how they may be used to understand stem cell identities at the single cell level. We also discuss how work in this area might develop in the near future.
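As an illustration of the kind of basic information-theoretic measure the review refers to, here is a minimal sketch that computes the Shannon entropy of a set of discrete cell-state labels. The choice of entropy as the example, the function name `shannon_entropy`, and the toy labels are assumptions made for illustration; the review itself is not quoted as using this exact calculation.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of the labels."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    total = len(labels)
    # H = -sum_i p_i * log2(p_i), with p_i the observed frequency of state i
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical populations: a mixed one and a homogeneous one.
mixed = ["stem", "stem", "committed", "committed"]
uniform = ["stem", "stem", "stem", "stem"]
print(shannon_entropy(mixed), shannon_entropy(uniform))
```

A population split evenly between two states carries one bit of uncertainty about the state of a randomly chosen cell, while a homogeneous population carries none.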


2019
Vol 2 (4)
pp. e201900443
Author(s):
Jun Woo
Boris J. Winterhoff
Timothy K. Starr
Constantin Aliferis
Jinhua Wang

Recent single-cell transcriptomic studies revealed new insights into cell-type heterogeneities in cellular microenvironments unavailable from bulk studies. A significant drawback of currently available algorithms is the need to use empirical parameters or rely on indirect quality measures to estimate the degree of complexity, i.e., the number of subgroups present in the sample. We fill this gap with a single-cell data analysis procedure allowing for unambiguous assessments of the depth of heterogeneity in subclonal compositions supported by data. Our approach combines nonnegative matrix factorization, which takes advantage of the sparse and nonnegative nature of single-cell RNA count data, with Bayesian model comparison enabling de novo prediction of the depth of heterogeneity. We show that the method predicts the correct number of subgroups using simulated data, primary blood mononuclear cell, and pancreatic cell data. We applied our approach to a collection of single-cell tumor samples and found two qualitatively distinct classes of cell-type heterogeneity in cancer microenvironments.


2020
Vol 12 (5)
pp. 122-138
Author(s):
Mustafa Ozen
Tomasz Lipniacki
Andre Levchenko
Effat S Emamian
Ali Abdi

Abstract Characterization of decision-making in cells in response to received signals is of importance for understanding how cell fate is determined. The problem becomes multi-faceted and complex when we consider cellular heterogeneity and dynamics of biochemical processes. In this paper, we present a unified set of decision-theoretic, machine learning and statistical signal processing methods and metrics to model the precision of signaling decisions, in the presence of uncertainty, using single cell data. First, we introduce erroneous decisions that may result from signaling processes and identify false alarms and miss events associated with such decisions. Then, we present an optimal decision strategy which minimizes the total decision error probability. Additionally, we demonstrate how graphing receiver operating characteristic curves conveniently reveals the trade-off between false alarm and miss probabilities associated with different cell responses. Furthermore, we extend the introduced framework to incorporate the dynamics of biochemical processes and reactions in a cell, using multi-time point measurements and multi-dimensional outcome analysis and decision-making algorithms. The introduced multivariate signaling outcome modeling framework can be used to analyze several molecular species measured at the same or different time instants. We also show how the developed binary outcome analysis and decision-making approach can be extended to more than two possible outcomes. As an example and to show how the introduced methods can be used in practice, we apply them to single cell data of PTEN, an important intracellular regulatory molecule in a p53 system, in wild-type and abnormal cells. The unified signaling outcome modeling framework presented here can be applied to various organisms ranging from viruses, bacteria, yeast and lower metazoans to more complex organisms such as mammalian cells. 
Ultimately, this signaling outcome modeling approach can be utilized to better understand the transition from physiological to pathological conditions such as inflammation, various cancers and autoimmune diseases.


2021
Vol 12 (1)
Author(s):
M. Büttner
J. Ostner
C. L. Müller
F. J. Theis
B. Schubert

Abstract Compositional changes of cell types are main drivers of biological processes. Their detection through single-cell experiments is difficult due to the compositionality of the data and low sample sizes. We introduce scCODA (https://github.com/theislab/scCODA), a Bayesian model that addresses these issues and enables the study of complex cell type effects in disease and under other stimuli. scCODA demonstrated excellent detection performance, while reliably controlling for false discoveries, and identified experimentally verified cell type changes that were missed in original analyses.


2019
Author(s):
Mustafa Ozen
Tomasz Lipniacki
Andre Levchenko
Effat S. Emamian
Ali Abdi

Abstract Characterization of decision making in a cell in response to received signals is of high importance for understanding how cell fate is determined. The problem becomes multi-faceted and complex when we consider cellular heterogeneity and dynamics of biochemical processes. In this paper, we present a unified set of decision-theoretic and statistical signal processing methods and metrics to model the precision of signaling decisions, given uncertainty, using single cell data. First, we introduce erroneous decisions that may result from signaling processes, and identify the false alarm and miss events that are associated with such decisions. Then, we present an optimal decision strategy which minimizes the total decision error probability. The optimal decision threshold or boundary is determined using the maximum likelihood principle, which chooses the hypothesis under which the data are most probable. Additionally, we demonstrate how graphing the receiver operating characteristic curve conveniently reveals the trade-off between false alarm and miss probabilities associated with different cell responses. Furthermore, we extend the introduced signaling outcome modeling framework to incorporate the dynamics of biochemical processes and reactions in a cell, using multi-time point measurements and multi-dimensional outcome analysis and decision-making algorithms. The introduced multivariate signaling outcome modeling framework can be used to analyze several molecular species measured at the same or different time instants. We also show how the developed binary outcome analysis and decision-making approach can be extended to include more than two possible outcomes. To show how the overall set of introduced models and methods can be used in practice, we apply them, as an example, to single cell data of an intracellular regulatory molecule called Phosphatase and Tensin homolog (PTEN) in a p53 system, in wild-type and abnormal (e.g., mutant) cells. These molecules are involved in tumor suppression, cell cycle regulation and apoptosis. The unified signaling outcome modeling framework presented here can be applied to various organisms ranging from simple ones such as viruses, bacteria, yeast, and lower metazoans, to more complex organisms such as mammalian cells. Ultimately, this signaling outcome modeling approach can be useful for better understanding the transition from physiological to pathological conditions such as inflammation, various cancers and autoimmune diseases.
Brief Summary: Cells are supposed to make correct decisions, i.e., respond properly to various signals and initiate certain cellular functions, based on the signals they receive from the surrounding environment. Due to signal transduction noise, signaling malfunctions or other factors, cells may respond differently to the same input signals, which may result in incorrect cell decisions. Modeling and quantification of decision-making processes and signaling outcomes in cells have emerged as important research areas in recent years. Here we present univariate and multivariate data-driven statistical models and methods for analyzing dynamic decision-making processes and signaling outcomes. Furthermore, we exemplify the methods using single cell data generated by a p53 system, in wild-type and abnormal cells.
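The maximum likelihood threshold and the two error probabilities described above can be sketched for the simplest textbook case. This sketch assumes two hypotheses whose single-cell responses are Gaussian with different means and a shared standard deviation, which is an illustrative simplification and not the response model reported by the authors; the names `gaussian_tail` and `ml_decision` and all numbers are hypothetical.

```python
import math


def gaussian_tail(x):
    """Q(x): probability that a standard normal variable exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ml_decision(mu0, mu1, sigma):
    """ML threshold and error probabilities for two equal-variance Gaussians.

    Assumes mu0 < mu1 and decides hypothesis 1 when the measurement exceeds
    the threshold. With equal variances the two likelihoods cross at the midpoint.
    """
    if not mu0 < mu1 or sigma <= 0.0:
        raise ValueError("requires mu0 < mu1 and sigma > 0")
    threshold = 0.5 * (mu0 + mu1)
    # False alarm: hypothesis 0 is true but the measurement exceeds the threshold.
    p_false_alarm = gaussian_tail((threshold - mu0) / sigma)
    # Miss: hypothesis 1 is true but the measurement falls below the threshold.
    p_miss = gaussian_tail((mu1 - threshold) / sigma)
    return threshold, p_false_alarm, p_miss


# Assumed illustrative parameters, in arbitrary units.
threshold, p_fa, p_miss = ml_decision(mu0=0.0, mu1=2.0, sigma=1.0)
print(threshold, p_fa, p_miss)
```

Sweeping the threshold away from the midpoint trades one error type for the other, and plotting the resulting pairs of probabilities traces out a receiver operating characteristic curve. Under the additional assumption of equally likely hypotheses, the midpoint threshold is also the one that minimizes the total error probability.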


2020
Author(s):
Amanda F. Alexander
Hannah Forbes
Kathryn Miller-Jensen

Abstract Following TLR4 stimulation of macrophages, negative feedback mediated by the anti-inflammatory cytokine IL-10 limits the inflammatory response. However, extensive cell-to-cell variability in TLR4-stimulated cytokine secretion raises questions about how negative feedback is robustly implemented. To explore this, we characterized the TLR4-stimulated secretion program in primary murine macrophages using a single-cell microwell assay that enabled evaluation of functional autocrine IL-10 signaling. High-dimensional analysis of single-cell data revealed three distinct tiers of TLR4-induced proinflammatory activation based on levels of cytokine secretion. Surprisingly, while IL-10 inhibits TLR4-induced activation in the highest tier, it also contributes to the TLR4-induced activation threshold by regulating which cells transition from non-secreting to secreting states. This role for IL-10 in restraining TLR4 inflammatory activation is largely mediated by intermediate IFN-β signaling, while TNF-α likely mediates response resolution by IL-10. Thus, cell-to-cell variability in cytokine regulatory motifs provides a means to tailor the TLR4-induced inflammatory response.


2017
Author(s):
Lisa Weber
William Raymond
Brian Munsky

Abstract In quantitative analyses of biological processes, one may use many different scales of models (e.g., spatial or non-spatial, deterministic or stochastic, time-varying or at steady-state) or many different approaches to match models to experimental data (e.g., model fitting or parameter uncertainty/sloppiness quantification with different experiment designs). These different analyses can lead to surprisingly different results, even when applied to the same data and the same model. We use a simplified gene regulation model to illustrate many of these concerns, especially for ODE analyses of deterministic processes, chemical master equation and finite state projection analyses of heterogeneous processes, and stochastic simulations. For each analysis, we employ Matlab and Python software to consider a time-dependent input signal (e.g., a kinase nuclear translocation) and several model hypotheses, along with simulated single-cell data. We illustrate different approaches (e.g., deterministic and stochastic) to identify the mechanisms and parameters of the same model from the same simulated data. For each approach, we explore how uncertainty in parameter space varies with respect to the chosen analysis approach or specific experiment design. We conclude with a discussion of how our simulated results relate to the integration of experimental and computational investigations to explore signal-activated gene expression models in yeast [1] and human cells [2].
PACS numbers: 87.10.+e, 87.15.Aa, 05.10.Gg, 05.40.Ca, 02.50.-r
Submitted to: Phys. Biol.
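To illustrate what a stochastic simulation of a gene expression model looks like, here is a minimal sketch of the Gillespie stochastic simulation algorithm for a constant-rate birth-death process (production at rate `k_prod`, degradation at rate `k_deg` per molecule). This is a deliberately simpler model than the one the abstract describes: it has no time-dependent input signal, and the function name, parameter names, and values are all assumptions made for illustration.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, seed=0):
    """Simulate one trajectory of a birth-death process until time t_end.

    Returns the reaction times and the molecule count after each reaction.
    Requires k_prod > 0 so that the total propensity is never zero.
    """
    if k_prod <= 0.0 or k_deg < 0.0:
        raise ValueError("requires k_prod > 0 and k_deg >= 0")
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, counts = [t], [n]
    while True:
        birth = k_prod       # propensity of producing one molecule
        death = k_deg * n    # propensity of degrading one molecule
        total = birth + death
        t += rng.expovariate(total)  # exponential waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * total < birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


# Assumed illustrative rates; the deterministic steady state would be k_prod / k_deg = 20.
times, counts = gillespie_birth_death(k_prod=10.0, k_deg=0.5, t_end=50.0, seed=1)
print(len(times), counts[-1])
```

Repeating the simulation with different seeds gives a spread of trajectories around the deterministic solution, which is the kind of single-cell variability that an ODE analysis of the same model cannot represent.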

