cgCorrect: A method to correct for confounding cell-cell variation due to cell growth in single-cell transcriptomics

2016 ◽  
Author(s):  
Thomas Blasi ◽  
Florian Buettner ◽  
Michael K. Strasser ◽  
Carsten Marr ◽  
Fabian J. Theis

Abstract Motivation: Accessing gene expression at the single-cell level has revealed often large heterogeneity among seemingly homogeneous cells, which remained obscured in traditional population-based approaches. The computational analysis of single-cell transcriptomics data, however, still poses unresolved challenges with respect to normalization, visualization and modeling. One such issue is variation in cell size, which introduces additional variability into the data and calls for appropriate normalization techniques. Otherwise, these differences in cell size may obscure genuine heterogeneities among cell populations and lead to overdispersed steady-state distributions of mRNA transcript numbers. Results: We present cgCorrect, a statistical framework to correct for differences in cell size that are due to cell growth in single-cell transcriptomics data. We derive the probability for the cell growth corrected mRNA transcript number given the measured, cell size dependent mRNA transcript number, based on the assumption that the average number of transcripts in a cell increases in proportion to the cell’s volume during the cell cycle. cgCorrect can be used both to normalize data and to analyze the steady-state distributions used to infer the gene expression mechanism. We demonstrate its applicability on both simulated data and single-cell quantitative real-time PCR data from mouse blood stem and progenitor cells. We show that correcting for differences in cell size affects the interpretation of the data obtained by typically performed computational analyses. Availability: A Matlab implementation of cgCorrect is available at http://icb.helmholtz-muenchen.de/cgCorrect Supplementary information: Supplementary information is available online. The simulated data set is available at http://icb.helmholtz-muenchen.de/cgCorrect
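The overdispersion claim in this abstract can be illustrated with a short calculation. The sketch below is not part of cgCorrect. It assumes, purely for illustration, that transcript counts are Poisson distributed with a mean proportional to relative cell volume s, and that s is uniformly distributed on [1, 2] (a cell doubles in volume between birth and division). The function name and the uniform volume distribution are assumptions introduced here.

```python
def fano_factor_with_growth(mean_at_birth: float) -> float:
    """Fano factor (variance / mean) of counts whose Poisson mean scales with volume.

    Illustrative assumption (not taken from the paper): the relative cell
    volume s is uniform on [1, 2], and counts given s are Poisson with
    mean mean_at_birth * s.
    """
    mean_s = 1.5        # E[s] for s ~ Uniform(1, 2)
    var_s = 1.0 / 12.0  # Var[s] for s ~ Uniform(1, 2)
    mean_count = mean_at_birth * mean_s
    # Law of total variance: E[Var(n | s)] + Var(E[n | s])
    var_count = mean_at_birth * mean_s + mean_at_birth ** 2 * var_s
    return var_count / mean_count


if __name__ == "__main__":
    for lam in (1.0, 18.0, 90.0):
        print(lam, fano_factor_with_growth(lam))
```

Under these assumptions the Fano factor equals 1 + mean_at_birth / 18, so it exceeds the Poisson value of 1 for any expressed gene and grows with expression level, even though no additional biological heterogeneity is present.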

Author(s):  
Lyla Atta ◽  
Arpan Sahoo ◽  
Jean Fan

Abstract Motivation Single-cell transcriptomics profiling technologies enable genome-wide gene expression measurements in individual cells but can currently only provide a static snapshot of cellular transcriptional states. RNA velocity analysis can help infer cell state changes using such single-cell transcriptomics data. To interpret these cell state changes inferred from RNA velocity analysis as part of underlying cellular trajectories, current approaches rely on visualization with principal components, t-distributed stochastic neighbor embedding and other 2D embeddings derived from the observed single-cell transcriptional states. However, these 2D embeddings can yield different representations of the underlying cellular trajectories, hindering the interpretation of cell state changes. Results We developed VeloViz to create RNA velocity-informed 2D and 3D embeddings from single-cell transcriptomics data. Using both real and simulated data, we demonstrate that VeloViz embeddings are able to capture underlying cellular trajectories across diverse trajectory topologies, even when intermediate cell states may be missing. By considering the predicted future transcriptional states from RNA velocity analysis, VeloViz can help visualize a more reliable representation of underlying cellular trajectories. Availability and implementation Source code is available on GitHub (https://github.com/JEFworks-Lab/veloviz) and Bioconductor (https://bioconductor.org/packages/veloviz) with additional tutorials at https://JEF.works/veloviz/. Datasets used can be found on Zenodo (https://doi.org/10.5281/zenodo.4632471). Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 87 (6) ◽  
Author(s):  
Katsuya Fuchino ◽  
Helena Chan ◽  
Ling Chin Hwang ◽  
Per Bruheim

ABSTRACT The alphaproteobacterium Zymomonas mobilis exhibits extreme ethanologenic physiology, making this species a promising biofuel producer. Numerous studies have investigated its biology relevant to industrial applications, mostly at the population level. However, the organization of single cells in this industrially important polyploid species has been largely uncharacterized. In the present study, we characterized basic cellular behavior of Z. mobilis strain Zm6 under anaerobic conditions at the single-cell level. We observed that growing Z. mobilis cells often divided at a nonmidcell position, which contributed to variant cell size at birth. However, the cell size variance was regulated by a modulation of cell cycle span, mediated by a correlation of bacterial tubulin homologue FtsZ ring accumulation with cell growth. The Z. mobilis culture also exhibited heterogeneous cellular DNA content among individual cells, which might have been caused by asynchronous replication of the chromosome that was not coordinated with cell growth. Furthermore, slightly angled divisions might have resulted in temporary curvatures of attached Z. mobilis cells. Overall, the present study uncovers a novel bacterial cell organization in Z. mobilis. IMPORTANCE With increasing environmental concerns about the use of fossil fuels, development of a sustainable biofuel production platform has been attracting significant public attention. Ethanologenic Z. mobilis species are endowed with an efficient ethanol fermentation capacity that surpasses, in several respects, that of baker’s yeast (Saccharomyces cerevisiae), the most-used microorganism for ethanol production. For development of a Z. mobilis culture-based biorefinery, an investigation of its uncharacterized cell biology is important, because bacterial cellular organization and metabolism are closely associated with each other in a single cell compartment. In addition, the current work demonstrates that the polyploid bacterium Z. mobilis exhibits a distinctive mode of bacterial cell organization, likely reflecting its unique metabolism that does not prioritize incorporation of nutrients for cell growth. Thus, another significant result of this work is to advance our general understanding of the diversity of bacterial cell architecture.


2019 ◽  
Vol 17 (06) ◽  
pp. 1950035
Author(s):  
Huiqing Wang ◽  
Yuanyuan Lian ◽  
Chun Li ◽  
Yue Ma ◽  
Zhiliang Yan ◽  
...  

As a tool for interpreting and analyzing genetic data, a gene regulatory network (GRN) can reveal regulatory relationships between genes, proteins, and small molecules, and can help explain physiological activities and functions within cells, how these components interact in pathways, and how they produce changes in the organism. Traditional GRN research analyzes regulatory relationships through the average of cellular gene expression, which makes it difficult to identify cell-to-cell heterogeneity in gene expression. Existing methods for inferring GRNs from single-cell transcriptional data lack expression information for genes at steady state, and the high dimensionality of single-cell data leads to high time and space complexity of the algorithms. To address these problems in traditional GRN inference methods, namely the lack of cellular heterogeneity information, the complexity of single-cell data and the lack of steady-state information, we propose a method for GRN inference using single-cell transcription and gene knockout data, called SINgle-cell transcription data-KNOckout data (SIN-KNO), which combines the dynamic and steady-state information on regulatory relationships contained in gene expression. Capturing cell heterogeneity information helps explain differences in gene expression between cells, so that changes in gene expression can be observed more accurately. Gene knockout data record the steady-state expression levels of all other genes when one gene is knocked out. Classifying the genes before analyzing the single-cell data can rule out a large number of nonexistent regulatory relationships, greatly reducing the number of relationships that must be inferred. To demonstrate its efficiency, the proposed method has been compared with several typical methods in this area, including GENIE3, JUMP3, and SINCERITIES. The results of the evaluation indicate that the proposed method can analyze the diversified information contained in the two types of data, establish a more accurate gene regulatory network, and improve computational efficiency. The method offers a new approach to dealing with large datasets and the high computational complexity of single-cell data in GRN inference.


2019 ◽  
Author(s):  
Pengchao Ye ◽  
Wenbin Ye ◽  
Congting Ye ◽  
Shuchao Li ◽  
Lishan Ye ◽  
...  

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) is fast becoming a powerful technique for studying dynamic gene regulation at unprecedented resolution. However, scRNA-seq data suffer from problems of extremely high dropout rate and cell-to-cell variability, demanding new methods to recover gene expression loss. Despite the availability of various dropout imputation approaches for scRNA-seq, most studies focus on data with a medium or large number of cells, while few studies have explicitly investigated the differential performance across different sample sizes or the applicability of the approach on small or imbalanced data. It is imperative to develop new imputation approaches with higher generalizability for data with various sample sizes. Results We proposed a method called scHinter for imputing dropout events for scRNA-seq with special emphasis on data with limited sample size. scHinter incorporates a voting-based ensemble distance and leverages the synthetic minority oversampling technique for random interpolation. A hierarchical framework is also embedded in scHinter to increase the reliability of the imputation for small samples. We demonstrated the ability of scHinter to recover gene expression measurements across a wide spectrum of scRNA-seq datasets with varied sample sizes. We comprehensively examined the impact of sample size and cluster number on imputation. Comprehensive evaluation of scHinter across diverse scRNA-seq datasets with imbalanced or limited sample size showed that scHinter achieved higher and more robust performance than competing approaches, including MAGIC, scImpute, SAVER and netSmooth. Availability and implementation Freely available for download at https://github.com/BMILAB/scHinter. Supplementary information Supplementary data are available at Bioinformatics online.
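The random interpolation step of the synthetic minority oversampling technique (SMOTE) mentioned in this abstract can be sketched in a few lines. This is a generic illustration of SMOTE-style interpolation between a cell and one of its neighbors, not scHinter's implementation: the function name is hypothetical, and the voting-based ensemble distance and hierarchical framework that scHinter uses to choose neighbors are not reproduced here.

```python
import random


def smote_interpolate(cell, neighbor, rng=random):
    """Return a synthetic expression profile between two profiles.

    Generic SMOTE-style step: draw one random weight u in [0, 1) and move
    from `cell` toward `neighbor` by that fraction for every gene.
    """
    u = rng.random()  # one weight shared by all genes keeps the point on the segment
    return [x + u * (y - x) for x, y in zip(cell, neighbor)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the illustration is repeatable
    print(smote_interpolate([0.0, 4.0], [2.0, 0.0], rng))
```

Because a single weight is shared across genes, the synthetic profile always lies on the line segment joining the two input profiles, which is the property the oversampling technique relies on.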


Author(s):  
Bong-Hyun Kim ◽  
Kijin Yu ◽  
Peter C W Lee

Abstract Motivation Cancer classification based on gene expression profiles has provided insight on the causes of cancer and cancer treatment. Recently, machine learning-based approaches have been attempted in downstream cancer analysis to address the large differences in gene expression values, as determined by single-cell RNA sequencing (scRNA-seq). Results We designed cancer classifiers that can identify 21 types of cancers and normal tissues based on bulk RNA-seq as well as scRNA-seq data. Training was performed with 7398 cancer samples and 640 normal samples from 21 tumors and normal tissues in TCGA based on the 300 most significant genes expressed in each cancer. Then, we compared neural network (NN), support vector machine (SVM), k-nearest neighbors (kNN) and random forest (RF) methods. The NN performed consistently better than other methods. We further applied our approach to scRNA-seq transformed by kNN smoothing and found that our model successfully classified cancer types and normal samples. Availability and implementation Cancer classification by neural network. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Charlotte N. Miller ◽  
Jack Dumenil ◽  
Fu Hao Lu ◽  
Caroline Smith ◽  
Neil McKenzie ◽  
...  

Abstract Background The same species of plant can exhibit very diverse sizes and shapes of organs that are genetically determined. Characterising genetic variation underlying this morphological diversity is an important objective in evolutionary studies and it also helps identify the functions of genes influencing plant growth and development. Extensive screens of mutagenised Arabidopsis populations have identified multiple genes and mechanisms affecting organ size and shape, but relatively few studies have exploited the rich diversity of natural populations to identify genes involved in growth control. Results We screened a relatively well characterised collection of Arabidopsis thaliana accessions for variation in petal size. Association analyses identified sequence and gene expression variation on chromosome 4 that made a substantial contribution to differences in petal area. Variation in the expression of a previously uncharacterised gene At4g16850 (named KSK) played a substantial role in variation in organ size by influencing cell size. Over-expression of KSK led to larger petals with larger cells and promoted the formation of stamenoid features. The expression of auxin-responsive genes known to limit cell growth was reduced in response to KSK over-expression. ANT expression was also reduced in KSK over-expression lines, consistent with altered floral identities. Auxin responses were reduced in KSK over-expressing cells, consistent with changes in auxin-responsive gene expression. KSK may therefore influence auxin responses during petal development. Conclusions Understanding how genetic variation influences plant growth is important for both evolutionary and mechanistic studies. We used natural populations of Arabidopsis thaliana to identify sequence variation in a promoter region of Arabidopsis accessions that mediated differences in the expression of a previously uncharacterised membrane protein. This variation contributed to altered auxin responses and cell size during petal growth.


2020 ◽  
Vol 36 (13) ◽  
pp. 4021-4029
Author(s):  
Hyundoo Jeong ◽  
Zhandong Liu

Abstract Summary Single-cell RNA sequencing technology provides a novel means to analyze the transcriptomic profiles of individual cells. The technique is vulnerable, however, to a type of noise called dropout effects, which lead to zero-inflated distributions in the transcriptome profile and reduce the reliability of the results. Single-cell RNA sequencing data, therefore, need to be carefully processed before in-depth analysis. Here, we describe a novel imputation method that reduces dropout effects in single-cell sequencing. We construct a cell correspondence network and adjust gene expression estimates based on transcriptome profiles for the local subnetwork of cells of the same type. We comprehensively evaluated this method, called PRIME (PRobabilistic IMputation to reduce dropout effects in Expression profiles of single-cell sequencing), on synthetic and eight real single-cell sequencing datasets and verified that it improves the quality of visualization and accuracy of clustering analysis and can discover gene expression patterns hidden by noise. Availability and implementation The source code for the proposed method is freely available at https://github.com/hyundoo/PRIME. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (15) ◽  
pp. 4233-4239
Author(s):  
Di Ran ◽  
Shanshan Zhang ◽  
Nicholas Lytal ◽  
Lingling An

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell (sub)types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of ‘drop-out’ events, which are usually due to extremely low mRNA input or the stochastic nature of gene expression. In this article, we present a novel single-cell RNA-seq drop-out correction (scDoc) method, imputing drop-out events by borrowing information for the same gene from highly similar cells. Results scDoc is the first method that directly incorporates drop-out information into cell-to-cell similarity estimation, which is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc outperforms the existing imputation methods in reference to data visualization, cell subpopulation identification and differential expression detection in scRNA-seq data. Availability and implementation R code is available at https://github.com/anlingUA/scDoc. Supplementary information Supplementary data are available at Bioinformatics online.
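The general idea of borrowing information for the same gene from highly similar cells can be sketched as follows. This is a deliberately simplified illustration and not the scDoc algorithm: it treats every zero as a drop-out and uses plain cosine similarity, whereas the abstract states that scDoc brings drop-out information into the similarity estimate. The function names and the parameter k are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two expression profiles (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def impute_zeros(cells, k=2):
    """Replace zeros in each cell by a similarity-weighted mean over its k most similar cells.

    Simplifying assumption: every zero is treated as a drop-out event.
    """
    imputed = []
    for i, cell in enumerate(cells):
        # Rank all other cells by similarity to the current cell and keep the top k.
        sims = sorted(
            ((cosine(cell, other), j) for j, other in enumerate(cells) if j != i),
            reverse=True,
        )[:k]
        total = sum(s for s, _ in sims)
        new_cell = list(cell)
        if total > 0:
            for g, value in enumerate(cell):
                if value == 0:
                    # Borrow the same gene's value from the similar cells.
                    new_cell[g] = sum(s * cells[j][g] for s, j in sims) / total
        imputed.append(new_cell)
    return imputed


if __name__ == "__main__":
    data = [[5.0, 0.0, 3.0], [5.0, 2.0, 3.0], [5.0, 2.0, 3.0], [0.0, 9.0, 0.0]]
    print(impute_zeros(data, k=2)[0])
```

In the toy data, the first cell has a zero for the second gene while its two nearest neighbors both express that gene at 2.0, so the zero is filled with a value close to 2.0 and the nonzero entries are left untouched.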


2017 ◽  
Author(s):  
Anissa Guillemin ◽  
Angelique Richard ◽  
Sandrine Gonin-Giraud ◽  
Olivier Gandrillon

Abstract The recent rise of single-cell studies has revealed the importance of understanding the role of cell-to-cell variability, especially at the transcriptomic level. One of the numerous sources of cell-to-cell variation in gene expression is the heterogeneity in cell proliferation state. How cell cycle and cell size influence gene expression variability at the single-cell level is not yet clearly understood. To deconvolute such influences, most single-cell studies have used dedicated methods that could introduce some bias. Here, we provide a universal and automatic non-toxic labeling method, compatible with single-cell high-throughput RT-qPCR. This led to an unbiased gene expression analysis and could also be used to improve single-cell tracking and imaging when combined with cell isolation. As an application of this technique, we showed that cell-to-cell variability in chicken erythroid progenitors was only negligibly influenced by cell size or cell cycle.

