Destin: toolkit for single-cell analysis of chromatin accessibility

2019 ◽  
Vol 35 (19) ◽  
pp. 3818-3820 ◽  
Author(s):  
Eugene Urrutia ◽  
Li Chen ◽  
Haibo Zhou ◽  
Yuchao Jiang

Summary: Single-cell assay for transposase-accessible chromatin followed by sequencing (scATAC-seq) is an emerging technology for studying gene regulation at single-cell resolution. The data from scATAC-seq are unique: sparse, binary and highly variable even within the same cell type. As such, methods developed for neither bulk ATAC-seq nor single-cell RNA-seq data are appropriate. Here, we present Destin, a bioinformatic and statistical framework for comprehensive scATAC-seq data analysis. Destin performs cell-type clustering via weighted principal component analysis, weighting accessible chromatin regions by existing genomic annotations and publicly available regulomic datasets. The weights and additional tuning parameters are determined via model-based likelihood. We evaluated the performance of Destin using downsampled bulk ATAC-seq data of purified samples and scATAC-seq data from seven diverse experiments. Destin outperformed existing methods across all datasets and platforms. For demonstration, we further applied Destin to 2088 adult mouse forebrain cells and identified cell-type-specific association of previously reported schizophrenia GWAS loci. Availability and implementation: Destin toolkit is freely available as an R package at https://github.com/urrutiag/destin. Supplementary information: Supplementary data are available at Bioinformatics online.
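The weighting idea in this abstract can be illustrated with a small sketch. It is not Destin's code: it assumes a binary cells × regions matrix and one precomputed, nonnegative weight per region (how Destin derives the weights from annotations and regulomic data is not reproduced), and `weighted_pc1` is a hypothetical helper. Each region is centered and scaled by its weight, and the leading principal component is found by power iteration. Splitting cells by the sign of their score stands in for Destin's clustering over several components.

```python
import math
import random


def weighted_pc1(matrix, weights, iters=200, seed=0):
    """Leading principal-component scores of a cells x regions matrix after
    centering each region (column) and scaling it by a per-region weight.
    Hypothetical helper for illustration; not Destin's implementation."""
    n_cells, n_regions = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_cells for j in range(n_regions)]
    # Center each column, then multiply it by its region weight.
    x = [[(row[j] - means[j]) * weights[j] for j in range(n_regions)]
         for row in matrix]
    # Power iteration on X^T X converges to the leading right singular vector v.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_regions)]
    for _ in range(iters):
        xv = [sum(r[j] * v[j] for j in range(n_regions)) for r in x]
        w = [sum(x[i][j] * xv[i] for i in range(n_cells)) for j in range(n_regions)]
        norm = math.sqrt(sum(c * c for c in w)) or 1.0
        v = [c / norm for c in w]
    # Cell scores are the projections X v (the sign of a PC is arbitrary).
    return [sum(r[j] * v[j] for j in range(n_regions)) for r in x]


# Toy data: regions 0-1 are mostly open in the first three cells, regions 2-3
# in the last three; region 4 is noise and gets a small (assumed) weight.
accessibility = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
]
region_weights = [1.0, 1.0, 1.0, 1.0, 0.1]
scores = weighted_pc1(accessibility, region_weights)
clusters = [int(s > 0) for s in scores]
print(clusters)
```

With all weights set to 1 this reduces to ordinary PCA. Down-weighting a region limits how much that region can pull the component.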

2018 ◽  
Author(s):  
Eugene Urrutia ◽  
Li Chen ◽  
Haibo Zhou ◽  
Yuchao Jiang

Summary: Single-cell assay for transposase-accessible chromatin followed by sequencing (scATAC-seq) is an emerging technology for studying gene regulation at single-cell resolution. The data from scATAC-seq are unique: sparse, binary, and highly variable even within the same cell type. As such, methods developed for neither bulk ATAC-seq nor single-cell RNA-seq data are appropriate. Here, we present Destin, a bioinformatic and statistical framework for comprehensive scATAC-seq data analysis. Destin performs cell-type clustering via weighted principal component analysis, weighting accessible chromatin regions by existing genomic annotations and publicly available regulomic data sets. The weights and additional tuning parameters are determined via model-based likelihood. We evaluated the performance of Destin using downsampled bulk ATAC-seq data of purified samples and scATAC-seq data from seven diverse experiments. Destin outperformed existing methods across all data sets and platforms. For demonstration, we further applied Destin to 2,088 adult mouse forebrain cells and identified cell-type-specific association of previously reported schizophrenia GWAS loci. Availability: Destin toolkit is freely available as an R package at https://github.com/urrutiag/destin.


Author(s):  
Yixuan Qiu ◽  
Jiebiao Wang ◽  
Jing Lei ◽  
Kathryn Roeder

Motivation: Marker genes, defined as genes that are expressed primarily in a single cell type, can be identified from the single-cell transcriptome; however, such data are not always available for the many uses of marker genes, such as deconvolution of bulk tissue. Marker genes for a cell type are nevertheless highly correlated in bulk data, because their expression levels depend primarily on the proportion of that cell type in the samples. Therefore, when many tissue samples are analyzed, it is possible to identify these marker genes from the correlation pattern. Results: To capitalize on this pattern, we develop a new semi-supervised algorithm that detects marker genes by combining published information about likely marker genes with bulk transcriptome data. The algorithm then exploits the correlation structure of the bulk data to refine the published marker list by adding or removing genes. Availability and implementation: We implement this method as an R package, markerpen, hosted on CRAN (https://CRAN.R-project.org/package=markerpen). Supplementary information: Supplementary data are available at Bioinformatics online.
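The correlation argument can be illustrated with a deliberately simple filter. This is not markerpen's penalized, semi-supervised algorithm. It assumes a genes × samples bulk matrix and a seed list of published markers, and the helpers `refine_markers` and `pearson` and the `threshold` value are hypothetical. Each gene is scored by its Pearson correlation with the mean profile of the seeds. Genes above the threshold are kept, so correlated unlisted genes are added and seeds that do not co-vary are dropped.

```python
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 for a constant input."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def refine_markers(bulk, seeds, threshold=0.8):
    """bulk: gene -> expression values across samples (hypothetical format).
    Keep genes whose profile correlates with the mean seed-marker profile."""
    seed_genes = [g for g in seeds if g in bulk]
    n_samples = len(next(iter(bulk.values())))
    center = [statistics.fmean(bulk[g][s] for g in seed_genes)
              for s in range(n_samples)]
    return sorted(g for g, expr in bulk.items() if pearson(expr, center) >= threshold)


# Toy bulk data: the first three genes track the proportion of one cell type.
bulk = {
    "markerA1": [1.0, 3.0, 5.0, 2.0, 4.0],
    "markerA2": [2.1, 6.0, 9.8, 4.1, 8.0],
    "unlistedA": [0.5, 1.6, 2.4, 1.0, 2.1],
    "poorSeed": [3.0, 1.0, 2.0, 5.0, 4.0],
    "housekeeping": [5.0, 5.1, 4.9, 5.0, 5.05],
}
print(refine_markers(bulk, ["markerA1", "markerA2", "poorSeed"]))
```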


2021 ◽  
Vol 12 ◽  
Author(s):  
Zhe Cui ◽  
Ya Cui ◽  
Yan Gao ◽  
Tao Jiang ◽  
Tianyi Zang ◽  
...  

Single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) has been widely used to profile genome-wide chromatin accessibility in thousands of individual cells. However, compared with single-cell RNA-seq data, scATAC-seq peaks are much sparser because of the low copy number (diploid in humans) and inherent missing signals, which makes it more challenging to classify cell types based on specifically expressed genes or other canonical markers. Here, we present svmATAC, a support vector machine (SVM)-based method for accurately identifying cell types in scATAC-seq datasets by enhancing peak signal strength and imputing signals through patterns of co-accessibility. We applied svmATAC to several scATAC-seq datasets from human immune cells, human hematopoietic system cells, and peripheral blood mononuclear cells. The benchmark results showed that svmATAC does not rely on literature-based markers and is robust across datasets from different libraries and platforms. The source code of svmATAC is available at https://github.com/mrcuizhe/svmATAC under the MIT license.
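The classification step can be sketched with a linear SVM. This illustrates the SVM idea only and is not svmATAC's code. The sketch assumes two cell types labeled ±1 and raw binary peak vectors; svmATAC's peak enhancement and co-accessibility imputation are not shown. The hypothetical helpers `train_linear_svm` and `predict` use Pegasos-style subgradient descent on the L2-regularized hinge loss. As a simplification, the offset is learned as an extra regularized weight.

```python
def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: feature vectors (here, binary peak accessibility per cell).
    y: labels in {-1, +1}. A constant 1.0 feature is appended so the model
    can learn an offset. Samples are visited in a fixed order for
    reproducibility (Pegasos proper samples them at random)."""
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(data, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Regularization step: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: move toward points inside the margin.
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Sign of the linear score, with the constant feature appended."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy cells x peaks matrix: peaks 0-1 mark type +1, peaks 2-3 mark type -1.
cells = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
labels = [1, 1, -1, -1]
weights = train_linear_svm(cells, labels)
print([predict(weights, c) for c in cells])
```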


2017 ◽  
Author(s):  
Bo Wang ◽  
Daniele Ramazzotti ◽  
Luca De Sano ◽  
Junjie Zhu ◽  
Emma Pierson ◽  
...  

Motivation: We present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework for learning a cell-to-cell similarity measure from single-cell RNA-seq data. SIMLR can be used effectively for tasks such as dimension reduction, clustering, and visualization of heterogeneous cell populations. SIMLR was benchmarked against state-of-the-art methods for these three tasks on several public datasets, showing it to be scalable and capable of greatly improving clustering performance, as well as providing valuable insights by making the data more interpretable through better visualization. Availability and implementation: SIMLR is available on GitHub in both R and MATLAB implementations. Furthermore, it is also available as an R package. Supplementary information: Supplementary data are available at Bioinformatics online.
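The multi-kernel idea can be illustrated with a simplified similarity. This is not SIMLR's optimization. The sketch assumes Gaussian kernels at a few fixed bandwidths combined with uniform weights, whereas SIMLR learns the kernel weights jointly with the similarity matrix and uses locally scaled kernels. The helpers `multi_kernel_similarity` and `nearest_neighbor` are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def multi_kernel_similarity(cells, sigmas=(0.5, 1.0, 2.0), weights=None):
    """Weighted average of Gaussian kernels exp(-d^2 / (2 * sigma^2)) over
    several fixed bandwidths. Uniform weights are assumed here; SIMLR
    instead learns kernel weights jointly with the similarity matrix."""
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    n = len(cells)
    d2 = [[sq_dist(cells[i], cells[j]) for j in range(n)] for i in range(n)]
    return [[sum(w * math.exp(-d2[i][j] / (2.0 * s * s))
                 for w, s in zip(weights, sigmas))
             for j in range(n)] for i in range(n)]


def nearest_neighbor(sim, i):
    """Index of the most similar other cell under the combined similarity."""
    return max((j for j in range(len(sim)) if j != i), key=lambda j: sim[i][j])


cells = [[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]]
S = multi_kernel_similarity(cells)
print([nearest_neighbor(S, i) for i in range(len(cells))])
```

Mixing bandwidths makes the similarity less sensitive to any single scale choice. Learning the weights, as SIMLR does, goes further by letting the data pick the useful scales.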


2019 ◽  
Author(s):  
Ilya Korsunsky ◽  
Aparna Nathan ◽  
Nghia Millard ◽  
Soumya Raychaudhuri

Summary: The related Wilcoxon rank-sum test and area under the receiver operating characteristic curve (auROC) are ubiquitous in high-dimensional biological data analysis. Current implementations do not scale readily to the increasingly large datasets generated by novel high-throughput technologies, such as single-cell RNA-seq. We introduce a simple and scalable implementation of both analyses, available through the R package Presto. Presto scales to big datasets, with functions optimized for both dense and sparse matrices. On a sparse dataset of 1 million observations, 10 groups, and 1,000 features, Presto performed both rank-sum and auROC analyses in only 17 seconds, compared to 6.4 hours with base R functions. Presto also includes functions that integrate seamlessly with the Seurat single-cell analysis pipeline and the Bioconductor SingleCellExperiment class. Presto enables the use of robust classical analyses on big data with a simple interface and optimized implementation. Availability and implementation: Presto is available as an R package at https://github.com/immunogenomics/presto. Supplementary information: Vignettes are available with the Presto package.
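Both analyses rest on one ranking because of a standard identity. The Mann-Whitney U statistic from the rank-sum test, divided by n1·n2, equals the auROC. The pure-Python sketch below shows this identity with tie-averaged ranks. It is not Presto's optimized implementation, the function names are hypothetical, and the normal-approximation p-value with tie correction is omitted.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_auc(group, rest):
    """Mann-Whitney U for `group` vs `rest` and the equivalent auROC,
    AUC = U / (n1 * n2)."""
    ranks = average_ranks(list(group) + list(rest))
    n1, n2 = len(group), len(rest)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    return u, u / (n1 * n2)


def pairwise_auc(group, rest):
    """Direct definition for checking: fraction of (group, rest) pairs in which
    the group value is larger, counting ties as 1/2."""
    wins = sum(1.0 if g > r else 0.5 if g == r else 0.0 for g in group for r in rest)
    return wins / (len(group) * len(rest))


expr_in_cluster = [3, 5, 7]
expr_elsewhere = [1, 2, 3]
print(rank_sum_auc(expr_in_cluster, expr_elsewhere),
      pairwise_auc(expr_in_cluster, expr_elsewhere))
```

The ranking version costs one sort per feature, while the pairwise definition is quadratic. In a sparse column every zero is tied, so a sparse-aware implementation could give all zeros one shared average rank and sort only the nonzero entries. This is a general observation, not a description of Presto's internals.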


2019 ◽  
Author(s):  
Lucas T. Graybuck ◽  
Tanya L. Daigle ◽  
Adriana E. Sedeño-Cortés ◽  
Miranda Walker ◽  
Brian Kalmbach ◽  
...  

Summary: The rapid pace of cell type identification by new single-cell analysis methods has not been matched by efficient experimental access to the newly discovered types. To enable flexible and efficient access to specific neural populations in the mouse cortex, we collected chromatin accessibility data from individual cells and clustered the single-cell data to identify enhancers specific to cell classes and subclasses. When cloned into adeno-associated viruses (AAVs) and delivered to the brain by retro-orbital injection, these enhancers drive transgene expression in specific cell subclasses in the cortex. We characterize several enhancer viruses in detail to show that they label different projection neuron subclasses in mouse cortex, and that one of them can be used to label the homologous projection neuron subclass in human cortical slices. To enable combinatorial labeling of more than one cell type by enhancer viruses, we developed a three-color Cre-, Flp- and Nigri-recombinase-dependent reporter mouse line, Ai213. Delivering three enhancer viruses that drive these recombinases via a single retro-orbital injection into a single Ai213 transgenic mouse labels three different neuronal classes/subclasses in the same brain tissue. This approach combines unprecedented flexibility with specificity for investigating cell types in the mouse brain and beyond.


Author(s):  
Tobias Tekath ◽  
Martin Dugas

Motivation: Each year, the number of published bulk and single-cell RNA-seq data sets grows exponentially. Studies analyzing such data commonly look at gene-level differences, while the collected RNA-seq data inherently represent reads of transcript isoform sequences. Using transcriptomic quantifiers, RNA-seq reads can be attributed to specific isoforms, allowing analysis of transcript-level differences. A differential transcript usage (DTU) analysis tests for proportional differences in a gene’s transcript composition and is of growing interest for many research questions, such as analysis of differential splicing or cell type identification. Results: We present the R package DTUrtle, the first DTU analysis workflow for both bulk and single-cell RNA-seq data sets, and the first package to conduct a ‘classical’ DTU analysis in a single-cell context. DTUrtle extends established statistical frameworks and offers various result aggregation and visualization options, as well as a novel detection probability score for tagged-end data. It has been successfully applied to bulk and single-cell RNA-seq data of human and mouse, confirming and extending key results. Additionally, we present novel potential DTU applications, such as the identification of cell-type-specific transcript isoforms as biomarkers. Availability: The R package DTUrtle is available at https://github.com/TobiTekath/DTUrtle with extensive vignettes and documentation at https://tobitekath.github.io/DTUrtle/. Supplementary information: Supplementary data are available at Bioinformatics online.
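What a DTU test looks for can be shown with a toy example: a change in transcript proportions within a gene, even when the gene's total count is unchanged. The sketch uses a plain Pearson chi-square test of homogeneity on a transcripts × 2-groups count table. This is a simple stand-in chosen for illustration, not the statistical framework DTUrtle builds on. Real RNA-seq counts are overdispersed, which makes this test anti-conservative. The helper names are hypothetical.

```python
import math


def usage_proportions(counts):
    """counts: transcript -> read count for one gene in one group."""
    total = sum(counts.values())
    return {tx: c / total for tx, c in counts.items()}


def chi_square_dtu(group_a, group_b):
    """Pearson chi-square statistic of homogeneity for a transcripts x 2 table.
    Returns (statistic, degrees_of_freedom)."""
    transcripts = sorted(set(group_a) | set(group_b))
    tot_a, tot_b = sum(group_a.values()), sum(group_b.values())
    grand = tot_a + tot_b
    stat = 0.0
    for tx in transcripts:
        obs_a, obs_b = group_a.get(tx, 0), group_b.get(tx, 0)
        row = obs_a + obs_b
        for observed, col_total in ((obs_a, tot_a), (obs_b, tot_b)):
            expected = row * col_total / grand
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, len(transcripts) - 1


# Same gene total (100 reads) in both groups, but isoform usage is flipped:
# a gene-level comparison sees no change, a DTU test does.
cond_a = {"T1": 80, "T2": 20}
cond_b = {"T1": 20, "T2": 80}
stat, dof = chi_square_dtu(cond_a, cond_b)
# For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2)) if dof == 1 else None
print(usage_proportions(cond_a), usage_proportions(cond_b), stat, dof, p_value)
```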


2020 ◽  
Author(s):  
Zeinab Navidi Ghaziani ◽  
Lin Zhang ◽  
Bo Wang

Single-cell Assay for Transposase-Accessible Chromatin sequencing (scATAC-seq) identifies regulated chromatin accessibility modules at single-cell resolution. Robust evaluation is critical to the development of scATAC-seq pipelines, and it calls for reproducible benchmarking datasets. Here we present the simATAC framework, an R package that generates scATAC-seq count matrices closely resembling real scATAC-seq datasets in library size, sparsity, and averaged chromatin accessibility signals. simATAC uses statistical functions, derived from the analysis of 90 real scATAC-seq cell groups, to model read distributions. simATAC provides a robust and systematic approach to generating in silico scATAC-seq samples with cell labels for comprehensive tool assessment.
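The general shape of such a simulator can be shown with a toy generator. None of its models are simATAC's fitted functions. It assumes a per-bin mean accessibility, a per-cell library size drawn from a Gaussian, and Poisson counts whose rate is the bin's share of the library. `simulate_counts` and its default parameters are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small rates of sparse
    bins (exp(-lam) underflows for very large lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(bin_means, n_cells, lib_mean=5000.0, lib_sd=1000.0, seed=1):
    """Toy bins x cells count matrix (hypothetical model, not simATAC's):
    count[b][c] ~ Poisson(library_size[c] * bin_means[b] / sum(bin_means))."""
    rng = random.Random(seed)
    total = sum(bin_means)
    matrix = [[0] * n_cells for _ in bin_means]
    for c in range(n_cells):
        lib = max(1.0, rng.gauss(lib_mean, lib_sd))  # per-cell library size
        for b, m in enumerate(bin_means):
            matrix[b][c] = poisson(rng, lib * m / total)
    return matrix


# 180 mostly closed bins and 20 open bins, 50 cells with small libraries.
counts = simulate_counts([0.5] * 180 + [5.0] * 20, n_cells=50,
                         lib_mean=200.0, lib_sd=40.0)
zeros = sum(v == 0 for row in counts for v in row)
print("sparsity:", zeros / (200 * 50))
```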


2019 ◽  
Vol 35 (14) ◽  
pp. i436-i445 ◽  
Author(s):  
Gregor Sturm ◽  
Francesca Finotello ◽  
Florent Petitprez ◽  
Jitao David Zhang ◽  
Jan Baumbach ◽  
...  

Motivation: The composition and density of immune cells in the tumor microenvironment (TME) profoundly influence tumor progression and the success of anti-cancer therapies. Flow cytometry, immunohistochemistry staining or single-cell sequencing are often unavailable, so we must rely on computational methods to estimate immune-cell composition from bulk RNA-sequencing (RNA-seq) data. Various methods have been proposed recently, yet their capabilities and limitations have not been evaluated systematically, and a general guideline leading the research community through cell type deconvolution is missing. Results: We developed a systematic approach for benchmarking such computational methods and assessed the accuracy of tools at estimating nine different immune and stromal cell types from bulk RNA-seq samples. We used a single-cell RNA-seq dataset of ∼11 000 cells from the TME to simulate bulk samples of known cell type proportions, and validated the results using independent, publicly available gold-standard estimates. This allowed us to analyze and condense the results of more than a hundred thousand predictions into an exhaustive evaluation of seven computational methods over nine cell types and ∼1800 samples from five simulated and real-world datasets. We demonstrate that computational deconvolution performs with high accuracy for well-defined cell-type signatures and propose how fuzzy cell-type signatures can be improved. We suggest that future efforts be dedicated to refining cell population definitions and finding reliable signatures. Availability and implementation: A snakemake pipeline to reproduce the benchmark is available at https://github.com/grst/immune_deconvolution_benchmark. An R package allows the community to perform integrated deconvolution using different methods (https://grst.github.io/immunedeconv). Supplementary information: Supplementary data are available at Bioinformatics online.
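The simulation idea behind this benchmark can be sketched as follows: sum the expression of single cells drawn in chosen proportions to get a bulk sample whose true composition is known. This is not the benchmark's snakemake pipeline. It assumes cells are grouped by type as gene-expression vectors and that cells are drawn with replacement. `simulate_bulk` is a hypothetical helper, and it returns the realized fractions because rounding to whole cells can shift them slightly.

```python
import random


def simulate_bulk(cells_by_type, proportions, n_cells=100, seed=0):
    """Sum expression of randomly drawn cells so that cell types appear in the
    requested proportions. Returns (bulk_profile, realized_fractions)."""
    rng = random.Random(seed)
    n_genes = len(next(iter(cells_by_type.values()))[0])
    bulk = [0.0] * n_genes
    truth = {}
    for cell_type, frac in proportions.items():
        k = round(frac * n_cells)  # whole number of cells of this type
        truth[cell_type] = k / n_cells
        for _ in range(k):
            cell = rng.choice(cells_by_type[cell_type])  # draw with replacement
            for g, value in enumerate(cell):
                bulk[g] += value
    return bulk, truth


# Toy single-cell profiles over three genes; gene 2 is expressed equally.
cells_by_type = {
    "T": [[10, 0, 1], [12, 0, 1]],
    "B": [[0, 8, 1], [0, 10, 1]],
}
bulk, truth = simulate_bulk(cells_by_type, {"T": 0.7, "B": 0.3})
print(bulk, truth)
```

A deconvolution method can then be scored by comparing its estimates on `bulk` with `truth`.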


2018 ◽  
Author(s):  
Xi Chen ◽  
Ricardo J Miragaia ◽  
Kedar Nath Natarajan ◽  
Sarah A Teichmann

The assay for transposase-accessible chromatin using sequencing (ATAC-seq) is widely used to identify regulatory regions throughout the genome. However, very few studies have been performed at the single-cell level (scATAC-seq) because of technical challenges. Here we developed a simple and robust plate-based scATAC-seq method that combines upfront bulk Tn5 tagging with single-nucleus sorting. We demonstrated that our method works robustly across various systems, including fresh and cryopreserved cells from primary tissues. By profiling over 3,000 splenocytes, we identified distinct immune cell types and revealed cell-type-specific regulatory regions and related transcription factors.

