Model-Based Approach to the Joint Analysis of Single-Cell Data on Chromatin Accessibility and Gene Expression

2020 ◽  
Vol 35 (1) ◽  
pp. 2-13 ◽  
Author(s):  
Zhixiang Lin ◽  
Mahdi Zamanighomi ◽  
Timothy Daley ◽  
Shining Ma ◽  
Wing Hung Wong
2021 ◽  
Author(s):  
Jiaxuan Wangwu ◽  
Zexuan Sun ◽  
Zhixiang Lin

Abstract The advancement of technologies and the growth of available single-cell datasets motivate integrative analysis of multiple single-cell genomic datasets. Integrative analysis of multimodal single-cell datasets combines the complementary information offered by single-omic datasets and can offer deeper insights into complex biological processes. Clustering methods that identify unknown cell types are among the first steps in the analysis of single-cell datasets, and they are important for downstream analyses built upon the identified cell types. We propose scAMACE for the integrative analysis and clustering of single-cell data on chromatin accessibility, gene expression and methylation. We demonstrate that cell types are better identified and characterized by analyzing the three data types jointly. We develop an efficient expectation-maximization (EM) algorithm to perform statistical inference, and evaluate our methods in both a simulation study and real-data applications. We also provide a GPU implementation of scAMACE, making it scalable to large datasets. The software and datasets are available at https://github.com/cuhklinlab/scAMACE_py (Python implementation) and https://github.com/cuhklinlab/scAMACE (R implementation).
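The abstract above names an EM algorithm but does not describe the underlying model, so the following is only a generic illustration of EM-based clustering, not scAMACE itself. It is a minimal sketch that assumes binarized accessibility-like features and a plain Bernoulli mixture; the function name `em_bernoulli_mixture` and all parameters are hypothetical.

```python
import numpy as np

def em_bernoulli_mixture(X, k, n_iter=100, seed=0):
    """Cluster binary rows of X with a k-component Bernoulli mixture via EM.

    Illustrative assumption: this is NOT the scAMACE model, only a generic
    mixture model fitted with the same family of algorithm.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pi = np.full(k, 1.0 / k)                       # mixing proportions
    theta = rng.uniform(0.25, 0.75, size=(k, p))   # per-cluster feature probabilities

    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each cell (log domain).
        log_post = X @ np.log(theta).T + (1.0 - X) @ np.log(1.0 - theta).T + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate proportions and feature probabilities.
        nk = np.maximum(resp.sum(axis=0), 1e-12)
        pi = np.clip(nk / n, 1e-6, None)
        theta = np.clip((resp.T @ X) / nk[:, None], 1e-3, 1.0 - 1e-3)

    return resp.argmax(axis=1), pi, theta
```

Calling `em_bernoulli_mixture(X, k=2)` on a cells-by-features 0/1 matrix returns hard cluster labels together with the fitted mixing proportions and per-cluster feature probabilities. The clipping steps are a design choice to keep the logarithms finite when a cluster becomes empty or a feature is constant within a cluster.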


2018 ◽  
Author(s):  
Tim Stuart ◽  
Andrew Butler ◽  
Paul Hoffman ◽  
Christoph Hafemeister ◽  
Efthymia Papalexi ◽  
...  

Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to “anchor” diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies, but different modalities as well. After demonstrating substantial improvement over existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in-situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns, and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for comprehensive integration of single cell data, including the assembly of harmonized references, and the transfer of information across datasets. Availability: Installation instructions, documentation, and tutorials are available at: https://www.satijalab.org/seurat


2021 ◽  
Author(s):  
Boying Gong ◽  
Yun Zhou ◽  
Elizabeth Purdom

Abstract Single-cell measurements of different cellular features or modalities from cells from the same system allow for a comprehensive understanding of a biological process. While the most common single-cell sequencing technologies require separate input cells for different modalities, there are a growing number of platforms that allow for measuring several modalities on a single cell. We present a novel method, Cobolt, for analyzing such multi-modality single-cell sequencing datasets. Cobolt jointly models the multiple modalities via a novel application of Multimodal Variational Autoencoder (MVAE) to a hierarchical generative model. We first demonstrate its performance on data from the multi-modality platform SNARE-seq, consisting of measurements of gene expression and chromatin accessibility on the same cells. We then illustrate the ability of Cobolt to integrate multi-modality platforms with single-modality platforms by jointly analyzing a SNARE-seq dataset, a single-cell gene expression dataset, and a single-cell chromatin accessibility dataset. We compared Cobolt with current options for analyzing such datasets and show that Cobolt provides robust and flexible results for integration of single-cell data on multiple modalities.


2021 ◽  
Author(s):  
Yang Xu ◽  
Edmon Begoli ◽  
Rachel Patton McCord

Rapidly developing single-cell technologies bring a surge of high-dimensional data that come from different sources and represent cellular systems from different views. With advances in single-cell technologies, integrating single-cell data across modalities has arisen as a new computational challenge and is gaining increasing attention within the community. Here, we present a novel adversarial approach, sciCAN, to integrate single-cell chromatin accessibility and gene expression data in an unsupervised manner. We benchmarked sciCAN against 3 state-of-the-art (SOTA) methods on 5 scATAC-seq/scRNA-seq datasets, and we demonstrated that our method handled data integration with a better balance of mutual transfer between modalities than the other 3 SOTA methods. We further applied sciCAN to 10X Multiome data and confirmed that the integrated representation preserves information about the hematopoietic hierarchy. Finally, we investigated CRISPR-perturbed single-cell K562 ATAC-seq and RNA-seq data to identify cells with related responses to different perturbations in these different modalities.


2021 ◽  
Vol 22 (S3) ◽  
Author(s):  
Yuanyuan Li ◽  
Ping Luo ◽  
Yi Lu ◽  
Fang-Xiang Wu

Abstract Background With the development of single-cell sequencing technology, revealing homogeneity and heterogeneity between cells has become a new area of computational systems biology research. However, the clustering of cell types becomes more complex with the mutual penetration between different types of cells and the instability of gene expression. One way of overcoming this problem is to group similar, related single cells together by means of various clustering analysis methods. Although some methods such as spectral clustering can do well in the identification of cell types, they only consider the similarities between cells and ignore the influence of dissimilarities on clustering results. This may limit the performance of most conventional clustering algorithms in identifying clusters, so special methods need to be developed for high-dimensional sparse categorical data. Results Inspired by the observation that cells of the same type have similar gene expression patterns, whereas cells of different types show dissimilar gene expression patterns, we improve the existing spectral clustering method for single-cell data by basing it on both similarities and dissimilarities between cells. The method first measures the similarity/dissimilarity among cells, then constructs the incidence matrix by fusing the similarity matrix with the dissimilarity matrix, and, finally, uses the eigenvalues of the incidence matrix to perform dimensionality reduction and employs the K-means algorithm in the low-dimensional space to achieve clustering. The proposed improved spectral clustering method is compared with the conventional spectral clustering method in recognizing cell types on several real single-cell RNA-seq datasets. 
Conclusions In summary, we show that adding intercellular dissimilarity can effectively improve accuracy and achieve robustness, and that the improved spectral clustering method outperforms the traditional spectral clustering method in grouping cells.
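The three steps described in this abstract (measure similarity and dissimilarity, fuse them into one matrix, then reduce dimension spectrally and run K-means) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian similarity, the fusion rule `sim - alpha * dis`, the normalized-Laplacian embedding, and all function and parameter names are hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

def fused_affinity(X, sigma=1.0, alpha=0.5):
    """Fuse a similarity matrix with a dissimilarity matrix (assumed rule)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sim = np.exp(-sq_dist / (2.0 * sigma ** 2))   # Gaussian similarity in [0, 1]
    dis = 1.0 - sim                               # assumed dissimilarity
    fused = np.clip(sim - alpha * dis, 0.0, None) # penalize dissimilar pairs, keep weights >= 0
    np.fill_diagonal(fused, 0.0)
    return fused

def spectral_embed(W, k):
    """Rows of the k eigenvectors with smallest eigenvalues of the normalized Laplacian."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    lap = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)                 # eigenvalues in ascending order
    U = vecs[:, :k]
    return U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

def kmeans(U, k, n_iter=50):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels

def cluster_cells(X, k, sigma=1.0, alpha=0.5):
    """Similarity/dissimilarity fusion, spectral embedding, then K-means."""
    return kmeans(spectral_embed(fused_affinity(X, sigma, alpha), k), k)
```

Here `X` is a cells-by-features matrix and `cluster_cells(X, k)` returns one integer label per cell. Setting `alpha=0` recovers ordinary similarity-only spectral clustering in this sketch, which makes the two variants easy to compare on the same data.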


Author(s):  
Xiangtao Li ◽  
Shaochuan Li ◽  
Lei Huang ◽  
Shixiong Zhang ◽  
Ka-chun Wong

Abstract Single-cell RNA sequencing (scRNA-seq) technologies have been heavily developed to probe gene expression profiles at single-cell resolution. Deep imputation methods have been proposed to address the related computational challenges (e.g. the gene sparsity in single-cell data). In particular, the neural architectures of those deep imputation models have been proven to be critical for performance. However, deep imputation architectures are difficult to design and tune for those without rich knowledge of deep neural networks and scRNA-seq. Therefore, Surrogate-assisted Evolutionary Deep Imputation Model (SEDIM) is proposed to automatically design the architectures of deep neural networks for imputing gene expression levels in scRNA-seq data without any manual tuning. Moreover, the proposed SEDIM constructs an offline surrogate model, which can accelerate the computational efficiency of the architectural search. Comprehensive studies show that SEDIM significantly improves the imputation and clustering performance compared with other benchmark methods. In addition, we also extensively explore the performance of SEDIM in other contexts and platforms including mass cytometry and metabolic profiling in a comprehensive manner. Marker gene detection, gene ontology enrichment and pathological analysis are conducted to provide novel insights into cell-type identification and the underlying mechanisms. The source code is available at https://github.com/li-shaochuan/SEDIM.


2019 ◽  
Vol 17 (06) ◽  
pp. 1950035
Author(s):  
Huiqing Wang ◽  
Yuanyuan Lian ◽  
Chun Li ◽  
Yue Ma ◽  
Zhiliang Yan ◽  
...  

As a tool for interpreting and analyzing genetic data, a gene regulatory network (GRN) can reveal regulatory relationships between genes, proteins, and small molecules, and help explain physiological activities and functions within cells, how these components interact in pathways, and how they produce changes in the organism. Traditional GRN research focuses on analyzing regulatory relationships through the average of cellular gene expression. Such methods have difficulty identifying the cell heterogeneity of gene expression. Existing methods for inferring GRNs from single-cell transcriptional data lack expression information for genes at steady state, and the high dimensionality of single-cell data leads to high time and space complexity of the algorithms. To address these problems in traditional GRN inference methods, including the lack of cellular heterogeneity information, the complexity of single-cell data, and the lack of steady-state information, we propose a method for GRN inference using single-cell transcription and gene knockout data, called SINgle-cell transcription data-KNOckout data (SIN-KNO), which focuses on combining the dynamic and steady-state information on regulatory relationships contained in gene expression. Capturing cell heterogeneity information helps in understanding gene expression differences between cells, so gene expression changes can be observed more accurately. Gene knockout data capture the steady-state expression levels of all other genes when one gene is knocked out. Classifying the genes before analyzing the single-cell data can rule out a large number of non-existent regulatory relationships, greatly reducing the number of relationships that must be inferred. To show its efficiency, the proposed method has been compared with several typical methods in this area, including GENIE3, JUMP3, and SINCERITIES. The results of the evaluation indicate that the proposed method can analyze the diversified information contained in the two types of data, establish a more accurate gene regulatory network, and improve computational efficiency. The method provides a new way of thinking about large datasets and the high computational complexity of single-cell data in GRN inference.


Blood ◽  
2019 ◽  
Vol 134 (Supplement_1) ◽  
pp. 575-575
Author(s):  
Alexandra M Poos ◽  
Jan-Philipp Mallm ◽  
Stephan M Tirier ◽  
Nicola Casiraghi ◽  
Hana Susak ◽  
...  

Introduction: Multiple myeloma (MM) is a heterogeneous malignancy of clonal plasma cells that accumulate in the bone marrow (BM). Despite new treatment approaches, in most patients resistant subclones are selected by therapy, resulting in the development of refractory disease. While the subclonal architecture in newly diagnosed patients has been investigated in great detail, intra-tumor heterogeneity in relapsed/refractory (RR) MM is poorly characterized. Recent technological and computational advances provide the opportunity to systematically analyze tumor samples at single-cell (sc) level with high accuracy and throughput. Here, we present a pilot study for an integrative analysis of sc Assay for Transposase-Accessible Chromatin with high-throughput sequencing (scATAC-seq) and scRNA-seq with the aim to comprehensively study the regulatory landscape, gene expression, and evolution of individual subclones in RRMM patients. Methods: We have included 20 RRMM patients with longitudinally collected paired BM samples. scATAC- and scRNA-seq data were generated using the 10X Genomics platform. Pre-processing of the sc-seq data was performed with the CellRanger software (reference genome GRCh38). For downstream analyses the R-packages Seurat and Signac (Satija Lab) as well as Cicero (Trapnell Lab) were used. For all patients bulk whole genome sequencing (WGS) data was available, which we used for confirmatory studies of intra-tumor heterogeneity. Results: A comprehensive study at the sc level requires extensive quality controls (QC). All scATAC-seq files passed the QC, including the detected number of cells, number of fragments in peaks or the ratio of mononucleosomal to nucleosome-free fragments. Yet, unsupervised clustering of the differentially accessible regions resulted in two main clusters, strongly associated with sample processing time. Delay of sample processing by 1-2 days, e.g. due to shipment from participating centers, resulted in global change of chromatin accessibility with more than 10,000 regions showing differences compared to directly processed samples. The corresponding scRNA-seq files also consistently failed QC, including detectable genes per cell and the percentage of mitochondrial RNA. We excluded these samples from the study. Analysing scATAC-seq data, we observed distinct clusters before and after treatment of RRMM, indicating clonal adaptation or selection in all samples. Treatment with carfilzomib resulted in highly increased co-accessibility and >100 genes were differentially accessible upon treatment. These genes are related to the activation of immune cells (including T-, and B-cells), cell-cell adhesion, apoptosis and signaling pathways (e.g. NFκB) and include several chaperone proteins (e.g. HSPH1) which were upregulated in the scRNA-seq data upon proteasome inhibition. The power of our comprehensive approach for detection of individual subclones and their evolution is illustrated by a patient who was treated with a MEK inhibitor and achieved complete remission. This patient showed two main clusters in the scATAC-seq data before treatment, suggesting presence of two subclones. Using copy number profiles based on WGS and scRNA-seq data and performing a trajectory analysis based on scATAC-seq data, we could confirm two different subclones. At relapse, a seemingly independent dominant clone emerged. Upon comprehensive integration of the datasets, one of the initial subclones could be identified as the precursor of this dominant clone. We observed increased accessibility for 108 regions (e.g. JUND, HSPA5, EGR1, FOSB, ETS1, FOXP2) upon MEK inhibition. The most significant differentially accessible region in this clone and its precursor included the gene coding for krüppel-like factor 2 (KLF2). scRNA-seq data showed overexpression of KLF2 in the MEK-inhibitor resistant clone, confirming KLF2 scATAC-seq data. KLF2 has been reported to play an essential role together with KDM3A and IRF1 for MM cell survival and adhesion to stromal cells in the BM. Conclusions: Our data strongly suggest using only immediately processed samples for single cell technologies. Integrating scATAC- and scRNA-seq together with bulk WGS data showed that detection of individual clones and longitudinal changes in the activity of cis-regulatory regions and gene expression is feasible and informative in RRMM. Disclosures Goldschmidt: John-Hopkins University: Research Funding; Novartis: Membership on an entity's Board of Directors or advisory committees, Research Funding; John-Hopkins University: Research Funding; Bristol-Myers Squibb: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Mundipharma: Research Funding; Takeda: Membership on an entity's Board of Directors or advisory committees, Research Funding; MSD: Research Funding; Molecular Partners: Research Funding; Dietmar-Hopp-Stiftung: Research Funding; Janssen: Consultancy, Research Funding; Chugai: Honoraria, Research Funding; Janssen: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Sanofi: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Amgen: Consultancy, Research Funding; Celgene: Honoraria, Membership on an entity's Board of Directors or advisory committees, Research Funding; Adaptive Biotechnology: Membership on an entity's Board of Directors or advisory committees.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Hiroko Nomaru ◽  
Yang Liu ◽  
Christopher De Bono ◽  
Dario Righelli ◽  
Andrea Cirino ◽  
...  

Abstract The poles of the heart and branchiomeric muscles of the face and neck are formed from the cardiopharyngeal mesoderm within the pharyngeal apparatus. They are disrupted in patients with 22q11.2 deletion syndrome, due to haploinsufficiency of TBX1, encoding a T-box transcription factor. Here, using single cell RNA-sequencing, we now identify a multilineage primed population within the cardiopharyngeal mesoderm, marked by Tbx1, which has bipotent properties to form cardiac and branchiomeric muscle cells. The multilineage primed cells are localized within the nascent mesoderm of the caudal lateral pharyngeal apparatus and provide a continuous source of cardiopharyngeal mesoderm progenitors. Tbx1 regulates the maturation of multilineage primed progenitor cells to cardiopharyngeal mesoderm derivatives while restricting ectopic non-mesodermal gene expression. We further show that TBX1 confers this balance of gene expression by direct and indirect regulation of enriched genes in multilineage primed progenitors and downstream pathways, partly through altering chromatin accessibility, the perturbation of which can lead to congenital defects in individuals with 22q11.2 deletion syndrome.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Giancarlo Bonora ◽  
Vijay Ramani ◽  
Ritambhara Singh ◽  
He Fang ◽  
Dana L. Jackson ◽  
...  

Abstract Background Mammalian development is associated with extensive changes in gene expression, chromatin accessibility, and nuclear structure. Here, we follow such changes associated with mouse embryonic stem cell differentiation and X inactivation by integrating, for the first time, allele-specific data from these three modalities obtained by high-throughput single-cell RNA-seq, ATAC-seq, and Hi-C. Results Allele-specific contact decay profiles obtained by single-cell Hi-C clearly show that the inactive X chromosome has a unique profile in differentiated cells that have undergone X inactivation. Loss of this inactive X-specific structure at mitosis is followed by its reappearance during the cell cycle, suggesting a “bookmark” mechanism. Differentiation of embryonic stem cells to follow the onset of X inactivation is associated with changes in contact decay profiles that occur in parallel on both the X chromosomes and autosomes. Single-cell RNA-seq and ATAC-seq show evidence of a delay in female versus male cells, due to the presence of two active X chromosomes at early stages of differentiation. The onset of the inactive X-specific structure in single cells occurs later than gene silencing, consistent with the idea that chromatin compaction is a late event of X inactivation. Single-cell Hi-C highlights evidence of discrete changes in nuclear structure characterized by the acquisition of very long-range contacts throughout the nucleus. Novel computational approaches allow for the effective alignment of single-cell gene expression, chromatin accessibility, and 3D chromosome structure. Conclusions Based on trajectory analyses, three distinct nuclear structure states are detected reflecting discrete and profound simultaneous changes not only to the structure of the X chromosomes, but also to that of autosomes during differentiation. 
Our study reveals that long-range structural changes to chromosomes appear as discrete events, unlike progressive changes in gene expression and chromatin accessibility.

