sepal: identifying transcript profiles with spatial patterns by diffusion-based modeling

Author(s):  
Alma Andersson ◽  
Joakim Lundeberg

Abstract Motivation Collection of spatial signals in large numbers has become a routine task in multiple omics fields, but parsing these rich datasets still poses certain challenges. In whole or near-full transcriptome spatial techniques, spurious expression profiles are intermixed with those exhibiting an organized structure. To distinguish profiles with spatial patterns from the background noise, a metric that enables quantification of spatial structure is desirable. Current methods designed for similar purposes tend to be built around a framework of statistical hypothesis testing, hence we were compelled to explore a fundamentally different strategy. Results We propose a previously unexplored approach to analyzing spatial transcriptomics data, simulating diffusion of individual transcripts to extract genes with spatial patterns. The method performed as expected when presented with synthetic data. When applied to real data, it identified genes with distinct spatial profiles, involved in key biological processes or characteristic of certain cell types. Compared to existing methods, ours seemed to be less influenced by the genes’ expression levels and showed better time performance when run with multiple cores. Availability and implementation Open-source Python package with a command line interface (CLI), freely available at https://github.com/almaan/sepal under an MIT licence. A mirror of the GitHub repository can be found at Zenodo, doi: 10.5281/zenodo.4573237. Supplementary information Supplementary data are available at Bioinformatics online.
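
The diffusion idea can be illustrated with a small sketch. It rests on our own assumptions and is not sepal's implementation. Expression is placed on a square grid with zero-flux borders and evolved with an explicit finite-difference diffusion step. The number of steps until the profile stops changing serves as the score. Large-scale spatial structure relaxes slowly under diffusion, while spot-to-spot noise smooths out quickly, so structured genes receive longer times. The function names and the parameters `D`, `dt` and `tol` are hypothetical.

```python
import numpy as np


def laplacian(u: np.ndarray) -> np.ndarray:
    """5-point discrete Laplacian with reflecting (zero-flux) borders."""
    # Edge padding copies border values outward, so no mass leaves the grid.
    p = np.pad(u, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u


def diffusion_time(expr, D=1.0, dt=0.1, tol=1e-4, max_steps=100_000):
    """Explicit diffusion steps until the largest per-spot change drops below
    tol times the mean value, i.e. the profile is close to homogeneous.

    expr: 2D array of non-negative expression values with a non-zero total.
    D * dt must stay <= 0.25 for this explicit 2D scheme to be stable.
    """
    u = np.asarray(expr, dtype=float)
    u = u / u.sum()  # equal total mass, so genes with different depth are comparable
    for step in range(1, max_steps + 1):
        delta = D * dt * laplacian(u)
        u = u + delta
        if np.abs(delta).max() < tol * u.mean():
            return step
    return max_steps
```

Ranking genes by `diffusion_time` in decreasing order would put spatially organized profiles first. Real arrays may use other spot layouts, for example hexagonal ones, which this square-grid sketch does not model.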

2020 ◽  
Vol 36 (11) ◽  
pp. 3431-3438
Author(s):  
Ziyi Li ◽  
Zhenxing Guo ◽  
Ying Cheng ◽  
Peng Jin ◽  
Hao Wu

Abstract Motivation In the analysis of high-throughput omics data from tissue samples, estimating and accounting for cell composition have been recognized as important steps. High cost, intensive labor requirements and technical limitations hinder quantification of cell composition using cell-sorting or single-cell technologies. Computational methods for cell composition estimation are available, but they are either limited by the availability of a reference panel or suffer from low accuracy. Results We introduce two partial reference-free algorithms, TOAST/-P and TOAST/+P (TOols for the Analysis of heterogeneouS Tissues), for estimating the cell composition of heterogeneous tissues based on their gene expression profiles. TOAST/-P and TOAST/+P incorporate additional biological information, including cell-type-specific markers and prior knowledge of compositions, in the estimation procedure. Extensive simulation studies and real data analyses demonstrate that the proposed methods provide more accurate and robust cell composition estimation than existing methods. Availability and implementation The proposed methods TOAST/-P and TOAST/+P are implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. Supplementary information Supplementary data are available at Bioinformatics online.
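
The abstract does not spell out how markers enter the estimate. The sketch below is a purely illustrative baseline, not the TOAST/-P or TOAST/+P algorithm. It averages each cell type's marker genes per sample and normalizes the averages to proportions. `marker_proportions` and the `markers` mapping are hypothetical.

```python
import numpy as np


def marker_proportions(Y, markers):
    """Crude marker-based composition estimate (illustrative only).

    Y       : genes x samples matrix of non-negative expression.
    markers : dict mapping a cell-type name to row indices of its marker genes.
    Returns (cell_types, P): P is cell types x samples, each column sums to 1.

    Assumes each marker is expressed only in its own cell type, at a similar
    per-cell level across cell types; otherwise P is off by unknown
    per-cell-type scale factors.
    """
    Y = np.asarray(Y, dtype=float)
    cell_types = list(markers)
    # Average marker expression per cell type and sample.
    scores = np.vstack([Y[markers[ct]].mean(axis=0) for ct in cell_types])
    return cell_types, scores / scores.sum(axis=0, keepdims=True)
```

The result is only correct up to unknown per-cell-type scale factors. That limitation is one reason additional information, such as prior knowledge of compositions, can help.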


2019 ◽  
Author(s):  
Daiwei Tang ◽  
Seyoung Park ◽  
Hongyu Zhao

Abstract Motivation A number of computational methods have been proposed recently to profile the tumor microenvironment (TME) from bulk RNA data, and they have proved useful for understanding microenvironment differences among therapeutic response groups. However, these methods cannot account for either tumor proportion or variable mRNA levels across cell types. Results In this article, we propose a Nonnegative Matrix Factorization-based Immune-TUmor MIcroenvironment Deconvolution (NITUMID) framework for TME profiling that addresses these limitations. It is designed to provide robust estimates of tumor and immune cell proportions simultaneously, while accommodating mRNA level differences across cell types. Through comprehensive simulations and real data analyses, we demonstrate that NITUMID not only accurately estimates tumor fractions and cell types’ mRNA levels, which are currently unavailable in other methods, but also outperforms most existing deconvolution methods in regular cell type profiling accuracy. Moreover, we show that NITUMID detects clinical and prognostic signals from tumor gene expression profiles more effectively than other methods. Availability and implementation The algorithm is implemented in R. The source code can be downloaded at https://github.com/tdw1221/NITUMID. Supplementary information Supplementary data are available at Bioinformatics online.
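
NITUMID is built on nonnegative matrix factorization. The sketch shows only that generic building block: NMF with the standard Lee-Seung multiplicative updates, factoring a genes x samples matrix into non-negative signatures `W` and weights `H`. None of NITUMID's own constraints are reproduced, and the function names are hypothetical.

```python
import numpy as np


def nmf(Y, k, n_iter=2000, eps=1e-12, seed=0):
    """Plain NMF, Y ~= W @ H, with Lee-Seung multiplicative updates (Frobenius loss).

    Y : genes x samples, non-negative. W : genes x k signatures. H : k x samples.
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    W = rng.uniform(0.1, 1.0, size=(Y.shape[0], k))
    H = rng.uniform(0.1, 1.0, size=(k, Y.shape[1]))
    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative; eps avoids 0/0.
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H


def mrna_fractions(W, H):
    """Share of each sample's total expression attributed to each component.

    Columns of W are rescaled to sum to 1 and the scale is moved into H,
    which leaves W @ H unchanged. This is a share of mRNA, not of cells.
    """
    scale = W.sum(axis=0)            # length-k column sums of W
    H_scaled = H * scale[:, None]
    return H_scaled / H_scaled.sum(axis=0, keepdims=True)
```

`mrna_fractions` stops at mRNA shares. Turning them into cell proportions would require per-cell-type mRNA levels, the quantity that the abstract says NITUMID estimates and other methods do not.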


2018 ◽  
Author(s):  
R. Gonzalo Parra ◽  
Nikolaos Papadopoulos ◽  
Laura Ahumada-Arranz ◽  
Jakob El Kholtei ◽  
Noah Mottelson ◽  
...  

Abstract Advances in single-cell transcriptomics techniques are revolutionizing studies of cellular differentiation and heterogeneity. Consequently, it has become possible to track the trajectory of thousands of genes across the cellular lineage trees that represent the temporal emergence of cell types during dynamic processes. However, reconstruction of cellular lineage trees with more than a few cell fates has proved challenging. We present MERLoT (https://github.com/soedinglab/merlot), a flexible and user-friendly tool to reconstruct complex lineage trees from single-cell transcriptomics data and further impute temporal gene expression profiles along the reconstructed tree structures. We demonstrate MERLoT’s capabilities on various real cases and hundreds of simulated datasets.


2019 ◽  
Vol 47 (17) ◽  
pp. 8961-8974 ◽  
Author(s):  
R. Gonzalo Parra ◽  
Nikolaos Papadopoulos ◽  
Laura Ahumada-Arranz ◽  
Jakob El Kholtei ◽  
Noah Mottelson ◽  
...  

Abstract Advances in single-cell transcriptomics techniques are revolutionizing studies of cellular differentiation and heterogeneity. It has become possible to track the trajectory of thousands of genes across the cellular lineage trees that represent the temporal emergence of cell types during dynamic processes. However, reconstruction of cellular lineage trees with more than a few cell fates has proved challenging. We present MERLoT (https://github.com/soedinglab/merlot), a flexible and user-friendly tool to reconstruct complex lineage trees from single-cell transcriptomics data. It can impute temporal gene expression profiles along the reconstructed tree. We show MERLoT’s capabilities on various real cases and hundreds of simulated datasets.
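
The simplest way to turn cell clusters into a branching tree is a minimum spanning tree over cluster centroids. The sketch below implements Prim's algorithm on Euclidean distances between centroids. It is a common baseline for lineage reconstruction, not the algorithm MERLoT uses, and `mst_edges` is a hypothetical name.

```python
import numpy as np


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph of the given points.

    Returns a list of (parent, child) index pairs, rooted at point 0.
    """
    X = np.asarray(points, dtype=float)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()            # cheapest known link of each node to the tree
    parent = np.zeros(n, dtype=int)  # tree node providing that cheapest link
    edges = []
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        edges.append((int(parent[j]), j))
        in_tree[j] = True
        closer = dist[j] < best      # nodes now reachable more cheaply via j
        parent[closer] = j
        best = np.minimum(best, dist[j])
    return edges
```

Nodes of degree three or more mark branch points, i.e. candidate cell fate decisions. For example, the centroids `(0, 0), (1, 0), (2, 0), (3, 1), (3, -1)` yield the edges `(0, 1), (1, 2), (2, 3), (2, 4)`, which makes centroid 2 a branch point.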


2019 ◽  
Vol 35 (19) ◽  
pp. 3592-3598 ◽  
Author(s):  
Justin G Chitpin ◽  
Aseel Awdeh ◽  
Theodore J Perkins

Abstract Motivation Chromatin Immunoprecipitation (ChIP)-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modifications to the genome. A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified. Many programs have been designed to solve this task, but nearly all fall into the statistical trap of using the data twice: once to determine candidate enriched regions, and again to assess enrichment by classical statistical hypothesis testing. This double use of the data invalidates the statistical significance assigned to enriched regions, so the true significance or reliability of peak calls remains unknown. Results Using simulated and real ChIP-seq data, we show that three well-known peak callers, MACS, SICER and diffReps, output biased P-values and false discovery rate estimates that can be many orders of magnitude too optimistic. We propose a wrapper algorithm, RECAP, that uses resampling of ChIP-seq and control data to estimate a monotone transform correcting for biases built into peak calling algorithms. When applied to null hypothesis data, where there is no enrichment between ChIP-seq and control, P-values recalibrated by RECAP are approximately uniformly distributed. On data where there is genuine enrichment, RECAP P-values give a better estimate of the true statistical significance of candidate peaks and better false discovery rate estimates, which correlate better with empirical reproducibility. RECAP is a powerful new tool for assessing the true statistical significance of ChIP-seq peak calls. Availability and implementation The RECAP software is available through www.perkinslab.ca or on github at https://github.com/theodorejperkins/RECAP. Supplementary information Supplementary data are available at Bioinformatics online.
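
RECAP learns a monotone transform from resampled data. The sketch below covers only the recalibration step and assumes that peak-caller p-values have already been collected on null data, for instance by resampling ChIP-seq and control reads so that no genuine enrichment remains. Each observed p-value is replaced by the fraction of null p-values at or below it. This empirical-CDF mapping is monotone, so peak rankings are preserved. The function name is hypothetical, and this is not RECAP's exact procedure.

```python
import numpy as np


def recalibrate(observed_p, null_p):
    """Map observed p-values through the empirical CDF of null p-values.

    The mapping is monotone non-decreasing (rankings are kept) and is
    approximately uniform when the observed values also come from the null.
    The +1 terms keep recalibrated values strictly above zero.
    """
    null_sorted = np.sort(np.asarray(null_p, dtype=float))
    observed = np.asarray(observed_p, dtype=float)
    counts = np.searchsorted(null_sorted, observed, side="right")  # number of null <= p
    return (counts + 1) / (len(null_sorted) + 1)
```

Suppose the caller is over-optimistic, with null p-values skewed towards zero. An observed p-value of 0.01 then maps to a much larger recalibrated value.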


Geophysics ◽  
1988 ◽  
Vol 53 (8) ◽  
pp. 1024-1033 ◽  
Author(s):  
Katherine M. Hansen ◽  
Kabir Roy‐Chowdhury ◽  
Robert A. Phinney

The theory of statistical hypothesis testing is used to develop and apply a seismic signal detection filter. The filter, herein named the sign filter, scans a stacked section and designates a linear segment as “signal” or “noise” based on the value of the sign test statistic evaluated over the amplitudes within the segment; only the signals are passed. The sign test statistic is nonparametric, so that probabilistic calculations related to the filtering process do not require rigid assumptions regarding the noise distribution. Consequently, it is possible to calculate both the probability that the filter will pass a segment containing only noise, and the expected number of noise‐only segments to be passed. These numbers may be adjusted by changing the tunable parameters of the filter. The detector was tested on both synthetic and field data. For synthetic data, all of the signals present in the data were identified, and the output did not contain any spurious signals, even for a signal‐to‐noise ratio smaller than 1. For field data, the events chosen by the filter, for the most part, agree closely with those visible in the input section; and much of the spatially incoherent energy is suppressed. A few of the passed segments were not visually coherent in the input stack; we suggest a method by which such segments might be identified and removed. The method is fairly general and may be modified for different definitions of signal. The case of linear alignments is the easiest to implement, and the detector promises to be useful in both the processing (automatic picking of first arrivals in source gathers) and interpretation (identification of primary reflections in stacked sections) phases of seismic data analysis.
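
The probabilistic statements above follow from the null distribution of the sign statistic. Assume the noise amplitudes are independent with median zero. Then the number K of positive amplitudes among n non-zero samples is Binomial(n, 1/2). Suppose a segment is passed when at least k samples share a sign (k > n/2). The false-pass probability is then twice P(K ≥ k), and the expected number of passed noise-only segments is that probability times the number of noise-only segments scanned. The two-sided threshold rule and all function names are our assumptions, not the paper's definitions.

```python
from math import comb


def sign_statistic(amplitudes):
    """Return (number of strictly positive samples, number of non-zero samples)."""
    nonzero = [a for a in amplitudes if a != 0]  # exact zeros carry no sign
    return sum(a > 0 for a in nonzero), len(nonzero)


def false_pass_probability(n, k):
    """P(a noise-only segment passes) when >= k of n samples must share a sign.

    Under independent zero-median noise K ~ Binomial(n, 1/2); for k > n/2 the
    upper and lower tails are disjoint and equal, so the answer is 2 * P(K >= k).
    """
    if 2 * k <= n:
        raise ValueError("threshold k must exceed n/2")
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return 2 * tail


def passes(amplitudes, k):
    """Pass the segment if at least k samples are positive or at least k negative."""
    pos, n = sign_statistic(amplitudes)
    return pos >= k or (n - pos) >= k
```

With n = 10 and k = 9, the false-pass probability is 22/1024 ≈ 0.021. Scanning 1000 noise-only segments would therefore pass about 21 of them on average. Raising k to 10 lowers this to 2/1024, or about 2 segments.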


Author(s):  
Ramon Viñas ◽  
Helena Andrés-Terré ◽  
Pietro Liò ◽  
Kevin Bryson

Abstract Motivation High-throughput gene expression data can be used to address a wide range of fundamental biological problems, but datasets of an appropriate size are often unavailable. Moreover, existing transcriptomics simulators have been criticized because they fail to emulate key properties of gene expression data. In this article, we develop a method based on a conditional generative adversarial network to generate realistic transcriptomics data for Escherichia coli and humans. We assess the performance of our approach across several tissues and cancer types. Results We show that our model preserves several gene expression properties significantly better than widely used simulators, such as SynTReN or GeneNetWeaver. The synthetic data preserve tissue- and cancer-specific properties of transcriptomics data. Moreover, they exhibit real gene clusters and ontologies at both local and global scales, suggesting that the model learns to approximate the gene expression manifold in a biologically meaningful way. Availability and implementation Code is available at: https://github.com/rvinas/adversarial-gene-expression. Supplementary information Supplementary data are available at Bioinformatics online.
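
One way to check whether synthetic expression data preserve a property of real data is to compare gene-gene correlation structure. The sketch correlates the upper triangles of the two gene-gene Pearson correlation matrices. It is an illustrative check of our own choosing, not necessarily one of the measures used in the paper, and the function name is hypothetical.

```python
import numpy as np


def correlation_preservation(real, synthetic):
    """Pearson correlation between the upper triangles (diagonal excluded) of
    the gene-gene correlation matrices of two samples x genes matrices."""
    r_real = np.corrcoef(np.asarray(real, dtype=float), rowvar=False)
    r_syn = np.corrcoef(np.asarray(synthetic, dtype=float), rowvar=False)
    iu = np.triu_indices_from(r_real, k=1)
    return float(np.corrcoef(r_real[iu], r_syn[iu])[0, 1])
```

Values near 1 indicate that co-expression structure is kept. Data whose genes are shuffled independently across samples score near 0.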


2020 ◽  
Author(s):  
Almut Luetge ◽  
Joanna Zyprych-Walczak ◽  
Urszula Brykczynska Kunzmann ◽  
Helena L Crowell ◽  
Daniela Calini ◽  
...  

A key challenge in single cell RNA-sequencing (scRNA-seq) data analysis is posed by dataset- and batch-specific differences that can obscure the biological signal of interest. While there are various tools and methods to perform data integration and correct for batch effects, their performance can vary between datasets and according to the nature of the bias. Therefore, it is important to understand how batch effects manifest in order to adjust for them reliably. Here, we systematically explore batch effects in a variety of scRNA-seq datasets according to magnitude, cell type specificity and complexity. We developed a cell-specific mixing score (cms) that quantifies how well cells from multiple batches are mixed. By considering distance distributions (in a lower dimensional space), the score is able to detect local batch bias and differentiate between unbalanced batches (i.e., when one cell type is more abundant in a batch) and systematic differences between cells of the same cell type. We implemented cms and related metrics to detect batch effects or measure structure preservation in the CellMixS R/Bioconductor package. We systematically compare different metrics that have been proposed to quantify batch effects or bias in scRNA-seq data using real datasets with known batch effects and synthetic data that mimic various real data scenarios. While these metrics target the same question and are used interchangeably, we find differences in inter- and intra-dataset scalability, sensitivity and in a metric's ability to handle batch effects with differentially abundant cell types. We find that cell-specific metrics outperform cell type-specific and global metrics and recommend them for both method benchmarks and batch exploration.
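
Comparing distance distributions per cell can be sketched under simplifying assumptions. There are exactly two batches, and a low-dimensional embedding (for example PCA coordinates) is precomputed. For each cell, a two-sample Kolmogorov-Smirnov statistic compares the batch-wise distances to its k nearest neighbors. CellMixS's cms is defined with its own test and outputs, which this stand-in does not reproduce. `local_mixing` and `ks_statistic` are hypothetical names.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))


def local_mixing(embedding, batch, k=30):
    """Per-cell KS statistic between batch-wise kNN distance distributions.

    Small values suggest well-mixed neighborhoods; cells whose k nearest
    neighbors all come from one batch get NaN.
    """
    X = np.asarray(embedding, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two batches")
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = np.full(len(X), np.nan)
    for i in range(len(X)):
        nn = np.argsort(d[i])[1:k + 1]           # skip the cell itself
        d0 = d[i, nn][batch[nn] == labels[0]]
        d1 = d[i, nn][batch[nn] == labels[1]]
        if len(d0) and len(d1):
            scores[i] = ks_statistic(d0, d1)
    return scores
```

Averaged over cells, well-mixed data give small statistics. A systematic shift between batches brings each cell's own-batch neighbors closer than those of the other batch, which raises the statistic. Cells with single-batch neighborhoods return NaN and would need separate handling.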


Author(s):  
Weiwei Zhang ◽  
Hao Wu ◽  
Ziyi Li

Abstract Motivation It is common practice in epigenetics research to profile DNA methylation in tissue samples, which are usually mixtures of different cell types. To properly account for the mixture, estimating cell compositions has been recognized as an important first step. Many methods have been developed for quantifying cell compositions from DNA methylation data, but most have limited applicability due to a lack of reference or prior information. Results We develop Tsisal, a novel complete deconvolution method that accurately estimates cell compositions from DNA methylation data without any prior knowledge of cell types or their proportions. Tsisal is a full pipeline to estimate the number of cell types and the cell compositions, and to identify cell-type-specific CpG sites. It can also assign cell type labels when a full or partial reference panel is available. Extensive simulation studies and analyses of seven real datasets demonstrate the favorable performance of our proposed method compared with existing deconvolution methods serving a similar purpose. Availability The proposed method Tsisal is implemented as part of the R/Bioconductor package TOAST at https://bioconductor.org/packages/TOAST. Supplementary information Supplementary data are available at Bioinformatics online.


2018 ◽  
Author(s):  
Jonas Maaskola ◽  
Ludvig Bergenstråhle ◽  
Aleksandra Jurek ◽  
José Fernández Navarro ◽  
Jens Lagergren ◽  
...  

We create data-driven maps of transcriptomic anatomy with a probabilistic framework for unsupervised pattern discovery in spatial gene expression data. With convolved negative binomial regression we discover patterns which correspond to cell types, microenvironments, or tissue components, and that consist of gene expression profiles and spatial activity maps. Expression profiles quantify how strongly each gene is expressed in a given pattern, and spatial activity maps reflect where in space each pattern is active. Arbitrary covariates and prior hierarchies are supported to leverage complex experimental designs. We demonstrate the method with Spatial Transcriptomics data of mouse brain and olfactory bulb. The discovered transcriptomic patterns correspond to neuroanatomically distinct cell layers. Moreover, batch effects are successfully addressed, leading to consistent pattern inference for multi-sample analyses. On this basis, we identify known and uncharacterized genes that are spatially differentially expressed in the hippocampal field between Ammon’s horn and the dentate gyrus.
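
The model can be made concrete with a small forward sketch built on our own assumptions:

- Spatial activity maps (patterns x spots) are smoothed by a row-normalized Gaussian kernel over spot coordinates.
- Expected counts are μ = profiles × smoothed activity.
- Counts follow a negative binomial with mean μ and dispersion r, so the variance is μ + μ²/r.

The sketch evaluates the log-likelihood only, without the fitting, covariates or priors the method supports. All names are hypothetical.

```python
import numpy as np
from math import lgamma

_lgamma = np.vectorize(lgamma)  # elementwise log-gamma using only the standard library


def gaussian_kernel(coords, bandwidth):
    """Row-normalized Gaussian smoothing matrix between spots (spots x spots)."""
    c = np.asarray(coords, dtype=float)
    sq = ((c[:, None, :] - c[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * bandwidth ** 2))
    return K / K.sum(axis=1, keepdims=True)


def expected_counts(profiles, activity, K):
    """mu[g, s] = sum_t profiles[g, t] * smoothed_activity[t, s].

    profiles : genes x patterns, activity : patterns x spots, K : spots x spots.
    """
    smoothed = activity @ K.T  # smoothed[t, s] = sum_s' K[s, s'] * activity[t, s']
    return profiles @ smoothed


def nb_loglik(counts, mu, r):
    """Sum of NB log-pmfs with mean mu (> 0) and dispersion r (variance mu + mu**2 / r)."""
    y = np.asarray(counts, dtype=float)
    return float(np.sum(
        _lgamma(y + r) - _lgamma(r) - _lgamma(y + 1)
        + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu))
    ))
```

Fitting would maximize `nb_loglik` over non-negative profiles and activity maps (and r), for example by gradient-based optimization. That part is omitted here.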

