Plant PhysioSpace: a robust tool to compare stress response across plant species
Abstract
Generalization of transcriptomics results can be achieved by comparison across experiments, which is based on integrating interrelated transcriptomics studies into a compendium. In such a broader context, one can both characterize the fate of the organism under study and distinguish generic from specific responses. We have built such a compendium for plant stress response by integrating publicly available data sets, in order to generalize results across studies and extract the most robust and meaningful information possible from them.
There are numerous methods and tools to analyze such data sets, most focusing on gene-wise dimension reduction of the data to obtain marker genes and gene sets, e.g. for pathway analysis. Relying only on isolated biological modules can cause important confounders and relevant context to be missed. Therefore, we have chosen a different approach: our novel tool, which we call Plant PhysioSpace, provides the ability to compare experimental conditions across species and platforms without a priori reducing the reference information to specific gene sets. It extracts physiologically relevant signatures from a reference data set (a collection of public data sets) by integrating and transforming heterogeneous reference gene expression data into a set of physiology-specific patterns, called PhysioSpace. 
New experimental data can be mapped to these PhysioSpaces, resulting in similarity scores that quantify the similarity of the new experiment to an a priori compendium.
Here we report the implementation of two R packages, one software package and one data package, and a Shiny web application, which together provide plant biologists with convenient ways to access the method and a precomputed compendium of more than 900 PhysioSpace basis vectors from four species (Arabidopsis thaliana, Oryza sativa, Glycine max, and Triticum aestivum).
The tool reduces the dimensionality of the data sample-wise (and not gene-wise), which results in a vector containing all genes. This method is very robust against noise and changes of platform while remaining sensitive. Plant PhysioSpace can therefore be used as an inter-species or cross-platform similarity measure. We demonstrate that Plant PhysioSpace can successfully translate stress responses between different species and platforms (including single-cell technologies).
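To make the idea of mapping a new sample onto reference signatures concrete, the following is a minimal Python sketch, not the implementation in the R packages. It assumes a simple illustrative score: the difference in mean reference fold change between the sample's most up-regulated and most down-regulated genes. The function name `similarity_scores`, the parameter `k`, the signature labels, and the toy data are all hypothetical, and the statistic used by Plant PhysioSpace itself may differ.

```python
import numpy as np


def similarity_scores(reference, sample, k=2):
    """Score one sample against every reference signature (illustrative only).

    reference: (n_genes, n_signatures) array of fold changes; every gene is kept.
    sample:    (n_genes,) array of fold changes in the same gene order.
    k:         number of most up- and most down-regulated sample genes to contrast.
    """
    reference = np.asarray(reference, dtype=float)
    sample = np.asarray(sample, dtype=float)
    order = np.argsort(sample)           # gene indices, ascending by sample value
    down, up = order[:k], order[-k:]     # extremes of the new sample
    # Assumed score: mean reference value of the sample's "up" genes
    # minus mean reference value of its "down" genes, per signature.
    return reference[up].mean(axis=0) - reference[down].mean(axis=0)


# Hypothetical toy compendium: 6 genes x 2 signatures.
signature_names = ["heat", "cold"]
reference = np.array([
    [2.0, -1.0],
    [1.5, -0.5],
    [0.0, 0.0],
    [-1.0, 1.0],
    [-2.0, 2.0],
    [0.1, 0.2],
])

# Hypothetical new sample whose pattern resembles the first signature.
sample = np.array([1.8, 1.2, 0.1, -0.9, -2.1, 0.0])

scores = similarity_scores(reference, sample)
best = int(np.argmax(scores))
print(dict(zip(signature_names, scores.tolist())), "->", signature_names[best])
```

In this sketch the reference keeps all genes and only the new sample decides which rows are contrasted, which mirrors the sample-wise rather than gene-wise reduction described above.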