Dimensionality reduction and data integration for scRNA-seq data based on integrative hierarchical Poisson factorisation
Single-cell RNA sequencing (scRNA-seq) data sets consist of high-dimensional, sparse and noisy feature vectors, which pose a challenge for classical dimensionality reduction methods. We show that applying Hierarchical Poisson Factorisation (HPF) to scRNA-seq data produces robust factors and outperforms other popular methods. To account for batch variability in composite data sets, we introduce Integrative Hierarchical Poisson Factorisation (IHPF), an extension of HPF that uses a noise ratio hyper-parameter to tune how much variability is attributed to technical (batch) versus biological (cell phenotype) sources. We illustrate the advantages of IHPF in data integration scenarios with varying alignments of technical noise and cell diversity, and show that IHPF produces latent factors with a dual block structure in both cell and gene spaces, which enhances biological interpretability.
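
For orientation, the kind of count model that HPF assumes can be sketched in a few lines. The sketch below is illustrative only and assumes the standard HPF formulation: Gamma-distributed per-cell and per-gene scale variables, Gamma-distributed non-negative cell and gene factors, and a Poisson likelihood on the counts. The function name `sample_hpf`, the hyper-parameter names and their default values are hypothetical choices made for this example. The sketch does not cover the noise ratio hyper-parameter or the batch structure of IHPF, and it samples from the model rather than fitting it.

```python
import numpy as np


def sample_hpf(n_cells, n_genes, n_factors,
               a=0.3, a_prime=1.0, b_prime=1.0,
               c=0.3, c_prime=1.0, d_prime=1.0, seed=0):
    """Draw a synthetic count matrix from an HPF-style generative model.

    All hyper-parameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Per-cell and per-gene scale variables (e.g. capture efficiency,
    # overall expression level). NumPy's gamma is parameterised by
    # shape and scale, where scale = 1 / rate.
    xi = rng.gamma(shape=a_prime, scale=b_prime / a_prime, size=n_cells)
    eta = rng.gamma(shape=c_prime, scale=d_prime / c_prime, size=n_genes)

    # Non-negative latent factors; small shape values encourage sparsity.
    theta = rng.gamma(shape=a, scale=1.0 / xi[:, None],
                      size=(n_cells, n_factors))   # cell scores
    beta = rng.gamma(shape=c, scale=1.0 / eta[:, None],
                     size=(n_genes, n_factors))    # gene loadings

    # Each count is Poisson with rate equal to the inner product of the
    # corresponding cell-score and gene-loading vectors.
    rate = theta @ beta.T
    counts = rng.poisson(rate)
    return counts, theta, beta


if __name__ == "__main__":
    counts, theta, beta = sample_hpf(n_cells=50, n_genes=30, n_factors=5)
    print(counts.shape, "fraction of zeros:", (counts == 0).mean())
```

In this sketch the rows of `theta` play the role of the low-dimensional cell representation and the columns of `beta` indicate which genes drive each factor.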