ABSTRACTWith the rapid development of single-cell ATAC-seq technology, it has become possible to profile the chromatin accessibility of massive individual cells. However, it remains challenging to characterize their regulatory heterogeneity due to the high-dimensional, sparse and near-binary nature of data. Most existing data representation methods were designed based on correlation, which may be ill-defined for sparse data. Moreover, these methods do not well address the issue of excessive zeros. Thus, a simple, fast and scalable approach is needed to analyze single-cell ATAC-seq data with massive cells, address the “missingness” and accurately categorize cell types. To this end, we developed a network diffusion method for scalable embedding of massive single-cell ATAC-seq data (named as scAND). Specifically, we considered the near-binary single-cell ATAC-seq data as a bipartite network that reflects the accessible relationship between cells and accessible regions, and further adopted a simple and scalable network diffusion method to embed it. scAND can take information from similar cells to alleviate the sparsity and improve cell type identification. Extensive tests and comparison with existing methods using synthetic and real data as benchmarks demonstrated its distinct superiorities in terms of clustering accuracy, robustness, scalability and data integration.AvailabilityThe Python-based scAND tool is freely available at http://page.amss.ac.cn/shihua.zhang/software.html.