Incremental community discovery via latent network representation and probabilistic inference
Abstract
Most community detection algorithms assume that the complete network structure $$\mathcal{G}=(\mathcal{V},\mathcal{E})$$ is available in advance for analysis. In practice this often does not hold: privacy constraints and restricted access, among other reasons, leave only a partial snapshot of the entire network. In addition, we may be interested in the community membership of only a selected subset of nodes (denoted by $$\mathcal{V}_{\mathrm{T}} \subseteq \mathcal{V}$$), rather than the community structure of all the nodes in $$\mathcal{G}$$. To this end, we propose an incremental community detection method that repeats two stages: (i) network scan and (ii) community update. In the first stage, our method selects a node to scan such that discovering its local neighborhood structure leads to accurate community detection in the second stage. To select this node, we propose a novel criterion, called Information Gain, based on existing network embedding algorithms (DeepWalk and node2vec). The proposed community update stage combines expectation–maximization with a Markov random field-based denoising strategy. Experiments on five diverse networks with known ground-truth community structure show that our algorithm achieves 10.2% higher accuracy on average than state-of-the-art algorithms for both the network scan and community update steps.
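The alternation between the two stages can be illustrated with a minimal Python sketch. It is not the method proposed here: the abstract does not define Information Gain or the expectation–maximization and Markov random field update, so the sketch uses two loudly simplified stand-ins, observed degree as the scan score and connected components of the observed graph as the community update. All function and parameter names (`incremental_discovery`, `query_neighbors`, `observed_degree`, `component_labels`, `budget`) are hypothetical.

```python
def observed_degree(node, observed, labels):
    # Stand-in scan score (NOT the paper's Information Gain):
    # prefer the node with the most edges discovered so far.
    return len(observed[node])


def component_labels(observed, targets):
    # Stand-in community update (NOT the paper's EM + MRF step):
    # label every node by the smallest node id in its connected component.
    labels = {}
    for start in sorted(observed):
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:
            v = stack.pop()
            for u in observed[v]:
                if u not in labels:
                    labels[u] = start
                    stack.append(u)
    return {t: labels[t] for t in targets}


def incremental_discovery(query_neighbors, targets, budget, score, update):
    # Partial view of the network: only the target nodes are known at first.
    observed = {v: set() for v in targets}
    scanned = set()
    labels = {}
    for _ in range(budget):
        candidates = [v for v in observed if v not in scanned]
        if not candidates:
            break
        # Stage (i), network scan: pick a node and reveal its neighborhood.
        node = max(candidates, key=lambda v: score(v, observed, labels))
        for u in query_neighbors(node):
            observed.setdefault(u, set()).add(node)
            observed[node].add(u)
        scanned.add(node)
        # Stage (ii), community update: re-estimate labels of the targets.
        labels = update(observed, targets)
    return labels, scanned


# Toy network: two disconnected triangles, one target node in each.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
labels, scanned = incremental_discovery(
    lambda v: graph[v], targets=[0, 3], budget=2,
    score=observed_degree, update=component_labels,
)
print(labels, scanned)
```

The scoring and update functions are passed in as arguments so that an embedding-based criterion or a probabilistic update could replace the stand-ins without changing the loop itself.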