A Novel Scalable Signature Based Subspace Clustering Approach for Big Data
“Big data” as the name suggests is a collection of large and complicated data sets which are usually hard to process with on-hand data management tools or other conventional processing applications. A scalable signature based subspace clustering approach is presented in this article that would avoid identification of redundant clusters. Various distance measures are utilized to perform experiments that validate the performance of the proposed algorithm. Also, for the same purpose of validation, the synthetic data sets that are chosen have different dimensions, and their size will be distributed when opened with Weka. The F1 quality measure and the runtime of these synthetic data sets are computed. The performance of the proposed algorithm is compared with other existing clustering algorithms such as CLIQUE.INSCY and SUNCLU.