A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Yenny Villuendas-Rey; Eley Barroso-Cubas; Oscar Camacho-Nieto; Cornelio Yáñez-Márquez

doi:10.3390/math9070786

Mathematics ◽

10.3390/math9070786 ◽

2021 ◽

Vol 9 (7) ◽

pp. 786

Author(s):

Yenny Villuendas-Rey ◽

Eley Barroso-Cubas ◽

Oscar Camacho-Nieto ◽

Cornelio Yáñez-Márquez

Keyword(s):

Swarm Intelligence ◽

Data Clustering ◽

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Bat Algorithm ◽

Hybrid Features ◽

Bee Colony ◽

Learning Tasks ◽

Clustering Data

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain adequate parameter values for these three modified algorithms in order to apply them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of data.

Experiments on Clustering Algorithms for Mixed and Incomplete Data

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b2551.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 4778-4784

Keyword(s):

Machine Learning ◽

Experimental Study ◽

Incomplete Data ◽

Clustering Algorithms ◽

Cluster Validation ◽

Clustering Data

Clustering mixed and incomplete data has been a frequent research goal in recent years because such data commonly appear in soft-science problems. However, there is a lack of studies evaluating the performance of clustering algorithms on this kind of data. In this paper we present an experimental study of the performance of seven clustering algorithms, each of which uses one of these techniques: partitional, hierarchical, or metaheuristic. All the methods were run over 15 databases from the UCI Machine Learning Repository that have mixed and incomplete data descriptions. In external cluster validation using the Entropy and V-Measure indices, the metaheuristic algorithms showed the best results. Thus, we recommend metaheuristic-based clustering algorithms for clustering data with mixed and incomplete descriptions.
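For reference, V-Measure is the harmonic mean of homogeneity and completeness, both defined through conditional entropies between the reference classes and the obtained clusters. The following is a minimal sketch, assuming natural logarithms and the equal-weight form of the index; the function names are illustrative and not taken from the paper.

```python
from collections import Counter
from math import log

def _entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def _conditional_entropy(target, given):
    """H(target | given), estimated from co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())

def v_measure(classes, clusters):
    """Harmonic mean of homogeneity and completeness (both in [0, 1])."""
    h_c, h_k = _entropy(classes), _entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)

perfect = v_measure([0, 0, 1, 1], [1, 1, 0, 0])   # cluster ids differ, grouping matches
merged = v_measure([0, 0, 1, 1], [0, 0, 0, 0])    # everything in one cluster
```

The index ignores how clusters are numbered, so a relabelled but otherwise identical partition still scores as a perfect match, while collapsing all objects into one cluster yields zero homogeneity.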

A COMPARISON OF CLUSTERING BY IMPUTATION AND SPECIAL CLUSTERING ALGORITHMS ON THE REAL INCOMPLETE DATA

Jurnal Ilmu Komputer dan Informasi ◽

10.21609/jiki.v13i2.818 ◽

2020 ◽

Vol 13 (2) ◽

pp. 65-75

Author(s):

Ridho Ananda ◽

Atika Ratna Dewi ◽

Nurlaili Nurlaili

Keyword(s):

Expectation Maximization ◽

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Distance Estimation ◽

Soft Constraints ◽

Fuzzy C Means ◽

Environmental Performance Index ◽

Silhouette Index ◽

Value Decomposition

Missing values seriously hinder the clustering process. Researchers have proposed several solutions to overcome this problem, two of which are imputation and special clustering algorithms. This paper compared the clustering results obtained with both approaches on incomplete data. The k-means algorithm was applied to the imputed data. The algorithms used were distribution free multiple imputation (DFMI), Gabriel eigen (GE), expectation maximization-singular value decomposition (EM-SVD), biplot imputation (BI), four modified fuzzy c-means (FCM) algorithms, k-means soft constraints (KSC), distance estimation strategy fuzzy c-means (DESFCM), and k-means soft constraints imputed-observed (KSC-IO). The data used were the 2018 environmental performance index (EPI) and simulated data. The optimal clustering of the 2018 EPI data was chosen based on the Silhouette index, whose capability had first been tested on the simulated dataset. The results showed that the Silhouette index is well able to validate clustering results on incomplete data, and that the optimal clustering of the 2018 EPI dataset was obtained by k-means using BI, with a silhouette index of 0.613 and a time complexity of 0.063. Based on these results, k-means using BI is suggested for clustering analysis of the 2018 EPI dataset.
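The Silhouette index used above for validation can be sketched as follows. For each object, it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b) and averages (b - a) / max(a, b) over all objects. The function name, the pluggable `dist` argument, and the convention of scoring singleton clusters as 0 are assumptions of this sketch, not details taken from the paper.

```python
def silhouette_index(points, labels, dist):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # singleton cluster: score 0 by convention
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

points = [0.0, 1.0, 10.0, 11.0]
absolute = lambda p, q: abs(p - q)
good = silhouette_index(points, [0, 0, 1, 1], absolute)  # compact, separated clusters
bad = silhouette_index(points, [0, 1, 0, 1], absolute)   # clusters interleaved
```

Passing the distance as a parameter means the same routine can be reused with a distance that handles missing values, which is what validation on incomplete data would require.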

Robust K-Median and K-Means Clustering Algorithms for Incomplete Data

Mathematical Problems in Engineering ◽

10.1155/2016/4321928 ◽

2016 ◽

Vol 2016 ◽

pp. 1-8 ◽

Cited By ~ 6

Author(s):

Jinhua Li ◽

Shiji Song ◽

Yuli Zhang ◽

Zhen Zhou

Keyword(s):

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Interval Data ◽

Accurate Estimation ◽

Data Sets ◽

Clustering Methods ◽

Estimation Errors ◽

Feature Values ◽

Time And Space Complexity

Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
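To make the interval idea concrete, the sketch below assigns an object to a cluster by its worst-case squared distance: each feature is an interval [lower, upper] (a degenerate interval when the value is observed), and the distance to a centroid is taken at the interval endpoint that is farthest from it. This illustrates only the minimax principle; it is not the authors' algorithm, and the function names are hypothetical.

```python
def worst_case_sq_distance(lower, upper, centroid):
    """Largest squared Euclidean distance from the centroid to any point
    inside the box [lower, upper]; per feature the maximum is reached at
    one of the two interval endpoints."""
    return sum(
        max((c - lo) ** 2, (c - hi) ** 2)
        for lo, hi, c in zip(lower, upper, centroid)
    )

def robust_assign(lower, upper, centroids):
    """Index of the centroid with the smallest worst-case distance."""
    return min(
        range(len(centroids)),
        key=lambda k: worst_case_sq_distance(lower, upper, centroids[k]),
    )

# First feature observed (1.0); second feature missing, bounded by [0.0, 4.0].
lower, upper = (1.0, 0.0), (1.0, 4.0)
centroids = [(1.0, 0.0), (1.0, 2.0)]
chosen = robust_assign(lower, upper, centroids)
```

In this example the centroid at the middle of the interval wins, because its worst case (4.0) is smaller than that of the centroid sitting at one end of the interval (16.0).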

Comparison of Selected Swarm Intelligence Algorithms in Student Courses Recommendation Application

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194014500041 ◽

2014 ◽

Vol 24 (01) ◽

pp. 91-109 ◽

Cited By ~ 4

Author(s):

Janusz Sobecki

Keyword(s):

Particle Swarm Optimization ◽

Swarm Intelligence ◽

Optimization Problems ◽

Bat Algorithm ◽

Problem Space ◽

Swarm Optimization ◽

Bee Colony ◽

Grade Prediction ◽

Bee Colony Optimization ◽

Swarm Intelligence Algorithm

In this paper a comparison of a few swarm intelligence algorithms applied in recommendation of student courses is presented. Swarm intelligence algorithms are nowadays successfully used in many areas, especially in optimization problems. To apply each swarm intelligence algorithm in recommender systems a special representation of the problem space is necessary. Here we present the comparison of efficiency of grade prediction of several evolutionary algorithms, such as: Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Intelligent Weed Optimization (IWO), Bee Colony Optimization (BCO) and Bat Algorithm (BA).

Lookahead selective sampling for incomplete data

International Journal of Applied Mathematics and Computer Science ◽

10.1515/amcs-2016-0062 ◽

2016 ◽

Vol 26 (4) ◽

pp. 871-884 ◽

Cited By ~ 1

Author(s):

Loai Abdallah ◽

Ilan Shimshoni

Keyword(s):

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Mean Shift ◽

Ensemble Clustering ◽

Selective Sampling ◽

Mean Shift Clustering ◽

Sampling Algorithms ◽

Instance Space ◽

Incomplete Datasets

Missing values in data are common in real world applications. There are several methods that deal with this problem. In this paper we present lookahead selective sampling (LSS) algorithms for datasets with missing values. We developed two versions of selective sampling. The first one integrates a distance function that can measure the similarity between pairs of incomplete points within the framework of the LSS algorithm. The second algorithm uses ensemble clustering in order to represent the data in a cluster matrix without missing values and then run the LSS algorithm based on the ensemble clustering instance space (LSS-EC). To construct the cluster matrix, we use the k-means and mean shift clustering algorithms especially modified to deal with incomplete datasets. We tested our algorithms on six standard numerical datasets from different fields. On these datasets we simulated missing values and compared the performance of the LSS and LSS-EC algorithms for incomplete data to two other basic methods. Our experiments show that the suggested selective sampling algorithms outperform the other methods.
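One way to build a distance between incomplete points is to treat each missing coordinate as a random variable and use the expected squared difference. If a feature has mean mu and variance var, the expected value of (x - Y)^2 for an observed x and a missing Y is (x - mu)^2 + var, and for two independent missing values it is 2 * var. The sketch below applies this; whether it matches the distance function used in the paper is an assumption, and the per-feature means and variances are taken as given (for example, estimated from the observed values).

```python
from math import sqrt

def expected_distance(x, y, means, variances):
    """Root of the expected squared Euclidean distance between two points
    whose missing coordinates are encoded as None."""
    total = 0.0
    for xi, yi, mu, var in zip(x, y, means, variances):
        if xi is not None and yi is not None:
            total += (xi - yi) ** 2            # both observed: ordinary term
        elif xi is None and yi is None:
            total += 2.0 * var                 # both missing: E[(X - Y)^2] = 2 * var
        else:
            known = yi if xi is None else xi
            total += (known - mu) ** 2 + var   # one missing: (known - mu)^2 + var
    return sqrt(total)

d = expected_distance((1.0, None), (4.0, 2.0), means=(0.0, 3.0), variances=(1.0, 4.0))
```

With complete points the function reduces to the ordinary Euclidean distance, so it can replace that distance inside an existing algorithm without changing its behaviour on complete data.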

PANTSA Influence in grouping Mixed and Incomplete Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b6534.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 579-583

Keyword(s):

Experimental Evidence ◽

Data Clustering ◽

Incomplete Data ◽

Clustering Algorithms ◽

Numerical Data ◽

Clustering Methods ◽

High Quality ◽

Before And After

Obtaining high-quality groups and processing mixed and incomplete data (DMI) are still open problems in data clustering. A recently proposed method, PAntSA, improves the results obtained by clustering algorithms, but it was designed and tested only for numerical data. For this reason, this paper analyzes the influence of applying PAntSA on the performance of DMI restricted clustering algorithms. To do so, the results of different algorithms are compared before and after applying PAntSA. The comparisons provide experimental evidence that the PAntSA algorithm improves the quality of the groups obtained by traditional DMI clustering methods.

A Robust Fuzzy Approach For Gene Expression Data Clustering

10.21203/rs.3.rs-547452/v1 ◽

2021 ◽

Author(s):

Meskat Jahan ◽

Mahmudul Hasan

Keyword(s):

Gene Expression ◽

Data Clustering ◽

Missing Values ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Expression Data ◽

Mining Method ◽

Cluster Number ◽

Gene Expression Data Clustering ◽

Parameter Dependent

In the big data era, clustering is one of the most popular data mining methods. The majority of clustering algorithms have complications such as automatic cluster number determination, poor clustering precision, inconsistent clustering of various datasets, and parameter dependence. A new fuzzy autonomous solution for clustering, named the Meskat-Mahmudul (MM) clustering algorithm, is proposed to overcome the complexity of parameter-free automatic cluster number determination and to improve clustering accuracy. The MM clustering algorithm finds the exact number of clusters based on the Average Silhouette method in multivariate mixed-attribute datasets, including real-time gene expression datasets, and deals with missing values, noise, and outliers. The MM Extended K-Means (MMK) clustering algorithm is an enhancement of the K-Means algorithm that serves the purpose of automatic cluster discovery and runtime cluster placement. Several validation methods are used to evaluate the clusters and to certify optimum cluster partitioning. Several datasets are used to compare the performance of the proposed algorithms with that of other algorithms in terms of time complexity and clustering efficiency. Finally, the MM and MMK clustering algorithms were found to be superior to conventional algorithms.

Data clustering algorithms based on Swarm Intelligence

2011 3rd International Conference on Electronics Computer Technology ◽

10.1109/icectech.2011.5941931 ◽

2011 ◽

Cited By ~ 7

Author(s):

Pankaj K. Bharne ◽

V. S. Gulhane ◽

Shweta K. Yewale

Keyword(s):

Swarm Intelligence ◽

Data Clustering ◽

Clustering Algorithms

Improved Bat Algorithm Applied to Multilevel Image Thresholding

The Scientific World JOURNAL ◽

10.1155/2014/176718 ◽

2014 ◽

Vol 2014 ◽

pp. 1-16 ◽

Cited By ~ 69

Author(s):

Adis Alihodzic ◽

Milan Tuba

Keyword(s):

Swarm Intelligence ◽

State Of The Art ◽

Bat Algorithm ◽

Optimization Methods ◽

Processing Technique ◽

Image Processing Technique ◽

Computational Time ◽

Image Thresholding ◽

Bee Colony ◽

Intractable Problems

Multilevel image thresholding is a very important image processing technique that is used as a basis for image segmentation and further higher level processing. However, the computational time required for exhaustive search grows exponentially with the number of desired thresholds. Swarm intelligence metaheuristics are well known as successful and efficient optimization methods for intractable problems. In this paper, we adjusted one of the latest swarm intelligence algorithms, the bat algorithm, for the multilevel image thresholding problem. The results of testing on standard benchmark images show that the bat algorithm is comparable with other state-of-the-art algorithms. We also improved the standard bat algorithm with modifications that add some elements from differential evolution and from the artificial bee colony algorithm. Our proposed improved bat algorithm proved to be better than five other state-of-the-art algorithms, improving the quality of results in all cases and significantly improving convergence speed.
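The cost of exhaustive search is easy to see in code: with L gray levels and k thresholds, every one of the C(L - 1, k) threshold combinations has to be scored. The sketch below does exactly that on a toy histogram. It assumes Otsu's between-class variance as the objective, which the abstract does not name, and both function names are illustrative.

```python
from itertools import combinations

def between_class_variance(hist, thresholds):
    """Otsu-style objective: weighted squared deviation of each class mean
    from the global mean. Classes are [0, t1), [t1, t2), ..., [tk, L)."""
    total = sum(hist)
    mean_all = sum(i * h for i, h in enumerate(hist)) / total
    bounds = [0, *thresholds, len(hist)]
    variance = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        weight = sum(hist[lo:hi])
        if weight == 0:
            continue  # an empty class contributes nothing
        mean = sum(i * hist[i] for i in range(lo, hi)) / weight
        variance += (weight / total) * (mean - mean_all) ** 2
    return variance

def exhaustive_thresholds(hist, k):
    """Score every increasing k-tuple of thresholds and keep the best one."""
    candidates = combinations(range(1, len(hist)), k)
    return max(candidates, key=lambda t: between_class_variance(hist, t))

# Toy histogram with three peaks, at gray levels 1, 5 and 9.
hist = [0, 10, 0, 0, 0, 10, 0, 0, 0, 10, 0]
t1, t2 = exhaustive_thresholds(hist, 2)
```

For an 8-bit image (256 levels) the number of candidates is 255 for one threshold but already more than 172 million for four, which is why a metaheuristic that samples this space becomes attractive.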

A Survey of Swarm Intelligence Based Load Balancing Techniques in Cloud Computing Environment

Electronics ◽

10.3390/electronics10212718 ◽

2021 ◽

Vol 10 (21) ◽

pp. 2718

Author(s):

M. A. Elmagzoub ◽

Darakhshan Syed ◽

Asadullah Shaikh ◽

Noman Islam ◽

Abdullah Alghamdi ◽

...

Keyword(s):

Cloud Computing ◽

Load Balancing ◽

Swarm Intelligence ◽

Bat Algorithm ◽

Quality Parameters ◽

Smart Devices ◽

The Internet ◽

Time Data ◽

Shared Resources ◽

Bee Colony

Cloud computing offers flexible, interactive, and observable access to shared resources on the Internet. It frees users from the requirements of managing computing on their own hardware. It enables users not only to store their data and run their computations over the Internet but also to access them whenever and wherever required. The frequent use of smart devices has contributed to the rapid growth of cloud computing. As more users adopt the cloud environment, the focus has been placed on load balancing. Load balancing allocates tasks or resources to different devices. In cloud computing, load balancing plays a major role in the efficient usage of resources for the highest performance. This requirement has led to the development of algorithms that can optimally assign resources while managing load and improving quality of service (QoS). This paper provides a survey of load balancing algorithms inspired by swarm intelligence (SI). The algorithms considered in the discussion are Genetic Algorithm, BAT Algorithm, Ant Colony, Grey Wolf, Artificial Bee Colony, Particle Swarm, Whale, Social Spider, Dragonfly, and Raven Roosting Optimization. An analysis of the main objectives, areas of application, and targeted issues of each algorithm (with advancements) is presented. In addition, a performance analysis has been carried out based on average response time, data center processing time, and other quality parameters.
