A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally determine adequate parameter values for these three modified algorithms in order to apply them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of the data.
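Any clustering method for mixed and incomplete data needs a way to compare two patterns whose features may be numeric, categorical, or missing. The abstract does not say which dissimilarity the framework uses, so the sketch below substitutes a HEOM-style (Heterogeneous Euclidean-Overlap Metric) function purely as an illustration. The function name `mixed_dissimilarity`, the use of `None` for missing values, and the rule that a missing value counts as maximally different are all assumptions.

```python
from typing import Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing value (assumption)


def mixed_dissimilarity(
    a: Sequence[Value],
    b: Sequence[Value],
    numeric_ranges: Sequence[Optional[float]],
) -> float:
    """HEOM-style dissimilarity between two mixed, possibly incomplete patterns.

    numeric_ranges[i] is the (max - min) range of feature i when it is
    numeric, or None when feature i is categorical.
    """
    total = 0.0
    for x, y, rng in zip(a, b, numeric_ranges):
        if x is None or y is None:
            d = 1.0  # assumption: a missing value is maximally different
        elif rng is None:
            d = 0.0 if x == y else 1.0  # categorical feature: simple mismatch
        else:
            d = abs(x - y) / rng if rng > 0 else 0.0  # numeric: range-normalized
        total += d * d
    return total ** 0.5


# Hypothetical patterns: (numeric, categorical, numeric with a missing value)
p = [1.0, "red", None]
q = [3.0, "red", 5.0]
distance = mixed_dissimilarity(p, q, numeric_ranges=[4.0, None, 10.0])
```

A swarm-based optimizer could call such a function inside its fitness evaluation, which is one reason a generic modification can be shared across several metaheuristics.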

Clustering mixed and incomplete data has been a frequent research goal in recent years because such data commonly appear in soft-science problems. However, there is a lack of studies evaluating the performance of clustering algorithms on this kind of data. In this paper we present an experimental study of the performance of seven clustering algorithms, each of which uses one of three techniques: partitional, hierarchical, or metaheuristic. All the methods were run on 15 databases from the UCI Machine Learning Repository that have mixed and incomplete data descriptions. In external cluster validation using the Entropy and V-Measure indices, the metaheuristic algorithms showed the best results. Thus, we recommend metaheuristic-based clustering algorithms for clustering data with mixed and incomplete descriptions.
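As a rough illustration of external validation, the following sketch computes V-Measure as the harmonic mean of homogeneity and completeness, both derived from conditional entropies of the class and cluster labels. It assumes the standard equal-weight definition; the abstract does not give the exact formulas used in the study, and its Entropy index may be defined differently from the helper functions here. All function names are hypothetical.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def _entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy of a label sequence (natural logarithm)."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target: Sequence[Hashable], given: Sequence[Hashable]) -> float:
    """H(target | given), estimated from label co-occurrence counts."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(classes: Sequence[Hashable], clusters: Sequence[Hashable]) -> float:
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    h_c, h_k = _entropy(classes), _entropy(clusters)
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Cluster names do not matter, only how the objects are grouped.
perfect = v_measure([0, 0, 1, 1], ["b", "b", "a", "a"])
lumped = v_measure([0, 0, 1, 1], [0, 0, 0, 0])  # everything in one cluster
```

Because the measure compares groupings rather than label names, it can be applied to the output of any of the three algorithm families without aligning cluster identifiers to classes first.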


2020 ◽  
Vol 13 (2) ◽  
pp. 65-75
Author(s):  
Ridho Ananda ◽  
Atika Ratna Dewi ◽  
Nurlaili Nurlaili

Missing values seriously hinder the clustering process. To overcome this, researchers have proposed several solutions, two of which are imputation and special clustering algorithms. This paper compares the clustering results obtained with both on incomplete data. The k-means algorithm was applied to the imputed data. The algorithms used were distribution free multiple imputation (DFMI), Gabriel eigen (GE), expectation maximization-singular value decomposition (EM-SVD), biplot imputation (BI), four modified fuzzy c-means (FCM) algorithms, k-means soft constraints (KSC), distance estimation strategy fuzzy c-means (DESFCM), and k-means soft constraints imputed-observed (KSC-IO). The data used were the 2018 environmental performance index (EPI) and simulated data. The optimal clustering of the 2018 EPI data was chosen based on the Silhouette index, whose capability had first been tested on the simulated dataset. The results showed that the Silhouette index is capable of validating clustering results on incomplete datasets, and that the optimal clustering of the 2018 EPI dataset was obtained by k-means using BI, with a silhouette index of 0.613 and a time complexity of 0.063. Based on these results, k-means using BI is suggested for clustering analysis of the 2018 EPI dataset.
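The Silhouette index used above to select a clustering can be sketched as follows. This is a minimal version that assumes complete (for example, already imputed) numeric data and Euclidean distance; how the study evaluates the index on incomplete data is not stated in the abstract. The function name and the sample points are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence


def silhouette_index(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes s(i) = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Two well-separated pairs of points (hypothetical data).
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_index(data, [0, 0, 1, 1])  # matches the visible structure
bad = silhouette_index(data, [0, 1, 0, 1])   # splits each natural group
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.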


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Jinhua Li ◽  
Shiji Song ◽  
Yuli Zhang ◽  
Zhen Zhou

Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
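To illustrate the idea of representing missing values by intervals, the sketch below computes the worst-case squared Euclidean distance between an interval-valued point and a fixed centroid. It covers only the inner maximization of a minimax objective and is not the paper's algorithm. It relies on the fact that a squared difference is convex, so its maximum over an interval is reached at one of the endpoints. The names `worst_case_sq_distance` and `robust_assign`, and the nearest-worst-case assignment rule, are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper); lower == upper for an observed value


def worst_case_sq_distance(point: Sequence[Interval], centroid: Sequence[float]) -> float:
    """Largest squared Euclidean distance to the centroid over the point's intervals."""
    total = 0.0
    for (lo, hi), c in zip(point, centroid):
        # (c - x)^2 is convex in x, so its maximum on [lo, hi] is at an endpoint.
        total += max((c - lo) ** 2, (c - hi) ** 2)
    return total


def robust_assign(
    points: Sequence[Sequence[Interval]],
    centroids: Sequence[Sequence[float]],
) -> List[int]:
    """Assign each point to the centroid with the smallest worst-case distance."""
    return [
        min(range(len(centroids)), key=lambda k: worst_case_sq_distance(p, centroids[k]))
        for p in points
    ]


# First feature observed as 1.0; second feature missing, bounded by [0, 4].
incomplete_point = [(1.0, 1.0), (0.0, 4.0)]
centers = [[0.0, 1.0], [1.0, 3.0]]
worst = worst_case_sq_distance(incomplete_point, centers[0])
assignment = robust_assign([incomplete_point], centers)
```

Under these assumptions, an assignment made this way does not depend on any single imputed estimate, only on the interval bounds.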


Author(s):  
Janusz Sobecki

This paper presents a comparison of several swarm intelligence algorithms applied to the recommendation of student courses. Swarm intelligence algorithms are nowadays used successfully in many areas, especially in optimization problems. Applying a swarm intelligence algorithm in a recommender system requires a special representation of the problem space. Here we compare the grade-prediction efficiency of several evolutionary algorithms: Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Intelligent Weed Optimization (IWO), Bee Colony Optimization (BCO) and Bat Algorithm (BA).


2016 ◽  
Vol 26 (4) ◽  
pp. 871-884 ◽  
Author(s):  
Loai Abdallah ◽  
Ilan Shimshoni

Missing values in data are common in real-world applications. There are several methods that deal with this problem. In this paper we present lookahead selective sampling (LSS) algorithms for datasets with missing values. We developed two versions of selective sampling. The first one integrates a distance function that can measure the similarity between pairs of incomplete points within the framework of the LSS algorithm. The second algorithm uses ensemble clustering to represent the data in a cluster matrix without missing values and then runs the LSS algorithm based on the ensemble clustering instance space (LSS-EC). To construct the cluster matrix, we use the k-means and mean shift clustering algorithms, specially modified to deal with incomplete datasets. We tested our algorithms on six standard numerical datasets from different fields. On these datasets we simulated missing values and compared the performance of the LSS and LSS-EC algorithms for incomplete data to two other basic methods. Our experiments show that the suggested selective sampling algorithms outperform the other methods.


Obtaining high-quality groups and processing mixed and incomplete data (DMI) are still open problems in data clustering. A recently proposed method, PAntSA, improves the results obtained by clustering algorithms, but it was designed and tested only for numerical data. For this reason, this paper analyzes the influence of applying PAntSA on the performance of restricted DMI clustering algorithms. To this end, the results of different algorithms are compared before and after applying PAntSA. The comparisons provide experimental evidence that the PAntSA algorithm improves the quality of the groups obtained by traditional DMI clustering methods.


2021 ◽  
Author(s):  
Meskat Jahan ◽  
Mahmudul Hasan

In the big data era, clustering is one of the most popular data mining methods. Most clustering algorithms have drawbacks such as difficulty in determining the number of clusters automatically, poor clustering precision, inconsistent clustering across datasets, and dependence on parameters. A new fuzzy autonomous clustering solution, the Meskat-Mahmudul (MM) clustering algorithm, is proposed to address parameter-free automatic cluster number determination and clustering accuracy. The MM clustering algorithm determines the exact number of clusters using the Average Silhouette method in multivariate mixed-attribute datasets, including real-time gene expression datasets, and deals with missing values, noise, and outliers. The MM Extended K-Means (MMK) clustering algorithm is an enhancement of the K-Means algorithm that provides automatic cluster discovery and runtime cluster placement. Several validation methods are used to evaluate the clusters and to certify optimal cluster partitioning. Several datasets are used to compare the proposed algorithms with other algorithms in terms of time complexity and clustering efficiency. Finally, the MM and MMK clustering algorithms were found to be superior to conventional algorithms.


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Adis Alihodzic ◽  
Milan Tuba

Multilevel image thresholding is a very important image processing technique that is used as a basis for image segmentation and further higher-level processing. However, the computational time required for exhaustive search grows exponentially with the number of desired thresholds. Swarm intelligence metaheuristics are well known as successful and efficient optimization methods for intractable problems. In this paper, we adjusted one of the latest swarm intelligence algorithms, the bat algorithm, for the multilevel image thresholding problem. The results of testing on standard benchmark images show that the bat algorithm is comparable with other state-of-the-art algorithms. We also improved the standard bat algorithm with modifications that add some elements from differential evolution and from the artificial bee colony algorithm. Our proposed improved bat algorithm proved to be better than five other state-of-the-art algorithms, improving the quality of results in all cases and significantly improving convergence speed.


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2718
Author(s):  
M. A. Elmagzoub ◽  
Darakhshan Syed ◽  
Asadullah Shaikh ◽  
Noman Islam ◽  
Abdullah Alghamdi ◽  
...  

Cloud computing offers flexible, interactive, and observable access to shared resources on the Internet. It frees users from having to manage computing on their own hardware. It enables users not only to store their data and run their computations over the Internet but also to access them whenever and wherever required. The frequent use of smart devices has contributed to the rapid growth of cloud computing. As more users adopt the cloud environment, the focus has shifted to load balancing. Load balancing allocates tasks or resources to different devices. In cloud computing, load balancing plays a major role in the efficient use of resources for the highest performance. This requirement has led to the development of algorithms that can optimally assign resources while managing load and improving quality of service (QoS). This paper provides a survey of load balancing algorithms inspired by swarm intelligence (SI). The algorithms considered in the discussion are Genetic Algorithm, BAT Algorithm, Ant Colony, Grey Wolf, Artificial Bee Colony, Particle Swarm, Whale, Social Spider, Dragonfly, and Raven Roosting Optimization. An analysis of the main objectives, areas of application, and targeted issues of each algorithm (with advancements) is presented. In addition, a performance analysis has been carried out based on average response time, data center processing time, and other quality parameters.

