Mining Negative Comment Data of Microblog Based on Merge-AP

2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Zhijun Chen ◽  
Weijian Jin ◽  
Shibiao Mu

A new mining method based on the merge-AP algorithm is proposed to improve the mining accuracy of negative comment data on microblogs. In this method, we first employ the AP algorithm to analyze negative comment data on microblogs, computing the similarity values between data points by Euclidean distance to build the similarity matrix. Then, we introduce a distance-based merge process to address the poor clustering performance of the AP algorithm on datasets with complex cluster structure. Finally, we compare and analyze the performance of the K-means, AP, and merge-AP algorithms on actual microblog data collected for algorithm evaluation. The results show that the merge-AP algorithm has good adaptability.
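
A minimal sketch of the two-stage idea described above, assuming scikit-learn's AffinityPropagation for the first stage; the distance-based merge rule and the threshold value are illustrative stand-ins for the paper's merge process:

```python
# Sketch of merge-AP: standard AP, then merge clusters whose exemplars are
# close in Euclidean distance. merge_threshold is an assumed parameter.
import numpy as np
from sklearn.cluster import AffinityPropagation

def merge_ap(X, merge_threshold=1.0):
    # Stage 1: AP on negative squared Euclidean similarities (sklearn default).
    ap = AffinityPropagation(affinity="euclidean", random_state=0).fit(X)
    labels = ap.labels_.copy()
    exemplars = ap.cluster_centers_

    # Stage 2: fold together clusters whose exemplars lie within the threshold.
    # (A union-find structure would handle transitive chains more cleanly.)
    for i in range(len(exemplars)):
        for j in range(i + 1, len(exemplars)):
            if np.linalg.norm(exemplars[i] - exemplars[j]) < merge_threshold:
                labels[labels == j] = i
    return labels
```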

2011 ◽  
Vol 268-270 ◽  
pp. 811-816
Author(s):  
Yong Zhou ◽  
Yan Xing

Affinity Propagation (AP) is a clustering algorithm that operates on the similarity matrix between pairs of data points, exchanging messages between points until a clustering result emerges. It is efficient and fast, and it can handle clustering on large data sets. However, traditional Affinity Propagation has many limitations. This paper introduces Affinity Propagation, analyzes its advantages and limitations in depth, and focuses on improvements to the algorithm: improving the similarity matrix, adjusting the preference and the damping factor, and combining it with other algorithms. Finally, it discusses the future development of Affinity Propagation.
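
For concreteness, the tuning knobs named above (similarity matrix, preference, damping factor) map directly onto scikit-learn's implementation; the sketch below uses illustrative placeholder values:

```python
# AP with an explicit precomputed similarity matrix, preference, and damping.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import pairwise_distances

X = np.random.rand(200, 2)

# Similarity matrix: negative squared Euclidean distance.
S = -pairwise_distances(X, metric="sqeuclidean")

ap = AffinityPropagation(
    affinity="precomputed",
    preference=np.median(S),  # lower values tend to yield fewer clusters
    damping=0.7,              # in [0.5, 1); higher values stabilize messages
    random_state=0,
).fit(S)
print("clusters:", len(ap.cluster_centers_indices_))
```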


2013 ◽  
Vol 325-326 ◽  
pp. 1637-1640
Author(s):  
Dong Mei Li ◽  
Jing Lei Zhang

Image matching is the basis of image registration. Because infrared and visible images differ significantly, an improved SURF (speeded-up robust features) algorithm was proposed for matching them. First, edges were extracted from the images to improve the similarity between the infrared and visible images. Then the SURF algorithm was used to detect interest points, with a 64-dimensional point descriptor. Finally, matching points were found by Euclidean distance. Experimental results show that some invalid data points were eliminated.
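
A sketch of this edge-then-SURF pipeline with OpenCV; note that SURF is only available in opencv-contrib builds compiled with nonfree support, and the Canny and Hessian thresholds here are illustrative:

```python
# Edge extraction, 64-D SURF description, Euclidean (L2) descriptor matching.
# Inputs are assumed to be 8-bit grayscale images.
import cv2

def match_ir_visible(ir_img, vis_img):
    # Step 1: edge maps raise the structural similarity of the two modalities.
    ir_edges = cv2.Canny(ir_img, 50, 150)
    vis_edges = cv2.Canny(vis_img, 50, 150)

    # Step 2: SURF interest points; extended=False keeps descriptors 64-D.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400, extended=False)
    kp1, des1 = surf.detectAndCompute(ir_edges, None)
    kp2, des2 = surf.detectAndCompute(vis_edges, None)

    # Step 3: brute-force matching under the Euclidean norm; cross-checking
    # discards one-sided (likely invalid) matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    return matcher.match(des1, des2)
```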


Author(s):  
Rina Refianti ◽  
Achmad Benny Mutiara ◽  
Asep Juarna ◽  
Adang Suhendra

In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP). AP is a data clustering technique that uses iterative message passing and considers all data points as potential exemplars. Two important inputs of AP are a similarity matrix (SM) of the data and the parameter "preference" p. Although the original AP algorithm has shown much success in data clustering, it still suffers from one limitation: it is not easy to determine the value of the parameter "preference" p that results in an optimal clustering solution. To resolve this limitation, we propose a new model of the parameter "preference" p, based on the similarity distribution. Given the SM and p, the Modified Adaptive AP (MAAP) procedure is run. The MAAP procedure omits the adaptive p-scanning algorithm of the original Adaptive AP (AAP) procedure. Experimental results on random non-partition and partition data sets show that (i) the proposed algorithm, MAAP-DDP, is slower than the original AP for the random non-partition dataset, and (ii) for the random 4-partition dataset and the real datasets, the proposed algorithm succeeds in identifying clusters consistent with the datasets' true labels, with execution times comparable to those of the original AP. Moreover, the MAAP-DDP algorithm proves more feasible and effective than the original AAP procedure.
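
The abstract does not spell out the distribution-based model for p, so the sketch below uses one plausible stand-in, a quantile of the off-diagonal similarities; the quantile value q is purely illustrative:

```python
# Derive the AP preference p from the distribution of pairwise similarities.
import numpy as np
from sklearn.metrics import pairwise_distances

def preference_from_similarity(X, q=0.5):
    S = -pairwise_distances(X, metric="sqeuclidean")
    off_diag = S[~np.eye(len(S), dtype=bool)]
    return np.quantile(off_diag, q)  # q=0.5 reproduces the common median rule
```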


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Wei Jiang ◽  
Xi Fang ◽  
Jianmei Ding

With respect to the clustering problem of mass customer evaluation information in service management, a new Gaussian-kernel FCM (fuzzy C-means) clustering algorithm is proposed based on the idea of FCM. First, the paper defines a Euclidean distance formula between two data points and clusters them adaptively based on the distance classification approach and nearest neighbors while deleting redundant data. Second, the defects of the FCM algorithm are analyzed, and a solution algorithm is designed around the dual goals of short distances within classes and long distances between different classes. Finally, an example is given to illustrate the results compared with the existing FCM algorithm.
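
The paper's exact formulation is not given here, but Gaussian-kernel FCM variants typically replace the Euclidean metric with a kernel-induced distance; a minimal sketch of that substitution and the resulting membership update, with illustrative sigma and fuzzifier m:

```python
# Kernel FCM membership update using the Gaussian-kernel-induced distance.
import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    return np.exp(-np.sum((x - v) ** 2) / (2 * sigma ** 2))

def kernel_memberships(X, centers, m=2.0, sigma=1.0):
    # Kernel-induced squared distance: d^2(x, v) = 2 * (1 - K(x, v)).
    d2 = np.array([[2 * (1 - gaussian_kernel(x, v, sigma)) for v in centers]
                   for x in X])
    d2 = np.fmax(d2, 1e-12)                      # avoid division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # standard FCM update rule
```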


Author(s):  
Joy Christy A ◽  
Umamakeswari A

Outlier detection is a part of data analytics that helps users find discrepancies in working machines by applying an outlier detection algorithm to the captured data at every fixed interval. An outlier is a data point that exhibits properties different from those of other points due to some external or internal force. These outliers can be detected by clustering the data points, and for this, optimal clustering of the data points is important. A problem that arises quite frequently in statistics is the identification of groups or clusters of data within a population or sample. The most widely used procedure to identify clusters in a set of observations is k-means with Euclidean distance, but Euclidean distance is not efficient for finding anomalies in multivariate space. This chapter uses the k-means algorithm with the Mahalanobis distance metric to capture the variance structure of the clusters, followed by an extreme value analysis (EVA) algorithm to detect the outliers: rare items, events, or observations that deviate suspiciously from the majority of the data.
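
A sketch of that pipeline: k-means clustering followed by per-cluster Mahalanobis distances, with extremes flagged as outliers. The chi-square cutoff is one common EVA-style rule, used here as a stand-in for the chapter's EVA algorithm:

```python
# Cluster, then flag points whose squared Mahalanobis distance to their
# cluster's center exceeds a chi-square quantile.
import numpy as np
from scipy.stats import chi2
from sklearn.cluster import KMeans

def mahalanobis_outliers(X, k=3, alpha=0.975):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    cutoff = chi2.ppf(alpha, df=X.shape[1])
    outliers = np.zeros(len(X), dtype=bool)
    for c in range(k):
        pts = X[labels == c]
        cov_inv = np.linalg.pinv(np.cov(pts, rowvar=False))
        diff = pts - pts.mean(axis=0)
        # Squared Mahalanobis distance of each point to its cluster center.
        d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
        outliers[labels == c] = d2 > cutoff
    return outliers
```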


2014 ◽  
Vol 1022 ◽  
pp. 337-340
Author(s):  
Hong Bo Zhou ◽  
Jun Tao Gao

The clustering result of the K-means algorithm is easily influenced by the initial clustering centers, so an improved algorithm for initial clustering center selection is presented. The algorithm first finds the maximum Euclidean distance within a cluster, then splits the cluster by using the two data objects with the maximum distance as new clustering centers, and repeats these steps until the specified number of clustering centers is obtained. Compared to the original algorithm, the improved algorithm resolves the instability of the clustering effect caused by random initialization, and its time complexity is also decreased.
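
A sketch of that splitting rule: repeatedly locate the most distant pair inside a cluster, promote both points to centers, and reassign the cluster's members to the nearer of the two. Which cluster to split next is not specified above, so splitting the widest one is an assumption:

```python
# Max-distance splitting for k-means initial center selection.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def initial_centers(X, k):
    clusters, centers = [np.arange(len(X))], [X.mean(axis=0)]
    while len(clusters) < k:
        widths = [pdist(X[idx]).max() if len(idx) > 1 else 0.0
                  for idx in clusters]
        c = int(np.argmax(widths))          # split the widest cluster
        idx = clusters.pop(c); centers.pop(c)
        D = squareform(pdist(X[idx]))
        i, j = np.unravel_index(np.argmax(D), D.shape)
        a, b = X[idx[i]], X[idx[j]]         # the maximum-distance pair
        near_a = (np.linalg.norm(X[idx] - a, axis=1)
                  <= np.linalg.norm(X[idx] - b, axis=1))
        clusters += [idx[near_a], idx[~near_a]]
        centers += [a, b]
    return np.array(centers)
```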


Author(s):  
Eric U.O. ◽  
Michael O.O. ◽  
Oberhiri-Orumah G. ◽  
Chike H. N.

Cluster analysis is an unsupervised learning method that classifies data points, usually multidimensional, into groups (called clusters) such that members of one cluster are more similar (in some sense) to each other than to those in other clusters. In this paper, we propose a new k-means clustering method that uses the Minkowski distance as its metric in a normed vector space; this metric is the generalization of both the Euclidean distance and the Manhattan distance. The k-means clustering methods discussed in this paper are Forgy's method, Lloyd's method, MacQueen's method, Hartigan and Wong's method, Likas' method, and Faber's method, which use the usual Euclidean distance. It was observed that the new k-means clustering method performed favourably in comparison with the existing methods in terms of minimizing the total intra-cluster variance on simulated data and real-life data sets.
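
A minimal sketch of a Lloyd-style k-means iteration under the Minkowski metric, where the order p generalizes Manhattan (p=1) and Euclidean (p=2); the centroid update uses the mean for simplicity, which is exact only for p=2:

```python
# k-means assignment under the Minkowski distance of order p.
import numpy as np

def minkowski_kmeans(X, k, p=3, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assignment: sum |x - c|^p per center; the p-th root is omitted
        # because it is monotone and does not change the argmin.
        d = (np.abs(X[:, None, :] - centers[None, :, :]) ** p).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```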


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1550
Author(s):  
Ailin Zhu ◽  
Zexi Hua ◽  
Yu Shi ◽  
Yongchuan Tang ◽  
Lingwei Miao

The main factors influencing the clustering effect of the k-means algorithm are the selection of the initial clustering centers and the distance measure between sample points. The traditional k-means algorithm uses the Euclidean distance to measure the distance between sample points, so it suffers from low differentiation of attributes between sample points and is prone to local optima. To address this, this paper proposes an improved k-means algorithm based on evidence distance. First, the attribute values of sample points are modelled as the basic probability assignment (BPA) of the sample points. Then, the traditional Euclidean distance is replaced by the evidence distance for measuring the distance between sample points, and finally k-means clustering is carried out on UCI data. Experimental comparisons are made with the traditional k-means algorithm, the k-means algorithm based on the aggregation distance parameter, and the Gaussian mixture model. The experimental results show that the improved k-means algorithm based on evidence distance achieves a better clustering effect and better convergence.
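
The abstract's "evidence distance" is not fully specified here; the sketch below implements the widely used Jousselme distance between two BPAs, which is one standard choice for this role:

```python
# Jousselme distance between two basic probability assignments, represented
# as dicts mapping frozenset focal elements to masses.
import numpy as np

def jousselme_distance(m1, m2):
    focals = sorted(set(m1) | set(m2), key=sorted)
    v1 = np.array([m1.get(A, 0.0) for A in focals])
    v2 = np.array([m2.get(A, 0.0) for A in focals])
    # Jaccard matrix: D[i, j] = |A_i & A_j| / |A_i | A_j|.
    D = np.array([[len(A & B) / len(A | B) for B in focals] for A in focals])
    diff = v1 - v2
    return np.sqrt(0.5 * diff @ D @ diff)

# Example: two BPAs over the frame {a, b}.
m1 = {frozenset("a"): 0.7, frozenset("ab"): 0.3}
m2 = {frozenset("b"): 0.6, frozenset("ab"): 0.4}
print(jousselme_distance(m1, m2))
```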


2013 ◽  
Vol 433-435 ◽  
pp. 725-730
Author(s):  
Sheng Zhang ◽  
Xiao Qi He ◽  
Yang Guang Liu ◽  
Qi Chun Huang

Constructing the similarity matrix is the key step in spectral clustering, and its goal is to model the local neighborhood relationships between the data points. To evaluate the influence of the similarity matrix on the performance of different spectral clustering algorithms, and to find rules for constructing an appropriate similarity matrix, a systematic empirical study was carried out. In the study, six recently proposed spectral clustering algorithms were selected as evaluation objects, and normalized mutual information, F-measure, and the Rand index were used as evaluation metrics. Experiments were then carried out on eight synthetic datasets and eleven real-world datasets. The experimental results show that using multiple metrics makes the evaluation more comprehensive and reliable, and that the comprehensive performance of the locality spectral clustering algorithm is better than that of the other five algorithms on both synthetic and real-world datasets.
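
One common recipe for a similarity matrix that models local neighborhoods, shown here as a sketch: a Gaussian kernel restricted to k nearest neighbors and then symmetrized. The kernel width sigma and neighborhood size k are the usual tuning knobs:

```python
# k-NN-sparsified Gaussian similarity matrix for spectral clustering.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.metrics import pairwise_distances

def knn_gaussian_similarity(X, k=10, sigma=1.0):
    D = pairwise_distances(X)
    W = np.exp(-D ** 2 / (2 * sigma ** 2))
    # Keep only similarities along k-nearest-neighbor edges.
    mask = kneighbors_graph(X, k, mode="connectivity").toarray().astype(bool)
    W = W * mask
    return np.maximum(W, W.T)  # symmetrize: keep an edge if either end has it
```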


2021 ◽  
Vol 7 ◽  
pp. e679
Author(s):  
Kazuhisa Fujita

Spectral clustering (SC) is one of the most popular clustering methods and often outperforms traditional clustering methods. SC uses the eigenvectors of a Laplacian matrix calculated from the similarity matrix of a dataset. However, SC has serious drawbacks: significant time complexity from the computation of eigenvectors and memory space complexity from storing the similarity matrix. To address these issues, I develop a new approximate spectral clustering method that uses the network generated by growing neural gas (GNG), called ASC with GNG in this study. ASC with GNG uses not only the reference vectors for vector quantization but also the topology of the network to extract the topological relationships between data points in a dataset. It calculates the similarity matrix from both the reference vectors and the topology of the network generated by GNG. By working on this network rather than the raw data, ASC with GNG reduces the computational and space complexities and improves clustering quality. In this study, I demonstrate that ASC with GNG effectively reduces the computational time. Moreover, this study shows that ASC with GNG provides clustering performance equal to or better than that of SC.
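
A sketch of the ASC-with-GNG idea: run spectral clustering on the small GNG network instead of the raw data. The GNG training itself is assumed already done; `nodes` (reference vectors) and `edges` (topology pairs) stand in for its output, and the Gaussian width sigma is illustrative:

```python
# Approximate spectral clustering over a GNG network's reference vectors.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances_argmin

def asc_with_gng(X, nodes, edges, n_clusters, sigma=1.0):
    # Similarity only along network edges, weighted by reference-vector
    # distance; this keeps the matrix small and sparse.
    n = len(nodes)
    W = np.zeros((n, n))
    for i, j in edges:
        w = np.exp(-np.sum((nodes[i] - nodes[j]) ** 2) / (2 * sigma ** 2))
        W[i, j] = W[j, i] = w
    node_labels = SpectralClustering(n_clusters=n_clusters,
                                     affinity="precomputed",
                                     random_state=0).fit_predict(W)
    # Each data point inherits the label of its nearest reference vector.
    return node_labels[pairwise_distances_argmin(X, nodes)]
```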

