Mining Negative Comment Data of Microblog Based on Merge-AP

2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Zhijun Chen ◽  
Weijian Jin ◽  
Shibiao Mu

A new mining method based on the merge-AP algorithm is proposed to improve the mining accuracy of negative comment data on microblogs. In this method, we first employ the AP algorithm to analyze negative comment data on microblogs, computing the similarity values between data points by Euclidean distance to build the similarity matrix. Then, we introduce a distance-based merge process to address the poor clustering performance of the AP algorithm on datasets with complex cluster structure. Finally, we compare and analyze the performance of the K-means, AP, and merge-AP algorithms on actual microblog data collected for algorithm evaluation. The results show that the merge-AP algorithm has good adaptability.
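
A minimal sketch of the two-stage idea described above, assuming scikit-learn's AffinityPropagation for the first stage; the distance-based merge rule and the threshold value are illustrative stand-ins for the paper's merge process:

```python
# Sketch of merge-AP: standard AP, then merge clusters whose exemplars are
# close in Euclidean distance. merge_threshold is an assumed parameter.
import numpy as np
from sklearn.cluster import AffinityPropagation

def merge_ap(X, merge_threshold=1.0):
    # Stage 1: AP on negative squared Euclidean similarities (sklearn default).
    ap = AffinityPropagation(affinity="euclidean", random_state=0).fit(X)
    labels = ap.labels_.copy()
    exemplars = ap.cluster_centers_

    # Stage 2: fold together clusters whose exemplars lie within the threshold.
    # (A union-find structure would handle transitive chains more cleanly.)
    for i in range(len(exemplars)):
        for j in range(i + 1, len(exemplars)):
            if np.linalg.norm(exemplars[i] - exemplars[j]) < merge_threshold:
                labels[labels == j] = i
    return labels
```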

2011 ◽  
Vol 268-270 ◽  
pp. 811-816
Author(s):  
Yong Zhou ◽  
Yan Xing

Affinity Propagation (AP) is a clustering algorithm that operates on the similarity matrix between pairs of data points, exchanging messages between points until a clustering result emerges. It is efficient and fast, and it can handle clustering on large data sets. However, traditional Affinity Propagation has many limitations. This paper introduces Affinity Propagation, analyzes its advantages and limitations in depth, and focuses on improvements to the algorithm: improving the similarity matrix, adjusting the preference and the damping factor, and combining it with other algorithms. Finally, it discusses the future development of Affinity Propagation.
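
For concreteness, the tuning knobs named above (similarity matrix, preference, damping factor) map directly onto scikit-learn's implementation; the sketch below uses illustrative placeholder values:

```python
# AP with an explicit precomputed similarity matrix, preference, and damping.
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import pairwise_distances

X = np.random.rand(200, 2)

# Similarity matrix: negative squared Euclidean distance.
S = -pairwise_distances(X, metric="sqeuclidean")

ap = AffinityPropagation(
    affinity="precomputed",
    preference=np.median(S),  # lower values tend to yield fewer clusters
    damping=0.7,              # in [0.5, 1); higher values stabilize messages
    random_state=0,
).fit(S)
print("clusters:", len(ap.cluster_centers_indices_))
```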


2013 ◽  
Vol 325-326 ◽  
pp. 1637-1640
Author(s):  
Dong Mei Li ◽  
Jing Lei Zhang

Image matching is the basis of image registration. Because infrared and visible images differ significantly, an improved SURF (speeded-up robust features) algorithm was proposed for matching them. First, edges were extracted from the images to improve the similarity between the infrared and visible images. Then the SURF algorithm was used to detect interest points, with a 64-dimensional point descriptor. Finally, matching points were found by Euclidean distance. Experimental results show that some invalid data points were eliminated.
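
A sketch of this edge-then-SURF pipeline with OpenCV; note that SURF is only available in opencv-contrib builds compiled with nonfree support, and the Canny and Hessian thresholds here are illustrative:

```python
# Edge extraction, 64-D SURF description, Euclidean (L2) descriptor matching.
# Inputs are assumed to be 8-bit grayscale images.
import cv2

def match_ir_visible(ir_img, vis_img):
    # Step 1: edge maps raise the structural similarity of the two modalities.
    ir_edges = cv2.Canny(ir_img, 50, 150)
    vis_edges = cv2.Canny(vis_img, 50, 150)

    # Step 2: SURF interest points; extended=False keeps descriptors 64-D.
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400, extended=False)
    kp1, des1 = surf.detectAndCompute(ir_edges, None)
    kp2, des2 = surf.detectAndCompute(vis_edges, None)

    # Step 3: brute-force matching under the Euclidean norm; cross-checking
    # discards one-sided (likely invalid) matches.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    return matcher.match(des1, des2)
```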


Author(s):  
Rina Refianti ◽  
Achmad Benny Mutiara ◽  
Asep Juarna ◽  
Adang Suhendra

In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP). AP is a data clustering technique that uses iterative message passing and considers all data points as potential exemplars. Two important inputs of AP are a similarity matrix (SM) of the data and the parameter "preference" p. Although the original AP algorithm has shown much success in data clustering, it still suffers from one limitation: it is not easy to determine the value of the parameter "preference" p that results in an optimal clustering solution. To resolve this limitation, we propose a new model of the parameter "preference" p, based on the similarity distribution. Given the SM and p, the Modified Adaptive AP (MAAP) procedure is run. The MAAP procedure omits the adaptive p-scanning algorithm of the original Adaptive AP (AAP) procedure. Experimental results on random non-partition and partition data sets show that (i) the proposed algorithm, MAAP-DDP, is slower than the original AP for the random non-partition dataset, and (ii) for the random 4-partition dataset and the real datasets, the proposed algorithm succeeds in identifying clusters consistent with the datasets' true labels, with execution times comparable to those of the original AP. Moreover, the MAAP-DDP algorithm proves more feasible and effective than the original AAP procedure.
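
The abstract does not spell out the distribution-based model for p, so the sketch below uses one plausible stand-in, a quantile of the off-diagonal similarities; the quantile value q is purely illustrative:

```python
# Derive the AP preference p from the distribution of pairwise similarities.
import numpy as np
from sklearn.metrics import pairwise_distances

def preference_from_similarity(X, q=0.5):
    S = -pairwise_distances(X, metric="sqeuclidean")
    off_diag = S[~np.eye(len(S), dtype=bool)]
    return np.quantile(off_diag, q)  # q=0.5 reproduces the common median rule
```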


2020 ◽  
Vol 2020 ◽  
pp. 1-6
Author(s):  
Wei Jiang ◽  
Xi Fang ◽  
Jianmei Ding

With respect to the clustering problem of mass customer evaluation information in service management, a new Gaussian-kernel FCM (fuzzy C-means) clustering algorithm is proposed based on the idea of FCM. First, the paper defines a Euclidean distance formula between two data points and clusters them adaptively based on the distance classification approach and nearest neighbors while deleting redundant data. Second, the defects of the FCM algorithm are analyzed, and a solution algorithm is designed around the dual goals of short distances within classes and long distances between different classes. Finally, an example is given to illustrate the results compared with the existing FCM algorithm.
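
The paper's exact formulation is not given here, but Gaussian-kernel FCM variants typically replace the Euclidean metric with a kernel-induced distance; a minimal sketch of that substitution and the resulting membership update, with illustrative sigma and fuzzifier m:

```python
# Kernel FCM membership update using the Gaussian-kernel-induced distance.
import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    return np.exp(-np.sum((x - v) ** 2) / (2 * sigma ** 2))

def kernel_memberships(X, centers, m=2.0, sigma=1.0):
    # Kernel-induced squared distance: d^2(x, v) = 2 * (1 - K(x, v)).
    d2 = np.array([[2 * (1 - gaussian_kernel(x, v, sigma)) for v in centers]
                   for x in X])
    d2 = np.fmax(d2, 1e-12)                      # avoid division by zero
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)  # standard FCM update rule
```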


Author(s):  
Joy Christy A ◽  
Umamakeswari A

Outlier detection is a part of data analytics that helps users find discrepancies in working machines by applying an outlier detection algorithm to the captured data at every fixed interval. An outlier is a data point that exhibits properties different from those of other points due to some external or internal force. These outliers can be detected by clustering the data points, and for this, optimal clustering of the data points is important. A problem that arises quite frequently in statistics is the identification of groups or clusters of data within a population or sample. The most widely used procedure to identify clusters in a set of observations is k-means with Euclidean distance, but Euclidean distance is not efficient for finding anomalies in multivariate space. This chapter uses the k-means algorithm with the Mahalanobis distance metric to capture the variance structure of the clusters, followed by an extreme value analysis (EVA) algorithm to detect the outliers: rare items, events, or observations that deviate suspiciously from the majority of the data.
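
A sketch of that pipeline: k-means clustering followed by per-cluster Mahalanobis distances, with extremes flagged as outliers. The chi-square cutoff is one common EVA-style rule, used here as a stand-in for the chapter's EVA algorithm:

```python
# Cluster, then flag points whose squared Mahalanobis distance to their
# cluster's center exceeds a chi-square quantile.
import numpy as np
from scipy.stats import chi2
from sklearn.cluster import KMeans

def mahalanobis_outliers(X, k=3, alpha=0.975):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    cutoff = chi2.ppf(alpha, df=X.shape[1])
    outliers = np.zeros(len(X), dtype=bool)
    for c in range(k):
        pts = X[labels == c]
        cov_inv = np.linalg.pinv(np.cov(pts, rowvar=False))
        diff = pts - pts.mean(axis=0)
        # Squared Mahalanobis distance of each point to its cluster center.
        d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
        outliers[labels == c] = d2 > cutoff
    return outliers
```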


2014 ◽  
Vol 1022 ◽  
pp. 337-340
Author(s):  
Hong Bo Zhou ◽  
Jun Tao Gao

The clustering result of the K-means algorithm is easily influenced by the initial clustering centers, so an improved algorithm for initial clustering center selection is presented. The algorithm first finds the maximum Euclidean distance within a cluster, then splits the cluster by using the two data objects with the maximum distance as new clustering centers, and repeats these steps until the specified number of clustering centers is obtained. Compared to the original algorithm, the improved algorithm resolves the instability of the clustering effect caused by random initialization, and its time complexity is also decreased.
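
A sketch of that splitting rule: repeatedly locate the most distant pair inside a cluster, promote both points to centers, and reassign the cluster's members to the nearer of the two. Which cluster to split next is not specified above, so splitting the widest one is an assumption:

```python
# Max-distance splitting for k-means initial center selection.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def initial_centers(X, k):
    clusters, centers = [np.arange(len(X))], [X.mean(axis=0)]
    while len(clusters) < k:
        widths = [pdist(X[idx]).max() if len(idx) > 1 else 0.0
                  for idx in clusters]
        c = int(np.argmax(widths))          # split the widest cluster
        idx = clusters.pop(c); centers.pop(c)
        D = squareform(pdist(X[idx]))
        i, j = np.unravel_index(np.argmax(D), D.shape)
        a, b = X[idx[i]], X[idx[j]]         # the maximum-distance pair
        near_a = (np.linalg.norm(X[idx] - a, axis=1)
                  <= np.linalg.norm(X[idx] - b, axis=1))
        clusters += [idx[near_a], idx[~near_a]]
        centers += [a, b]
    return np.array(centers)
```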


Author(s):  
Eric U.O. ◽  
Michael O.O. ◽  
Oberhiri-Orumah G. ◽  
Chike H. N.

Cluster analysis is an unsupervised learning method that classifies data points, usually multidimensional, into groups (called clusters) such that members of one cluster are more similar (in some sense) to each other than to those in other clusters. In this paper, we propose a new k-means clustering method that uses the Minkowski distance as its metric in a normed vector space; this metric is the generalization of both the Euclidean distance and the Manhattan distance. The k-means clustering methods discussed in this paper are Forgy's method, Lloyd's method, MacQueen's method, Hartigan and Wong's method, Likas' method, and Faber's method, which use the usual Euclidean distance. It was observed that the new k-means clustering method performed favourably in comparison with the existing methods in terms of minimizing the total intra-cluster variance on simulated data and real-life data sets.
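
A minimal sketch of a Lloyd-style k-means iteration under the Minkowski metric, where the order p generalizes Manhattan (p=1) and Euclidean (p=2); the centroid update uses the mean for simplicity, which is exact only for p=2:

```python
# k-means assignment under the Minkowski distance of order p.
import numpy as np

def minkowski_kmeans(X, k, p=3, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assignment: sum |x - c|^p per center; the p-th root is omitted
        # because it is monotone and does not change the argmin.
        d = (np.abs(X[:, None, :] - centers[None, :, :]) ** p).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                        else centers[c] for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers
```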


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1550
Author(s):  
Ailin Zhu ◽  
Zexi Hua ◽  
Yu Shi ◽  
Yongchuan Tang ◽  
Lingwei Miao

The main factors influencing the clustering effect of the k-means algorithm are the selection of the initial clustering centers and the distance measure between sample points. The traditional k-means algorithm uses the Euclidean distance to measure the distance between sample points, so it suffers from low differentiation of attributes between sample points and is prone to local optima. To address this, this paper proposes an improved k-means algorithm based on evidence distance. First, the attribute values of sample points are modelled as the basic probability assignment (BPA) of the sample points. Then, the traditional Euclidean distance is replaced by the evidence distance for measuring the distance between sample points, and finally k-means clustering is carried out on UCI data. Experimental comparisons are made with the traditional k-means algorithm, the k-means algorithm based on the aggregation distance parameter, and the Gaussian mixture model. The experimental results show that the improved k-means algorithm based on evidence distance achieves a better clustering effect and better convergence.
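
The abstract's "evidence distance" is not fully specified here; the sketch below implements the widely used Jousselme distance between two BPAs, which is one standard choice for this role:

```python
# Jousselme distance between two basic probability assignments, represented
# as dicts mapping frozenset focal elements to masses.
import numpy as np

def jousselme_distance(m1, m2):
    focals = sorted(set(m1) | set(m2), key=sorted)
    v1 = np.array([m1.get(A, 0.0) for A in focals])
    v2 = np.array([m2.get(A, 0.0) for A in focals])
    # Jaccard matrix: D[i, j] = |A_i & A_j| / |A_i | A_j|.
    D = np.array([[len(A & B) / len(A | B) for B in focals] for A in focals])
    diff = v1 - v2
    return np.sqrt(0.5 * diff @ D @ diff)

# Example: two BPAs over the frame {a, b}.
m1 = {frozenset("a"): 0.7, frozenset("ab"): 0.3}
m2 = {frozenset("b"): 0.6, frozenset("ab"): 0.4}
print(jousselme_distance(m1, m2))
```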


2013 ◽  
Vol 433-435 ◽  
pp. 725-730
Author(s):  
Sheng Zhang ◽  
Xiao Qi He ◽  
Yang Guang Liu ◽  
Qi Chun Huang

Constructing the similarity matrix is the key step in spectral clustering, and its goal is to model the local neighborhood relationships between the data points. To evaluate the influence of the similarity matrix on the performance of different spectral clustering algorithms, and to find rules for constructing an appropriate similarity matrix, a systematic empirical study was carried out. In the study, six recently proposed spectral clustering algorithms were selected as evaluation objects, and normalized mutual information, F-measure, and the Rand index were used as evaluation metrics. Experiments were then carried out on eight synthetic datasets and eleven real-world datasets. The experimental results show that using multiple metrics makes the evaluation more comprehensive and reliable, and that the comprehensive performance of the locality spectral clustering algorithm is better than that of the other five algorithms on both synthetic and real-world datasets.
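
One common recipe for a similarity matrix that models local neighborhoods, shown here as a sketch: a Gaussian kernel restricted to k nearest neighbors and then symmetrized. The kernel width sigma and neighborhood size k are the usual tuning knobs:

```python
# k-NN-sparsified Gaussian similarity matrix for spectral clustering.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.metrics import pairwise_distances

def knn_gaussian_similarity(X, k=10, sigma=1.0):
    D = pairwise_distances(X)
    W = np.exp(-D ** 2 / (2 * sigma ** 2))
    # Keep only similarities along k-nearest-neighbor edges.
    mask = kneighbors_graph(X, k, mode="connectivity").toarray().astype(bool)
    W = W * mask
    return np.maximum(W, W.T)  # symmetrize: keep an edge if either end has it
```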


2021 ◽  
Vol 7 ◽  
pp. e679
Author(s):  
Kazuhisa Fujita

Spectral clustering (SC) is one of the most popular clustering methods and often outperforms traditional clustering methods. SC uses the eigenvectors of a Laplacian matrix calculated from the similarity matrix of a dataset. However, SC has serious drawbacks: significant time complexity from the computation of eigenvectors and memory space complexity from storing the similarity matrix. To address these issues, I develop a new approximate spectral clustering method that uses the network generated by growing neural gas (GNG), called ASC with GNG in this study. ASC with GNG uses not only the reference vectors for vector quantization but also the topology of the network to extract the topological relationships between data points in a dataset. It calculates the similarity matrix from both the reference vectors and the topology of the network generated by GNG. By working on this network rather than the raw data, ASC with GNG reduces the computational and space complexities and improves clustering quality. In this study, I demonstrate that ASC with GNG effectively reduces the computational time. Moreover, this study shows that ASC with GNG provides clustering performance equal to or better than that of SC.
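
A sketch of the ASC-with-GNG idea: run spectral clustering on the small GNG network instead of the raw data. The GNG training itself is assumed already done; `nodes` (reference vectors) and `edges` (topology pairs) stand in for its output, and the Gaussian width sigma is illustrative:

```python
# Approximate spectral clustering over a GNG network's reference vectors.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import pairwise_distances_argmin

def asc_with_gng(X, nodes, edges, n_clusters, sigma=1.0):
    # Similarity only along network edges, weighted by reference-vector
    # distance; this keeps the matrix small and sparse.
    n = len(nodes)
    W = np.zeros((n, n))
    for i, j in edges:
        w = np.exp(-np.sum((nodes[i] - nodes[j]) ** 2) / (2 * sigma ** 2))
        W[i, j] = W[j, i] = w
    node_labels = SpectralClustering(n_clusters=n_clusters,
                                     affinity="precomputed",
                                     random_state=0).fit_predict(W)
    # Each data point inherits the label of its nearest reference vector.
    return node_labels[pairwise_distances_argmin(X, nodes)]
```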

