Statistical Clustering and Classification

Author(s):  
James Z. Wang
Author(s):  
Souad Azzouzi ◽  
Amal Hjouji ◽  
Jaouad EL-Mekkaoui ◽  
Ahmed EL Khalfi

The Fuzzy C-Means (FCM) algorithm has been widely used in clustering and classification but has difficulties with noisy data and outliers. Other algorithms related to possibilistic theory have given good results, such as possibilistic C-means (PCM), fuzzy possibilistic C-means (FPCM), and possibilistic fuzzy C-means (PFCM). The last of these works effectively in some environments but encounters further shortcomings with noisy databases. To solve this problem, we propose in this manuscript a new algorithm named Improved Possibilistic Fuzzy C-Means (ImPFCM), which combines the PFCM algorithm with a very powerful statistical method. The properties of the ImPFCM algorithm show that it is applicable not only to clusters of spherical shape but also to clusters of different sizes and densities. The results of a comparative study with very recent algorithms indicate the performance and superiority of the proposed approach: it easily groups datasets in a high-dimensional space and can use not only the Euclidean distance but also more sophisticated norms capable of dealing with much more complicated problems. We have also demonstrated that the ImPFCM algorithm detects cluster centers with high accuracy and performs satisfactorily in multiple environments with noisy data and outliers.
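For orientation, the baseline that this abstract builds on can be sketched in a few lines. The block below is a minimal illustration of the standard FCM iteration only (alternating membership and center updates with fuzzifier m); it is not the ImPFCM or PFCM method, whose details the abstract does not give. The function names, the deterministic choice of initial centers, and the toy points are assumptions made for this example.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fcm(points, c, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy c-means; returns (centers, memberships).

    Assumption for this sketch: initial centers are evenly spaced
    input points, so the result is deterministic.
    """
    n, dim = len(points), len(points[0])
    centers = [points[(k * n) // c] for k in range(c)]
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        for i, p in enumerate(points):
            # Clamp distances so a point sitting on a center does not divide by zero.
            dist = [max(euclid(p, ctr), 1e-12) for ctr in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum(
                    (dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(c)
                )
        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))
        shift = max(euclid(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Toy data (hypothetical): two well-separated groups of three points.
POINTS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
CENTERS, MEMBERSHIPS = fcm(POINTS, c=2)
```

Each membership row sums to 1, which is the constraint that makes FCM sensitive to outliers: a far-away point must still distribute a full unit of membership across the clusters. Possibilistic variants such as PCM and PFCM relax that constraint.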


Author(s):  
Charles Bouveyron ◽  
Gilles Celeux ◽  
T. Brendan Murphy ◽  
Adrian E. Raftery

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Ali A. Amer ◽  
Hassan I. Abdalla

Abstract Similarity measures have long been used in information retrieval and machine learning for many purposes, including text retrieval, text clustering, text summarization, plagiarism detection, and several other text-processing applications. However, until recently no single measure had been reported to be both highly effective and efficient. Thus, the quest for an efficient and effective similarity measure remains an open challenge. This study therefore introduces a new, highly effective and time-efficient similarity measure for text clustering and classification. Furthermore, the study aims to provide a comprehensive examination of seven of the most widely used similarity measures, mainly concerning their effectiveness and efficiency. Using the K-nearest neighbor (KNN) algorithm for classification, the K-means algorithm for clustering, and the bag-of-words (BoW) model for feature selection, all similarity measures are examined in detail. The experimental evaluation was carried out on two of the most popular datasets, namely Reuters-21 and Web-KB. The results confirm that the proposed set theory-based similarity measure (STB-SM) significantly outperforms all state-of-the-art measures with regard to both effectiveness and efficiency.
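The evaluation setup described here (a similarity measure plugged into KNN over bag-of-words vectors) can be illustrated with a small sketch. The abstract does not define STB-SM, so the block below uses cosine similarity as a stand-in; the tokenizer, the function names, and the six toy documents are assumptions made for this example, not the study's data or code.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words term counts using a naive whitespace tokenizer."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, query, k=3, sim=cosine):
    """Label a query by majority vote among its k most similar documents."""
    q = bow(query)
    ranked = sorted(train, key=lambda item: sim(bow(item[0]), q), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus of (text, label) pairs.
TRAIN = [
    ("stocks fell as markets closed", "finance"),
    ("bank raises interest rates", "finance"),
    ("markets rally on bank earnings", "finance"),
    ("team wins the final match", "sport"),
    ("coach praises team after match", "sport"),
    ("player scores in final", "sport"),
]

PREDICTION = knn_predict(TRAIN, "bank earnings lift markets")
```

Because `sim` is a parameter, a different measure can be swapped in without touching the classifier, which is the kind of like-for-like comparison the study describes.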


Author(s):  
Marianne van Hage ◽  
Peter Schmid-Grendelmeier ◽  
Chrysanthi Skevaki ◽  
Mario Plebani ◽  
Walter Canonica ◽  
...  

Abstract Background: After the re-introduction of ImmunoCAP Methods: The study was carried out at 22 European sites and one South African site. Microarrays from different batches, eight specific IgE (sIgE) positive serum samples, three sIgE negative serum samples, and a calibration sample were sent to participating laboratories, where assays were performed according to the manufacturer’s instructions. Results: For both the negative and positive samples, results were consistent between sites, with a very low frequency of false positive results (0.014%). A similar pattern of results for each of the samples was observed across the 23 sites. Homogeneity analysis showed that all measurements for each sample were well clustered, indicating good reproducibility; unsupervised hierarchical clustering and classification via random forests showed clustering of identical samples independent of the assay site. Analysis of raw continuous data confirmed the good accuracy across the study sites; averaged, standardized, site-specific ISU-E values fell close to the center of the distribution of measurements from all sites. After outlier filtering, variability across the whole study was estimated at 25.5%, with values of 22%, 27.1% and 22.4% for the ‘Low’, ‘Moderate to High’ and ‘Very High’ concentration categories, respectively. Conclusions: The study shows a robust performance of the ImmunoCAP


2016 ◽  
Vol 9 (9) ◽  
pp. 4425-4445 ◽  
Author(s):  
Nikola Besic ◽  
Jordi Figueras i Ventura ◽  
Jacopo Grazioli ◽  
Marco Gabella ◽  
Urs Germann ◽  
...  

Abstract. Polarimetric radar-based hydrometeor classification is the procedure of identifying different types of hydrometeors by exploiting polarimetric radar observations. The main drawback of the existing supervised classification methods, mostly based on fuzzy logic, is a significant dependency on a presumed electromagnetic behaviour of different hydrometeor types. Namely, the results of the classification largely rely upon the quality of scattering simulations. The unsupervised approach, in turn, lacks the constraints related to the hydrometeor microphysics. The idea of the proposed method is to compensate for these drawbacks by combining the two approaches in a way that microphysical hypotheses can, to a degree, adjust the content of the classes obtained statistically from the observations. This is done by means of an iterative approach, performed offline, which, in a statistical framework, examines clustered representative polarimetric observations by comparing them to the presumed polarimetric properties of each hydrometeor class. Aside from comparing, a routine alters the content of clusters by encouraging further statistical clustering in case of non-identification. By merging all identified clusters, the multi-dimensional polarimetric signatures of various hydrometeor types are obtained for each of the studied representative datasets, i.e. for each radar system of interest. These are depicted by sets of centroids which are then employed in operational labelling of different hydrometeors. The method has been applied to three C-band datasets, each acquired by a different operational radar from the MeteoSwiss Rad4Alp network, as well as to two X-band datasets acquired by two research mobile radars. The results are discussed through a comparative analysis which includes a corresponding supervised and unsupervised approach, emphasising the operational potential of the proposed method.
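The final labelling step, in which stored centroids are used to label new observations, can be pictured as a nearest-centroid assignment. The sketch below is only an illustration under stated assumptions: the abstract does not specify the distance measure or the feature variables, so plain Euclidean distance is used, and the class names and centroid values are hypothetical placeholders rather than real polarimetric signatures.

```python
import math

def label_observation(obs, centroids):
    """Return the name of the centroid closest to obs.

    Assumption for this sketch: Euclidean distance over features
    that have already been brought to a common scale.
    """
    return min(centroids, key=lambda name: math.dist(obs, centroids[name]))

# Hypothetical centroids over three standardized features.
CENTROIDS = {
    "class_a": (0.0, 0.0, 0.0),
    "class_b": (1.0, 1.0, 1.0),
}

LABEL = label_observation((0.9, 0.8, 1.1), CENTROIDS)
```

Keeping the centroids in a plain mapping mirrors the idea of one centroid set per radar system: the offline clustering produces the mapping, and the operational step only needs the lookup.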

