Dynamic and Optimized Prototype Clustering for Relational Data based on Multiple Prototypes

2019 ◽  
Vol 8 (3) ◽  
pp. 5630-5634

In artificial-intelligence applications such as bio-medicine and bio-informatics, data clustering is an important and complex task that arises in many different settings. Prototype-based clustering offers a reasonable and simple way to describe and evaluate data, and can be treated as a non-vertical representation of relational data. Because of the barycentric space present in prototype clustering, maintaining and updating the cluster structure as data points vary is still a challenging task for bio-medical relational data. In this paper we therefore propose a Novel Optimized Evidential C-Medoids (NOEC) algorithm, which belongs to the family of prototype-based clustering approaches, for updating and computing proximities in medical relational data. We use an Ant Colony Optimization approach to provide similarity services over different features for relational updates of clustered medical data. We evaluate our approach on several bio-medical synthetic data sets. Experimental results show that the proposed approach delivers better and more efficient results than the compared methods in terms of accuracy and time when processing medical relational data sets.
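The NOEC algorithm itself is not spelled out in the abstract. As a point of reference, the following is a minimal sketch of plain relational c-medoids on a precomputed dissimilarity matrix, the prototype-based baseline such methods build on; it omits the evidential (belief-function) machinery and the Ant Colony Optimization step, and all names are illustrative.

```python
import numpy as np

def relational_c_medoids(D, c, n_iter=100, seed=0):
    """Minimal c-medoids on a precomputed dissimilarity matrix D (n x n).

    A baseline for prototype-based relational clustering; it does NOT
    implement the evidential or ACO components described in the paper.
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=c, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)   # assign each point to nearest medoid
        new_medoids = medoids.copy()
        for k in range(c):
            members = np.where(labels == k)[0]
            if members.size:
                # medoid = member minimizing total dissimilarity within the cluster
                new_medoids[k] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels
```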

Author(s):  
Rina Refianti ◽  
Achmad Benny Mutiara ◽  
Asep Juarna ◽  
Adang Suhendra

In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP). AP is a data clustering technique that uses iterative message passing and considers all data points as potential exemplars. Two important inputs of AP are a similarity matrix (SM) of the data and the parameter "preference" p. Although the original AP algorithm has shown much success in data clustering, it still suffers from one limitation: it is not easy to determine the value of the parameter "preference" p that results in an optimal clustering solution. To resolve this limitation, we propose a new model of the parameter "preference" p, i.e., it is modeled based on the similarity distribution. Given the SM and p, the Modified Adaptive AP (MAAP) procedure is run. The MAAP procedure omits the adaptive p-scanning algorithm of the original Adaptive-AP (AAP) procedure. Experimental results on random non-partition and partition data sets show that (i) the proposed algorithm, MAAP-DDP, is slower than the original AP for the random non-partition dataset, and (ii) for the random 4-partition dataset and the real datasets, the proposed algorithm succeeds in identifying clusters in agreement with the datasets' true labels, with execution times comparable to those of the original AP. Moreover, the MAAP-DDP algorithm proves more feasible and effective than the original AAP procedure.
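The abstract does not give the exact distribution-based model of p. The sketch below shows one plausible instance of the idea using scikit-learn's AffinityPropagation, with the preference taken as a quantile of the off-diagonal similarity distribution; the quantile q is an assumption for illustration, not the authors' formula.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def ap_with_distribution_preference(X, q=0.5):
    # Similarity matrix: negative squared Euclidean distances (the standard AP choice).
    S = -((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    off_diag = S[~np.eye(len(X), dtype=bool)]
    p = np.quantile(off_diag, q)  # preference derived from the similarity distribution
    ap = AffinityPropagation(affinity="precomputed", preference=p, random_state=0)
    return ap.fit_predict(S)
```

With q = 0.5 this reproduces the common median-preference heuristic (scikit-learn's own default); smaller quantiles yield fewer clusters.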


Author(s):  
T. Gayathri ◽  
D. Lalitha Bhaskari

"Big data," as the name suggests, refers to collections of large and complicated data sets that are usually hard to process with on-hand data management tools or other conventional processing applications. A scalable signature-based subspace clustering approach that avoids the identification of redundant clusters is presented in this article. Various distance measures are used in experiments that validate the performance of the proposed algorithm. For the same purpose of validation, the chosen synthetic data sets have different dimensions and varying sizes (as inspected in Weka). The F1 quality measure and the runtime are computed for these synthetic data sets. The performance of the proposed algorithm is compared with existing subspace clustering algorithms such as CLIQUE, INSCY, and SUBCLU.
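The F1 quality measure used in the subspace-clustering literature is usually the best-match F1 averaged over ground-truth clusters; the sketch below is written under that assumption, which the abstract does not confirm.

```python
import numpy as np

def clustering_f1(true_labels, pred_labels):
    """Best-match F1: for each ground-truth cluster, take the found cluster
    with the highest F1 against it, then average over ground-truth clusters."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    scores = []
    for t in np.unique(true_labels):
        truth = true_labels == t
        best = 0.0
        for p in np.unique(pred_labels):
            found = pred_labels == p
            tp = np.sum(truth & found)          # points in both clusters
            if tp == 0:
                continue
            prec, rec = tp / found.sum(), tp / truth.sum()
            best = max(best, 2 * prec * rec / (prec + rec))
        scores.append(best)
    return float(np.mean(scores))
```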


Author(s):  
DIANXUN SHUAI ◽  
XUE FANGLIANG

Data clustering has been widely used in many areas, such as data mining, statistics, and machine learning. A variety of clustering approaches have been proposed so far, but most of them are not suited to quickly clustering a large-scale, high-dimensional database. This paper is devoted to a novel data clustering approach based on a generalized particle model (GPM). The GPM transforms the data clustering process into a stochastic process over the configuration space of a GPM array. The proposed approach is characterized by self-organizing clustering and offers many advantages in terms of insensitivity to noise, robustness to the quality of the clustered data, suitability for high-dimensional and massive data sets, learning ability, openness, and easier hardware implementation with VLSI systolic technology. Analysis and simulations have shown the effectiveness and good performance of the proposed GPM approach to data clustering.
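The GPM dynamics are not detailed in the abstract. As a loose illustration of the particle-model idea only, the toy below lets each point drift toward the kernel-weighted mean of its neighbors (mean-shift-like dynamics) until particles condense into clusters; it is emphatically not the authors' GPM, and all parameters are assumptions.

```python
import numpy as np

def particle_drift_clustering(X, bandwidth=1.0, steps=50, tol=1e-4):
    """Toy particle-model clustering: each point is a particle drifting toward
    the Gaussian-kernel-weighted mean of all particles. Particles that
    converge to the same location are assigned the same cluster."""
    P = X.astype(float).copy()
    for _ in range(steps):
        d2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)
        W = np.exp(-d2 / (2 * bandwidth ** 2))            # pairwise attraction weights
        P_new = W @ P / W.sum(axis=1, keepdims=True)      # drift toward weighted mean
        if np.abs(P_new - P).max() < tol:
            break
        P = P_new
    # Particles that condensed to (numerically) the same point share a label.
    _, labels = np.unique(np.round(P, 3), axis=0, return_inverse=True)
    return labels
```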


Processes ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 439
Author(s):  
Xiaoling Zhang ◽  
Xiyu Liu

Clustering analysis, a key step in many data mining problems, can be applied in various fields. However, regardless of the clustering method, noise points have always been an important factor affecting the clustering result. In addition, in spectral clustering, the construction of the affinity matrix affects the formation of new samples, which in turn affects the final clustering results. Therefore, this study proposes a noise-cutting and natural-neighbors spectral clustering method based on a coupled P system (NCNNSC-CP) to solve the above problems. The whole algorithmic process is carried out in the coupled P system. We propose a parameter-free natural-neighbors searching method that can quickly determine the natural neighbors and the natural characteristic value of the data points. Based on these, the critical density and reverse density are obtained, and noise identification and cutting are performed. The affinity matrix constructed using core natural neighbors greatly improves the similarity between data points. Experimental results on nine synthetic data sets and six UCI datasets demonstrate that the proposed algorithm outperforms the comparison algorithms.
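In the literature, parameter-free natural-neighbor search is commonly realized by growing the neighborhood size k until the number of points that appear in no one's k-nearest-neighbor list stops changing; the sketch below follows that reading (the stopping rule and names are assumptions, not the paper's exact procedure).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbor_search(X, max_k=50):
    """Grow k until the count of 'orphan' points (in nobody's k-NN list)
    stabilizes. Returns the natural characteristic value (the stable k)
    and the per-point reverse-neighbor counts."""
    n = len(X)
    nbrs = NearestNeighbors(n_neighbors=min(max_k + 1, n)).fit(X)
    _, idx = nbrs.kneighbors(X)              # column 0 is the point itself
    rev_count = np.zeros(n, dtype=int)
    prev_orphans = -1
    for k in range(1, idx.shape[1]):
        for i in range(n):
            rev_count[idx[i, k]] += 1        # i names idx[i, k] as its k-th neighbor
        orphans = int(np.sum(rev_count == 0))
        if orphans == prev_orphans:          # orphan count stabilized -> natural k
            return k, rev_count
        prev_orphans = orphans
    return idx.shape[1] - 1, rev_count
```

Points whose reverse-neighbor count stays at or near zero at the natural k are the natural candidates for noise cutting.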


Author(s):  
Nan Li ◽  
Longin Jan Latecki

In this paper, we propose a novel affinity-learning-based framework for mixed data clustering, which addresses how to process data with mixed-type attributes, how to learn affinities between data points, and how to exploit the learned affinities for clustering. In the proposed framework, each original data attribute is represented by several abstract objects defined according to the specific data type and values. Each attribute value is transformed into initial affinities between the data point and the abstract objects of the attribute. We refine these affinities and infer the unknown affinities between data points by taking into account the interconnections among the attribute values of all data points. The inferred affinities between data points can be exploited directly for clustering. Alternatively, the refined affinities between data points and the abstract objects of the attributes can be transformed into new data features for clustering. Experimental results on many real-world data sets demonstrate that the proposed framework is effective for mixed data clustering.
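A drastically simplified sketch of the point-to-abstract-object idea: each category becomes one abstract object, each numeric attribute gets two anchor objects ("low"/"high"), and point-point affinities come from shared objects. The paper's central refinement and inference step is omitted, and the anchor construction is an assumption for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import SpectralClustering

def mixed_affinity_clustering(df, numeric_cols, categorical_cols, n_clusters=3):
    """Link points to abstract attribute objects, then cluster on the
    derived point-point affinities (shared-object similarity)."""
    blocks = []
    for c in categorical_cols:
        # One abstract object per category; affinity 1 to the point's own category.
        blocks.append(pd.get_dummies(df[c]).to_numpy(float))
    for c in numeric_cols:
        # Two anchor objects per numeric attribute, affinities from min-max scaling.
        v = df[c].to_numpy(float)
        v = (v - v.min()) / (v.max() - v.min() + 1e-12)
        blocks.append(np.column_stack([1 - v, v]))
    R = np.hstack(blocks)                    # point-to-object affinity matrix
    S = R @ R.T                              # point-to-point affinity via shared objects
    return SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                              random_state=0).fit_predict(S)
```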


Author(s):  
Aleksandr Bondarev ◽  
Vladimir Galaktionov

The paper considers the tasks of visual analysis of multidimensional data sets of medical origin. For visual analysis, the approach of building elastic maps is used. Elastic maps serve as a method for mapping the original data points onto embedded manifolds of lower dimensionality. By diminishing the elasticity parameters, one can design a map surface that approximates the multidimensional dataset in question much better. To improve the results, a number of previously developed procedures are used: preliminary data filtering and removal of separated clusters (flotation). To solve the scalability problem, where the elastic map must adjust both to the region in which data points condense and to separately located points of the data cloud, the quasi-Zoom approach is applied. Illustrations of applying elastic maps to various sets of medical data are presented.
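For reference, in the standard formulation of elastic maps (e.g., Gorban and Zinovyev), the map nodes y_i minimize an energy whose elasticity moduli are exactly the parameters the paper diminishes; this is a sketch of that common form, not necessarily the exact functional used by the authors:

```latex
U \;=\; \frac{1}{N}\sum_{j=1}^{N}\bigl\|x_j - y_{m(j)}\bigr\|^{2}
\;+\; \lambda \sum_{(i,k)\in E}\bigl\|y_i - y_k\bigr\|^{2}
\;+\; \mu \sum_{(i,k,l)\in R}\bigl\|y_i - 2\,y_k + y_l\bigr\|^{2}
```

Here m(j) indexes the map node nearest to data point x_j, E is the set of map edges (stretching term), and R is the set of ribs, i.e., triples of consecutive nodes (bending term). Decreasing the moduli λ and μ softens the map so its surface hugs the data cloud more closely, which is the effect the paper exploits.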


2019 ◽  
Author(s):  
Caroline M. Holmes ◽  
Ilya Nemenman

Estimation of mutual information between (multidimensional) real-valued variables is used in analysis of complex systems, biological systems, and recently also quantum systems. This estimation is a hard problem, and universally good estimators provably do not exist. Kraskov et al. (PRE, 2004) introduced a successful mutual information estimation approach based on the statistics of distances between neighboring data points, which empirically works for a wide class of underlying probability distributions. Here we improve this estimator by (i) expanding its range of applicability, and by providing (ii) a self-consistent way of verifying the absence of bias, (iii) a method for estimation of its variance, and (iv) a criterion for choosing the values of the free parameter of the estimator. We demonstrate the performance of our estimator on synthetic data sets, as well as on neurophysiological and systems biology data sets.
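For orientation, here is a compact sketch of the first Kraskov-Stögbauer-Grassberger (KSG) estimator that the paper improves on; the paper's contributions (bias self-check, variance estimate, and the criterion for choosing k) are not reproduced here.

```python
import numpy as np
from scipy.special import digamma
from sklearn.neighbors import KDTree

def ksg_mi(x, y, k=4):
    """First KSG estimator (Kraskov et al., PRE 2004):
    I(X;Y) ~= psi(k) + psi(N) - <psi(n_x + 1) + psi(n_y + 1)>,
    with marginal neighbor counts taken inside the k-NN Chebyshev ball
    found in the joint (X, Y) space."""
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    z = np.hstack([x, y])
    n = len(z)
    # Distance to the k-th neighbor in the joint space (max-norm; self excluded).
    d, _ = KDTree(z, metric="chebyshev").query(z, k=k + 1)
    eps = d[:, -1]
    # Count marginal neighbors strictly inside eps (subtract 1 for the point itself).
    nx = KDTree(x, metric="chebyshev").query_radius(x, eps - 1e-12, count_only=True) - 1
    ny = KDTree(y, metric="chebyshev").query_radius(y, eps - 1e-12, count_only=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```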


2007 ◽  
Vol 21 (02n03) ◽  
pp. 129-138 ◽  
Author(s):  
K. P. HARIKRISHNAN ◽  
G. AMBIKA ◽  
R. MISRA

We present an algorithmic scheme to compute the correlation dimension D2 of a time series without requiring visual inspection of the scaling region in the correlation sum. It is based on the standard Grassberger–Procaccia (GP) algorithm for computing D2. The scheme is tested using synthetic data sets from several standard chaotic systems, as well as by adding noise to low-dimensional chaotic data. We show that the scheme is efficient with a few thousand data points and is most suitable when a non-subjective comparison of the D2 values of two time series is required, such as in hypothesis testing.
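A bare-bones GP computation for context: the paper's contribution is precisely the non-subjective choice of the scaling region, which this sketch replaces with a crude fixed window (an assumption for illustration only).

```python
import numpy as np
from scipy.spatial.distance import pdist

def correlation_dimension(X):
    """Grassberger-Procaccia: C(r) = fraction of point pairs closer than r;
    D2 is the slope of log C(r) vs log r in the scaling region.
    X: delay-embedded vectors of the time series, shape (n_points, dim)."""
    d = pdist(X)
    r_values = np.logspace(np.log10(d.min() + 1e-12), np.log10(d.max()), 30)
    C = np.array([np.mean(d < r) for r in r_values])
    mask = (C > 0) & (C < 1)
    logr, logC = np.log(r_values[mask]), np.log(C[mask])
    mid = slice(len(logr) // 4, 3 * len(logr) // 4)  # crude scaling-region pick
    D2 = np.polyfit(logr[mid], logC[mid], 1)[0]      # slope of the log-log fit
    return D2
```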


2018 ◽  
Vol 609 ◽  
pp. A39 ◽  
Author(s):  
S. Czesla ◽  
T. Molle ◽  
J. H. M. M. Schmitt

Most physical data sets contain a stochastic contribution produced by measurement noise or other random sources along with the signal. Usually, neither the signal nor the noise is accurately known prior to the measurement, so both have to be estimated a posteriori. We have studied a procedure to estimate the standard deviation of the stochastic contribution assuming normality and independence; it requires a sufficiently well-sampled data set to yield reliable results. This procedure is based on estimating the standard deviation in a sample of weighted sums of arbitrarily sampled data points and is identical to the so-called DER_SNR algorithm for specific parameter settings. To demonstrate the applicability of our procedure, we present applications to synthetic data, high-resolution spectra, and a large sample of space-based light curves and, finally, give guidelines for applying the procedure in situations not explicitly considered here to promote its adoption in data analysis.
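For the specific parameter settings mentioned, the procedure reduces to DER_SNR, whose noise estimate is a scaled median of second differences; a minimal sketch (the constant 1.482602/sqrt(6) converts the median absolute second difference to a standard deviation under normality and independence):

```python
import numpy as np

def der_snr_noise(flux):
    """DER_SNR noise estimate (Stoehr et al. 2008):
    sigma ~= 1.482602 / sqrt(6) * median(|2 f_i - f_{i-2} - f_{i+2}|).
    Assumes normal, independent noise and a well-sampled signal."""
    f = np.asarray(flux, dtype=float)
    return 1.482602 / np.sqrt(6.0) * np.median(np.abs(2.0 * f[2:-2] - f[:-4] - f[4:]))
```

Skipping two samples on either side makes the estimate insensitive to smooth signal structure, which is why a well-sampled data set is required.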


2021 ◽  
Vol 13 (9) ◽  
pp. 1736
Author(s):  
Andreas Groh ◽  
Martin Horwath

We derived gravimetric mass change products, i.e., gridded and basin-averaged mass changes, for the Antarctic Ice Sheet (AIS) from time-variable gravity-field solutions acquired by the Gravity Recovery and Climate Experiment (GRACE) mission and its successor GRACE-FO, covering more than 18 years. For this purpose, tailored sensitivity kernels (TSKs) were generated for the application in a regional integration approach. The TSKs were inferred in a formal optimization approach minimizing the sum of both propagated mission errors and leakage errors. We accounted for mission errors by means of an empirical error covariance model, while assumptions on the signal variances of potential sources of leakage were used to minimize leakage errors. To identify the optimal parameters to be used in the TSK generation, we assessed a set of TSKs by quantifying signal leakage from the processing of synthetic data and by inferring the noise level of the derived basin products. The selected TSKs were then used to calculate mass change products from GRACE/GRACE-FO Level-2 spherical harmonic solutions covering April 2002 to July 2020. These products were compared to external data sets from satellite altimetry and the input–output method. For the period under investigation, the mass balance of the AIS was quantified to be −90.9±43.5 Gt a−1, corresponding to a mean sea-level rise of 0.25±0.12 mm a−1.
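Once a TSK is in hand, the regional integration step is an area-weighted sum of the kernel against gridded mass anomalies; a sketch on a regular longitude-latitude grid (grid layout and units are assumptions; constructing the TSK itself, the paper's core contribution, is not shown).

```python
import numpy as np

def basin_mass_change(kernel, delta_mass, lat, dlon_deg, dlat_deg, R=6371.0e3):
    """Integrate gridded mass anomalies delta_mass (kg m^-2, shape nlat x nlon)
    against a sensitivity kernel (same shape) on a regular lon-lat grid.
    lat: latitudes of grid rows in degrees. Returns the mass change in Gt."""
    dlam = np.deg2rad(dlon_deg)
    dphi = np.deg2rad(dlat_deg)
    # Spherical cell areas shrink with cos(latitude).
    area = (R ** 2) * np.cos(np.deg2rad(lat))[:, None] * dlam * dphi  # m^2
    return np.sum(kernel * delta_mass * area) / 1e12                  # kg -> Gt
```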

