An Instance Selection Algorithm Based On Reverse k Nearest Neighbor

Classification is one of the most important data mining techniques and belongs to supervised learning. Its objective is to assign class labels to unlabelled data. As data grows rapidly, handling it has become a major concern, so preprocessing should be done before classification, and data reduction is essential. Data reduction extracts a subset of features from the full feature set of a data set. It decreases the storage requirement and increases the efficiency of classification. One way to measure data reduction is the reduction rate. The key task is choosing representative samples for the final data set. Many instance selection algorithms are based on the nearest neighbor (NN) decision rule. These algorithms select samples using either an incremental or a decremental strategy. Both kinds take considerable processing time because they scan the data set iteratively. Another instance selection algorithm, reverse nearest neighbor reduction (RNNR), is based on the concept of the reverse nearest neighbor (RNN) and does not scan the data set iteratively. In this paper, we extend RNN to the reverse k nearest neighbor (RkNN) and apply the concept of RNNR to RkNN. RkNN finds the objects that have the query point among their k nearest neighbors. Our approach thus retains the advantage of RNN while using the concept of RkNN. We took a data set of theatres, hospitals and restaurants and extracted a sample set from it. Classification was then performed on the resulting sample set. We evaluate two measures: classification accuracy and reduction rate.
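The RkNN definition in the abstract above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names, the brute-force neighbor search, the Euclidean distance, and the reduction-rate formula (one minus the fraction of instances kept) are all assumptions made for illustration.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (brute force, Euclidean)."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in dists[:k]]


def reverse_knn(points, k):
    """Map each index q to the set of indices that have q among their k nearest neighbors."""
    rknn = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            rknn[j].add(i)
    return rknn


def reduction_rate(n_original, n_selected):
    """Assumed definition: share of the original instances that were discarded."""
    return 1.0 - n_selected / n_original


points = [(0.0,), (1.0,), (3.0,), (10.0,)]
rknn = reverse_knn(points, k=1)
```

In this toy data set the point at 1.0 is the nearest neighbor of two other points, so it has the largest reverse-neighbor set. A selection rule could favor such points, but the abstract does not state the rule actually used.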

An efficient instance selection algorithm for k nearest neighbor regression

Neurocomputing ◽

10.1016/j.neucom.2017.04.018 ◽

2017 ◽

Vol 251 ◽

pp. 26-34 ◽

Cited By ~ 49

Author(s):

Yunsheng Song ◽

Jiye Liang ◽

Jing Lu ◽

Xingwang Zhao

Keyword(s):

Nearest Neighbor ◽

Instance Selection ◽

Selection Algorithm ◽

K Nearest Neighbor

An Instance Selection Algorithm Based on Reverse Nearest Neighbor

Advances in Knowledge Discovery and Data Mining - Lecture Notes in Computer Science ◽

10.1007/978-3-642-20841-6_1 ◽

2011 ◽

pp. 1-12 ◽

Cited By ~ 3

Author(s):

Bi-Ru Dai ◽

Shu-Ming Hsu

Keyword(s):

Nearest Neighbor ◽

Instance Selection ◽

Selection Algorithm ◽

Reverse Nearest Neighbor

An instance selection algorithm for fuzzy K-nearest neighbor

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-200124 ◽

2021 ◽

Vol 40 (1) ◽

pp. 521-533

Author(s):

Junhai Zhai ◽

Jiaxing Qi ◽

Sufang Zhang

Keyword(s):

Nearest Neighbor ◽

Nearest Neighbors ◽

Fuzzy Membership ◽

Instance Selection ◽

Selection Algorithm ◽

K Nearest Neighbor ◽

Training Set ◽

K Nearest Neighbors ◽

K Nearest Neighbor Algorithm ◽

Testing Accuracy

The condensed nearest neighbor (CNN) is a pioneering instance selection algorithm for 1-nearest neighbor. Many variants of CNN for K-nearest neighbor have been proposed by different researchers. However, few studies have been conducted on condensed fuzzy K-nearest neighbor. In this paper, we present a condensed fuzzy K-nearest neighbor (CFKNN) algorithm that starts from an initial instance set S and iteratively selects informative instances from training set T, moving them from T to S. Specifically, CFKNN consists of three steps. First, for each instance x ∈ T, it finds the K nearest neighbors in S and calculates the fuzzy membership degrees of the K nearest neighbors using S rather than T. Second, it computes the fuzzy membership degrees of x using the fuzzy K-nearest neighbor algorithm. Finally, it calculates the information entropy of x and selects an instance according to the calculated value. Extensive experiments on 11 datasets are conducted to compare CFKNN with four state-of-the-art algorithms (CNN, edited nearest neighbor (ENN), Tomek links, and one-sided selection) regarding the number of selected instances, the testing accuracy, and the compression ratio. The experimental results show that CFKNN provides excellent performance and outperforms the other four algorithms.
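The membership and entropy computations described in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' CFKNN: it assumes crisp (0/1) memberships for the neighbors in S, the common inverse-distance weighting with fuzzifier m, and Shannon entropy in bits. The function names and the `eps` guard are hypothetical, and the final selection rule is left out because the abstract does not specify it.

```python
import math
from collections import defaultdict


def fuzzy_memberships(x, selected, k, m=2.0, eps=1e-12):
    """Class memberships of x from its k nearest neighbors in `selected`.

    `selected` is a list of (vector, label) pairs standing in for the set S.
    Assumption: each neighbor belongs fully to its own class.
    """
    nearest = sorted(selected, key=lambda item: math.dist(x, item[0]))[:k]
    weights = defaultdict(float)
    for vec, label in nearest:
        # Inverse-distance weight; eps avoids division by zero for duplicates.
        weights[label] += 1.0 / (math.dist(x, vec) ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def entropy(memberships):
    """Shannon entropy (in bits) of a membership distribution."""
    return -sum(u * math.log2(u) for u in memberships.values() if u > 0.0)


S = [((0.0,), "a"), ((2.0,), "b")]
ambiguous = fuzzy_memberships((1.0,), S, k=2)   # halfway between both classes
confident = fuzzy_memberships((0.1,), S, k=1)   # only the "a" neighbor is used
```

An instance lying halfway between two classes gets the highest entropy, which is the kind of "informative" instance an entropy-based criterion could pick out.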

Machine Learning Verdict of EEG Signals in Brain Computer Interface

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1838114 ◽

2018 ◽

pp. 429-441

Author(s):

M. Jeyanthi ◽

C. Velayutham

Keyword(s):

Nearest Neighbor ◽

Technology Development ◽

Vital Role ◽

Svm Classifier ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Data Set ◽

Eeg Data ◽

Irrelevant Attributes

Brain-computer interfaces (BCI) play a vital role in research and in science and technology development. Classification is a data mining technique used to predict group membership for data instances. Analyses of BCI data are challenging because feature extraction and classification of these data are more difficult as compared with those applied to raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on a BCI dataset using several techniques, including Naïve Bayes, the k-nearest neighbor classifier, and the SVM classifier. Finally, we propose the SVM classification algorithm for the BCI data set.

Plant leaf recognition method based on clonal selection algorithm and K-nearest neighbor

Journal of Computer Applications ◽

10.3724/sp.j.1087.2013.02009 ◽

2013 ◽

Vol 33 (7) ◽

pp. 2009-2013 ◽

Cited By ~ 1

Author(s):

Ning ZHANG ◽

Wenping LIU

Keyword(s):

Nearest Neighbor ◽

Clonal Selection ◽

Clonal Selection Algorithm ◽

Selection Algorithm ◽

K Nearest Neighbor ◽

Recognition Method ◽

Plant Leaf

Symmetry Breaking and Training from Incomplete Data with Radial Basis Boltzmann Machines

International Journal of Neural Systems ◽

10.1142/s0129065797000318 ◽

1997 ◽

Vol 08 (03) ◽

pp. 301-315 ◽

Cited By ~ 8

Author(s):

Marcel J. Nijman ◽

Hilbert J. Kappen

Keyword(s):

Symmetry Breaking ◽

Incomplete Data ◽

Missing Values ◽

Nearest Neighbor ◽

Boltzmann Machine ◽

K Nearest Neighbor ◽

Data Set ◽

Input Space ◽

Learning Rules ◽

Radial Basis

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative to k-nearest neighbor, leading to comparable performance without the need to store all data. We show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that, simultaneously with the input-output mapping, a model of the input space is obtained which can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.

Determination of Reactivity Ratios from Binary Copolymerization Using the k-Nearest Neighbor Non-Parametric Regression

Polymers ◽

10.3390/polym13213811 ◽

2021 ◽

Vol 13 (21) ◽

pp. 3811

Author(s):

Iosif Sorin Fazakas-Anca ◽

Arina Modrea ◽

Sorin Vlase

Keyword(s):

Experimental Data ◽

Nearest Neighbor ◽

Optimization Method ◽

Reactivity Ratios ◽

Data Sets ◽

K Nearest Neighbor ◽

Integration Algorithm ◽

Data Set ◽

Parametric Regression ◽

Non Parametric

This paper proposes a new method for calculating the monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method involves a numerical integration algorithm and an optimization algorithm based on k-nearest neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets, at low (<10%), medium (10–35%) and high conversions (>40%), yielding reactivity ratios in good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error-in-variables method. The experimental data sets used in this comparative analysis are the copolymerization of 2-(N-phthalimido) ethyl acrylate with 1-vinyl-2-pyrrolidone for low conversion, the copolymerization of isoprene with glycidyl methacrylate for medium conversion and the copolymerization of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. Also, the possibility of estimating experimental errors from a single experimental data set of n data points is shown.
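For readers unfamiliar with the terminal model mentioned in the abstract above, the standard instantaneous copolymer composition (Mayo–Lewis) equation can be written as a short Python function. This is background only and does not reproduce the paper's integration or k-nearest neighbour optimization; the function and argument names are hypothetical.

```python
def copolymer_fraction(f1, r1, r2):
    """Instantaneous mole fraction F1 of monomer 1 in the copolymer.

    f1 is the mole fraction of monomer 1 in the feed; r1 and r2 are the
    reactivity ratios of the terminal model (Mayo-Lewis equation).
    """
    f2 = 1.0 - f1
    numerator = r1 * f1 ** 2 + f1 * f2
    denominator = r1 * f1 ** 2 + 2.0 * f1 * f2 + r2 * f2 ** 2
    return numerator / denominator


# With r1 = r2 = 1 the copolymer composition equals the feed composition.
ideal = copolymer_fraction(0.3, 1.0, 1.0)
```

At higher conversions the feed composition drifts, which is presumably why the abstract refers to a numerical integration step rather than to this instantaneous form alone.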

A Modified Model Based on Flower Pollination Algorithm and K-Nearest Neighbor for Diagnosing Diseases

IIUM Engineering Journal ◽

10.31436/iiumej.v19i1.854 ◽

2018 ◽

Vol 19 (1) ◽

pp. 144-157

Author(s):

Mehdi Zekriyapanah Gashti

Keyword(s):

Breast Cancer ◽

Nearest Neighbor ◽

Heart Diseases ◽

Critical Role ◽

Clinical Manifestations ◽

Flower Pollination Algorithm ◽

K Nearest Neighbor ◽

Data Set ◽

Modified Model ◽

Flower Pollination

Exponential growth of medical data and recorded resources from patients with different diseases can be exploited to establish an optimal association between disease symptoms and diagnosis. The main issue in diagnosis is the variability of the features that can be attributed to particular diseases, since some of these features are not essential for the diagnosis and may even lead to a delay in diagnosis. For instance, diabetes, hepatitis, breast cancer, and heart disease, which express multitudes of clinical manifestations as symptoms, are among the diseases with higher morbidity rates. Timely diagnosis of such diseases can play a critical role in decreasing their effect on patients' quality of life and on the costs of their treatment. Thanks to the large data sets available, computer-aided diagnosis can be an advanced option for early diagnosis of these diseases. In this paper, a new method for diagnosis is suggested that uses a Flower Pollination Algorithm (FPA) and K-Nearest Neighbor (KNN). The modified model can diagnose diseases more accurately by reducing the number of features. In the modified model, Feature Selection (FS) is done by FPA and data classification is performed using KNN. The results showed higher efficiency of the modified model in the diagnosis of diabetes, hepatitis, breast cancer, and heart diseases compared to the KNN models.
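The combination of feature selection with a KNN classifier described in the abstract above can be sketched as a wrapper that scores feature subsets by KNN accuracy. This sketch does not implement the Flower Pollination Algorithm: it swaps in an exhaustive search over subsets, which is only practical for a handful of features. The function names, the leave-one-out scoring, and the toy data are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def loo_knn_accuracy(X, y, features, k=1):
    """Leave-one-out accuracy of KNN restricted to the given feature indices."""
    def project(row):
        return [row[f] for f in features]

    correct = 0
    for i, row in enumerate(X):
        others = [(math.dist(project(row), project(X[j])), y[j])
                  for j in range(len(X)) if j != i]
        nearest = sorted(others, key=lambda item: item[0])[:k]
        vote = Counter(label for _, label in nearest).most_common(1)[0][0]
        correct += vote == y[i]
    return correct / len(X)


def best_feature_subset(X, y, k=1):
    """Exhaustive stand-in for the FPA search: smallest subset with top accuracy."""
    n_features = len(X[0])
    best, best_score = None, -1.0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            score = loo_knn_accuracy(X, y, subset, k)
            if score > best_score:  # strict '>' keeps the smaller subset on ties
                best, best_score = subset, score
    return best, best_score


# Feature 0 separates the classes; feature 1 is misleading noise.
X = [(0.0, 5.0), (0.1, -5.0), (1.0, 5.1), (1.1, -5.1)]
y = [0, 0, 1, 1]
subset, score = best_feature_subset(X, y)
```

A metaheuristic such as FPA would replace the nested loops in `best_feature_subset` when the number of features makes enumeration infeasible.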

An Incremental Isomap Method for Hyperspectral Dimensionality Reduction and Classification

Photogrammetric Engineering & Remote Sensing ◽

10.14358/pers.87.7.445 ◽

2021 ◽

Vol 87 (6) ◽

pp. 445-455

Author(s):

Yi Ma ◽

Zezhong Zheng ◽

Yutang Ma ◽

Mingcang Zhu ◽

Ran Huang ◽

...

Keyword(s):

Manifold Learning ◽

Nearest Neighbor ◽

Hyperspectral Image ◽

Hyperspectral Data ◽

Training Data ◽

Support Vector ◽

Data Sets ◽

K Nearest Neighbor ◽

Data Set ◽

Data Points

Many manifold learning algorithms conduct an eigenvector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N^2). We present in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature variation algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k-nearest-neighbor classifier and achieving the second best performance with support vector machine.
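The quadratic memory cost noted in the abstract above is easy to quantify. The short Python calculation below assumes a dense matrix with 8-byte (double precision) entries; both the function name and the example sizes are illustrative and not taken from the article.

```python
def similarity_matrix_bytes(n_points, bytes_per_entry=8):
    """Memory needed for a dense n_points x n_points similarity matrix."""
    return n_points * n_points * bytes_per_entry


full = similarity_matrix_bytes(100_000)      # all data points
landmarks = similarity_matrix_bytes(2_000)   # a sampled landmark subset
```

Under these assumptions, 100,000 points need 80 GB for the full matrix, while 2,000 landmarks need 32 MB, which illustrates why sampling landmarks makes large data sets tractable.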

Improving the Performance of kNN in the MapReduce Framework Using Locality Sensitive Hashing

International Journal of Distributed Systems and Technologies ◽

10.4018/ijdst.2019100101 ◽

2019 ◽

Vol 10 (4) ◽

pp. 1-16

Author(s):

Sikha Bagui ◽

Arup Kumar Mondal ◽

Subhash Bagui

Keyword(s):

Nearest Neighbor ◽

Parallel Implementation ◽

Block Size ◽

Computation Time ◽

Locality Sensitive Hashing ◽

K Nearest Neighbor ◽

Mapreduce Framework ◽

Data Set ◽

Data Object ◽

Very Large Datasets

In this work the authors present a parallel k nearest neighbor (kNN) algorithm using locality sensitive hashing to preprocess the data before it is classified using kNN in Hadoop's MapReduce framework. This is compared with the sequential (conventional) implementation. Using locality sensitive hashing's similarity measure with kNN, the iterative procedure to classify a data object is performed within a hash bucket rather than the whole data set, greatly reducing the computation time needed for classification. Several experiments showed that the parallel implementation performed better than the sequential implementation on very large datasets. The study also experimented with a few map-side and reduce-side optimization features for the parallel implementation and presented some optimum map-side and reduce-side parameters. Among the map-side parameters, the block size and input split size were varied, and among the reduce-side parameters, the number of planes was varied, and their effects were studied.
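The idea of restricting the kNN search to a hash bucket, as described in the abstract above, can be sketched in plain Python. This is a single-machine illustration under stated assumptions, not the authors' Hadoop implementation: it assumes random-hyperplane hashing (suggested by the abstract's "number of planes"), Euclidean distance inside the bucket, and majority voting. All function names are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def make_planes(n_planes, dim, seed=0):
    """Random hyperplanes through the origin, given by Gaussian normal vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """Bucket key: one bit per plane, set when vec lies on its non-negative side."""
    return tuple(
        1 if sum(a * b for a, b in zip(vec, plane)) >= 0 else 0
        for plane in planes
    )


def build_buckets(data, planes):
    """Group labeled vectors (vector, label) by their hash signature."""
    buckets = defaultdict(list)
    for vec, label in data:
        buckets[signature(vec, planes)].append((vec, label))
    return buckets


def classify(query, buckets, planes, k):
    """kNN majority vote using only the points in the query's bucket."""
    candidates = buckets.get(signature(query, planes), [])
    if not candidates:
        return None  # empty bucket: no prediction in this simplified sketch
    nearest = sorted(candidates, key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# A single fixed plane (the vertical axis) keeps the example deterministic.
planes = [[1.0, 0.0]]
data = [((1.0, 1.0), "pos"), ((2.0, 0.5), "pos"), ((-1.0, 1.0), "neg")]
buckets = build_buckets(data, planes)
label = classify((3.0, 3.0), buckets, planes, k=2)
```

More planes produce smaller buckets and faster searches, at the cost of a higher chance that true neighbors land in a different bucket, which is one reason the number of planes is a natural parameter to vary.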
