scholarly journals An Instance Selection Algorithm Based On Reverse k Nearest Neighbor

2013 ◽  
Vol 10 (7) ◽  
pp. 1858-1861
Author(s):  
Y. Jagruthi ◽  
Dr. Y. Ramadevi ◽  
A. Sangeeta

Classification is one of the most important data mining techniques. It belongs to supervised learning. The objective of classification is to assign class label to unlabelled data. As data is growing rapidly, handling it has become a major concern. So preprocessing should be done before classification and hence data reduction is essential. Data reduction is to extract a subset of features from a set of features of a data set. Data reduction helps in decreasing the storage requirement and increases the efficiency of classification. A way to measure data reduction is reduction rate. The main thing here is choosing representative samples to the final data set. There are many instance selection algorithms which are based on nearest neighbor decision rule (NN). These algorithms select samples on incremental strategy or decremental strategy. Both the incremental algorithms and decremental algorithms take much processing time as they iteratively scan the dataset. There is another instance selection algorithm, reverse nearest neighbor reduction (RNNR) based on the concept of reverse nearest neighbor (RNN). RNNR does not iteratively scan the data set. In this paper, we extend the RNN to RkNN and we use the concept of RNNR to RkNN. RkNN finds the query objects that has the query point as their k nearest-neighbors. Our approach utilizes the advantage of RNN and proposes to use the concept of RkNN. We have taken the dataset of theatres, hospitals and restaurants and extracted the sample set. Classification has been done the resultant sample data set. We observe two parameters here they are classification accuracy and reduction rate.

2017 ◽  
Vol 251 ◽  
pp. 26-34 ◽  
Author(s):  
Yunsheng Song ◽  
Jiye Liang ◽  
Jing Lu ◽  
Xingwang Zhao

2021 ◽  
Vol 40 (1) ◽  
pp. 521-533
Author(s):  
Junhai Zhai ◽  
Jiaxing Qi ◽  
Sufang Zhang

The condensed nearest neighbor (CNN) is a pioneering instance selection algorithm for 1-nearest neighbor. Many variants of CNN for K-nearest neighbor have been proposed by different researchers. However, few studies were conducted on condensed fuzzy K-nearest neighbor. In this paper, we present a condensed fuzzy K-nearest neighbor (CFKNN) algorithm that starts from an initial instance set S and iteratively selects informative instances from training set T, moving them from T to S. Specifically, CFKNN consists of three steps. First, for each instance x ∈ T, it finds the K-nearest neighbors in S and calculates the fuzzy membership degrees of the K nearest neighbors using S rather than T. Second it computes the fuzzy membership degrees of x using the fuzzy K-nearest neighbor algorithm. Finally, it calculates the information entropy of x and selects an instance according to the calculated value. Extensive experiments on 11 datasets are conducted to compare CFKNN with four state-of-the-art algorithms (CNN, edited nearest neighbor (ENN), Tomeklinks, and OneSidedSelection) regarding the number of selected instances, the testing accuracy, and the compression ratio. The experimental results show that CFKNN provides excellent performance and outperforms the other four algorithms.


Author(s):  
M. Jeyanthi ◽  
C. Velayutham

In Science and Technology Development BCI plays a vital role in the field of Research. Classification is a data mining technique used to predict group membership for data instances. Analyses of BCI data are challenging because feature extraction and classification of these data are more difficult as compared with those applied to raw data. In this paper, We extracted features using statistical Haralick features from the raw EEG data . Then the features are Normalized, Binning is used to improve the accuracy of the predictive models by reducing noise and eliminate some irrelevant attributes and then the classification is performed using different classification techniques such as Naïve Bayes, k-nearest neighbor classifier, SVM classifier using BCI dataset. Finally we propose the SVM classification algorithm for the BCI data set.


1997 ◽  
Vol 08 (03) ◽  
pp. 301-315 ◽  
Author(s):  
Marcel J. Nijman ◽  
Hilbert J. Kappen

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative of k-nearest neighbor, leading to comparable performance without the need to store all data. We show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that simultaneously with the input-output mapping, a model of the input space is obtained which can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.


Polymers ◽  
2021 ◽  
Vol 13 (21) ◽  
pp. 3811
Author(s):  
Iosif Sorin Fazakas-Anca ◽  
Arina Modrea ◽  
Sorin Vlase

This paper proposes a new method for calculating the monomer reactivity ratios for binary copolymerization based on the terminal model. The original optimization method involves a numerical integration algorithm and an optimization algorithm based on k-nearest neighbour non-parametric regression. The calculation method has been tested on simulated and experimental data sets, at low (<10%), medium (10–35%) and high conversions (>40%), yielding reactivity ratios in a good agreement with the usual methods such as intersection, Fineman–Ross, reverse Fineman–Ross, Kelen–Tüdös, extended Kelen–Tüdös and the error in variable method. The experimental data sets used in this comparative analysis are copolymerization of 2-(N-phthalimido) ethyl acrylate with 1-vinyl-2-pyrolidone for low conversion, copolymerization of isoprene with glycidyl methacrylate for medium conversion and copolymerization of N-isopropylacrylamide with N,N-dimethylacrylamide for high conversion. Also, the possibility to estimate experimental errors from a single experimental data set formed by n experimental data is shown.


2018 ◽  
Vol 19 (1) ◽  
pp. 144-157
Author(s):  
Mehdi Zekriyapanah Gashti

Exponential growth of medical data and recorded resources from patients with different diseases can be exploited to establish an optimal association between disease symptoms and diagnosis. The main issue in diagnosis is the variability of the features that can be attributed for particular diseases, since some of these features are not essential for the diagnosis and may even lead to a delay in diagnosis. For instance, diabetes, hepatitis, breast cancer, and heart disease, that express multitudes of clinical manifestations as symptoms, are among the diseases with higher morbidity rate. Timely diagnosis of such diseases can play a critical role in decreasing their effect on patients’ quality of life and on the costs of their treatment. Thanks to the large data set available, computer aided diagnosis can be an advanced option for early diagnosis of the diseases. In this paper, using a Flower Pollination Algorithm (FPA) and K-Nearest Neighbor (KNN), a new method is suggested for diagnosis. The modified model can diagnose diseases more accurately by reducing the number of features. The main purpose of the modified model is that the Feature Selection (FS) should be done by FPA and data classification should be performed using KNN. The results showed higher efficiency of the modified model on diagnosis of diabetes, hepatitis, breast cancer, and heart diseases compared to the KNN models. ABSTRAK: Pertumbuhan eksponen dalam data perubatan dan sumber direkodkan daripada pesakit dengan penyakit berbeza boleh disalah guna bagi membentuk kebersamaan optimum antara simptom penyakit dan mengenal pasti gejala penyakit (diagnosis). Isu utama dalam diagnosis adalah kepelbagaian ciri yang dimiliki pada penyakit tertentu, sementara ciri-ciri ini tidak penting untuk didiagnosis dan boleh mengarah kepada penangguhan dalam diagnosis. Sebagai contoh, penyakit kencing manis, radang hati, barah payudara dan penyakit jantung, menunjukkan banyak klinikal simptom jelas dan merupakan penyakit tertinggi berlaku dalam masyarakat. Diagnosis tepat pada penyakit tersebut boleh memainkan peranan penting dalam mengurangkan kesan kualiti  hidup dan kos rawatan pesakit. Terima kasih kepada set data yang banyak, diagnosis dengan bantuan komputer boleh menjadi pilihan maju menuju ke arah diagnosis awal kepada penyakit. Kertas ini menggunakan Algoritma Flower Pollination (FPA) dan K-Nearest Neighbor (KNN), iaitu kaedah baru dicadangkan bagi diagnosis. Model yang diubah suai boleh mendiagnosis penyakit lebih tepat dengan mengurangkan bilangan ciri-ciri. Tujuan utama model yang diubah suai ini adalah bagi Pemilihan Ciri (FS) perlu dilakukan menggunakan FPA and pengkhususan data perlu dijalankan menggunakan KNN. Keputusan menunjukkan model yang diubah suai lebih cekap dalam mendiagnosis penyakit kencing manis, radang hati, barah payudara dan penyakit jantung berbanding model KNN.


2021 ◽  
Vol 87 (6) ◽  
pp. 445-455
Author(s):  
Yi Ma ◽  
Zezhong Zheng ◽  
Yutang Ma ◽  
Mingcang Zhu ◽  
Ran Huang ◽  
...  

Many manifold learning algorithms conduct an eigen vector analysis on a data-similarity matrix with a size of N×N, where N is the number of data points. Thus, the memory complexity of the analysis is no less than O(N2). We pres- ent in this article an incremental manifold learning approach to handle large hyperspectral data sets for land use identification. In our method, the number of dimensions for the high-dimensional hyperspectral-image data set is obtained with the training data set. A local curvature varia- tion algorithm is utilized to sample a subset of data points as landmarks. Then a manifold skeleton is identified based on the landmarks. Our method is validated on three AVIRIS hyperspectral data sets, outperforming the comparison algorithms with a k–nearest-neighbor classifier and achieving the second best performance with support vector machine.


Author(s):  
Sikha Bagui ◽  
Arup Kumar Mondal ◽  
Subhash Bagui

In this work the authors present a parallel k nearest neighbor (kNN) algorithm using locality sensitive hashing to preprocess the data before it is classified using kNN in Hadoop's MapReduce framework. This is compared with the sequential (conventional) implementation. Using locality sensitive hashing's similarity measure with kNN, the iterative procedure to classify a data object is performed within a hash bucket rather than the whole data set, greatly reducing the computation time needed for classification. Several experiments were run that showed that the parallel implementation performed better than the sequential implementation on very large datasets. The study also experimented with a few map and reduce side optimization features for the parallel implementation and presented some optimum map and reduce side parameters. Among the map side parameters, the block size and input split size were varied, and among the reduce side parameters, the number of planes were varied, and their effects were studied.


Sign in / Sign up

Export Citation Format

Share Document