Natural Neighbor Reduction Algorithm for Instance-based Learning

Author(s):  
Lijun Yang ◽  
Qingsheng Zhu ◽  
Jinlong Huang ◽  
Dongdong Cheng ◽  
Cheng Zhang

Instance reduction aims to reduce the prohibitive computational cost and storage requirements of instance-based learning. The most frequently used methods are the condensation and edition approaches: condensation removes patterns that lie far from the decision boundary and do not contribute to classification accuracy, while edition removes noisy patterns to improve classification accuracy. In this paper, a new hybrid algorithm, an instance reduction algorithm based on natural neighbors and nearest enemies, is presented. First, an edition algorithm is proposed that filters noisy patterns and smooths the class boundaries using natural neighbors; its main advantage is that it requires no user-defined parameters. Then, a new condensation method based on the nearest enemy removes instances far from the decision boundary, discarding interior instances. Experiments show that the hybrid approach effectively reduces the number of instances while achieving higher classification accuracy compared with competing algorithms.
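The parameter-free edition step rests on the natural neighbor idea: two points are natural neighbors if each appears in the other's k-nearest neighborhood, with k grown until the number of isolated points stabilizes. Below is a minimal sketch of that search, assuming a standard Euclidean kNN query; the stopping rule and `max_r` cap are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbors(X, max_r=30):
    """Return, for each point, its set of mutual (natural) neighbors."""
    n = len(X)
    nn = NearestNeighbors(n_neighbors=min(max_r + 1, n)).fit(X)
    _, idx = nn.kneighbors(X)              # idx[:, 0] is the point itself
    neighbors = [set() for _ in range(n)]
    prev_isolated = n
    for r in range(1, idx.shape[1]):       # grow the neighborhood radius r
        for i in range(n):
            j = idx[i, r]
            if i in idx[j, 1:r + 1]:       # mutual neighborhood => natural neighbors
                neighbors[i].add(int(j))
                neighbors[j].add(i)
        isolated = sum(1 for s in neighbors if not s)
        if isolated == 0 or isolated == prev_isolated:
            break                          # stop once the isolated-point count stabilizes
        prev_isolated = isolated
    return neighbors
```

Noisy patterns can then be flagged, for example, as points whose natural neighbors mostly carry a different class label.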

Author(s):  
Manmohan Singh ◽  
Rajendra Pamula ◽  
Alok Kumar

Clustering has various applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (often called k-means) suffer from convergence to a local optimum and offer no approximation guarantee. To overcome these shortcomings, this paper presents an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering on data streams: each reduction step determines a coreset of its inputs and introduces some error, and by the nested property of coresets these errors accumulate over the levels of the construction, so even a small reduction in the per-step error makes the final coreset substantially more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, BigCross, and Tower datasets. It outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
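Streaming coreset constructions of this kind typically maintain a merge-and-reduce tree: full buckets of the stream are reduced to small weighted coresets, and coresets at the same level are merged and reduced again. The sketch below illustrates that bookkeeping only; the uniform weighted downsampling in `reduce_to_coreset` is a placeholder assumption, not the paper's reduction rule.

```python
import numpy as np

def reduce_to_coreset(points, weights, m, rng):
    """Downsample a weighted point set to m weighted points (placeholder sampling)."""
    if len(points) <= m:
        return points, weights
    probs = weights / weights.sum()
    idx = rng.choice(len(points), size=m, replace=False, p=probs)
    new_w = weights[idx] * weights.sum() / weights[idx].sum()   # preserve total weight
    return points[idx], new_w

class StreamingCoreset:
    """Merge-and-reduce tree over fixed-size buckets of the stream."""
    def __init__(self, bucket_size=1000, coreset_size=200, seed=0):
        self.bucket_size, self.m = bucket_size, coreset_size
        self.rng = np.random.default_rng(seed)
        self.buffer, self.levels = [], {}     # levels[i] holds one coreset per tree level

    def insert(self, x):
        self.buffer.append(x)
        if len(self.buffer) == self.bucket_size:
            pts = np.asarray(self.buffer); self.buffer = []
            self._merge(0, (pts, np.ones(len(pts))))

    def _merge(self, level, coreset):
        if level in self.levels:              # merge two coresets, reduce, push up a level
            other = self.levels.pop(level)
            pts = np.vstack([coreset[0], other[0]])
            w = np.concatenate([coreset[1], other[1]])
            self._merge(level + 1, reduce_to_coreset(pts, w, self.m, self.rng))
        else:
            self.levels[level] = coreset
```

Because every merge introduces error and the tree has logarithmically many levels, a more accurate per-step reduction directly tightens the error of the final coreset.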


Author(s):  
Divya Jain ◽  
Vijendra Singh

A two-phase diagnostic framework based on hybrid classification is proposed for the diagnosis of chronic disease. In the first phase, feature selection via the ReliefF method and feature extraction via PCA are incorporated. In the second phase, the SVM parameters are efficiently optimized via grid search. The proposed hybrid classification approach is then tested on seven popular chronic disease datasets using cross-validation. Experiments are conducted to evaluate the presented classification method against four other existing classifiers applied to the same chronic disease datasets. Results show that the presented approach removes approximately 40% of the extraneous and redundant features, with a substantial reduction in execution time for mining all datasets, and achieves a highest classification accuracy of 98.5%. It is concluded that the presented approach achieves excellent classification accuracy for each chronic disease dataset while eliminating irrelevant and redundant features, thereby substantially reducing the diagnostic complexity and the resulting computational time.
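A compact sketch of such a two-phase pipeline is shown below: ReliefF feature selection, PCA feature extraction, then an SVM tuned by grid search. The `skrebate` ReliefF implementation, the number of selected features, the retained variance, and the parameter grid are all illustrative assumptions rather than the paper's settings.

```python
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from skrebate import ReliefF   # a commonly used ReliefF filter implementation

pipe = Pipeline([
    ("relieff", ReliefF(n_features_to_select=10, n_neighbors=10)),  # phase 1: selection
    ("pca", PCA(n_components=0.95)),                                # phase 1: extraction
    ("svm", SVC(kernel="rbf")),                                     # phase 2: classifier
])

param_grid = {                       # phase 2: grid-searched SVM parameters
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [1e-3, 1e-2, 1e-1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X, y); search.best_score_ then gives the cross-validated accuracy
```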


2014 ◽  
Vol 556-562 ◽  
pp. 4820-4824
Author(s):  
Ying Xia ◽  
Le Mi ◽  
Hae Young Bae

In the study of image affective semantic classification, one problem is low classification accuracy caused by redundant low-level features. To eliminate this redundancy, a novel image affective classification method based on attribute reduction is proposed. In this method, a decision table is first built from the extracted image features. Valid low-level features are then determined through a feature selection process using a rough set attribute reduction algorithm. Finally, semantic recognition is performed with an SVM. Experimental results show that the proposed method significantly improves the accuracy of image affective semantic classification.
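One common way to realize rough set attribute reduction is a greedy search that adds the attribute raising the dependency degree (the size of the positive region) the most, until it matches that of the full attribute set. The sketch below illustrates that idea over a discretized decision table; it is a simplification for illustration, not the specific reduction algorithm used in the paper.

```python
import numpy as np

def dependency(table, decision, attrs):
    """Fraction of objects whose equivalence class (w.r.t. attrs) is pure in the decision."""
    if not attrs:
        return 0.0
    classes = {}
    for i, row in enumerate(table[:, attrs]):
        classes.setdefault(tuple(row), []).append(i)
    consistent = sum(len(ix) for ix in classes.values()
                     if len(set(decision[i] for i in ix)) == 1)
    return consistent / len(table)

def greedy_reduct(table, decision):
    """Add the attribute that raises dependency most until it matches the full set."""
    all_attrs = list(range(table.shape[1]))
    target = dependency(table, decision, all_attrs)
    reduct = []
    while dependency(table, decision, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(table, decision, reduct + [a]))
        reduct.append(best)
    return reduct
```

The columns kept in the reduct are the "valid" low-level features handed to the SVM.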


2011 ◽  
Vol 21 (04) ◽  
pp. 297-309 ◽  
Author(s):  
WEI-WEN WU

Numerous studies have contributed to efforts to boost the accuracy of credit scoring models. Especially interesting are recent studies that have successfully developed hybrid approaches, which advance classification accuracy by combining different machine learning techniques. However, achieving better credit decisions requires more than merely increasing the accuracy of the credit scoring model: it is necessary to conduct meaningful supplementary analyses in order to obtain knowledge of causal relations, particularly significant conceptual patterns or structures involving the attributes used in the credit scoring model. This paper proposes integrating data preprocessing strategies with a Bayesian network classifier built by the tree-augmented Naïve Bayes (TAN) search algorithm, in order to improve classification accuracy and obtain improved knowledge of causal patterns, thus enhancing the validity of credit decisions.
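The TAN search step can be summarized as follows: weight each attribute pair by its conditional mutual information given the class and keep a maximum spanning tree over the attributes, so each attribute gets at most one extra parent besides the class. The sketch below illustrates that step under the assumption of discretized inputs; it is a didactic simplification, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

def cond_mutual_info(x, y, c):
    """I(X;Y|C) estimated from empirical frequencies of discrete variables."""
    mi = 0.0
    for cv in np.unique(c):
        m = c == cv
        pc = m.mean()
        xs, ys = x[m], y[m]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))
                px, py = np.mean(xs == xv), np.mean(ys == yv)
                if pxy > 0:
                    mi += pc * pxy * np.log(pxy / (px * py))
    return mi

def tan_tree(X, y):
    """Edges of the attribute tree used by a TAN classifier (maximum spanning tree)."""
    d = X.shape[1]
    w = np.zeros((d, d))
    for i, j in combinations(range(d), 2):
        w[i, j] = w[j, i] = cond_mutual_info(X[:, i], X[:, j], y)
    in_tree, edges = {0}, []
    while len(in_tree) < d:                    # Prim's algorithm on the CMI weights
        i, j = max(((a, b) for a in in_tree for b in range(d) if b not in in_tree),
                   key=lambda e: w[e[0], e[1]])
        edges.append((i, j))                   # attribute i becomes the extra parent of j
        in_tree.add(j)
    return edges
```

The resulting edges, read alongside the learned conditional probability tables, are what provide the interpretable causal-pattern view the paper argues credit decisions need.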


Author(s):  
Nadjla Elong ◽  
Sidi Ahmed Rahal

For deeper and richer analytic processing of medical datasets, feature selection aims to eliminate redundant and irrelevant features from the data. While the filter method has been touted as one of the simplest approaches to feature selection, its applications have generally failed to identify and deal with embedded similarities among features. In this research, a hybrid feature selection approach that combines the filter method with hierarchical agglomerative clustering is proposed to eliminate irrelevant and redundant features in four medical datasets. A formal evaluation of the proposed approach reveals major improvements in classification accuracy when the results are compared with those obtained using only filter methods and/or more classical feature selection approaches.
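A minimal sketch of this hybrid idea: cluster features by their mutual similarity with agglomerative clustering, then keep the single most relevant feature (by a filter score) from each cluster, so redundant features collapse into one representative. The correlation-based distance, mutual-information score, and cluster count below are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.feature_selection import mutual_info_classif

def hybrid_select(X, y, n_clusters=10):
    corr = np.corrcoef(X, rowvar=False)              # feature-feature correlation
    dist = 1.0 - np.abs(corr)                        # similar features -> small distance
    Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    scores = mutual_info_classif(X, y)               # filter relevance score per feature
    keep = [np.arange(X.shape[1])[labels == c][np.argmax(scores[labels == c])]
            for c in np.unique(labels)]              # best feature of each cluster
    return sorted(keep)
```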


1997 ◽  
Vol 6 ◽  
pp. 1-34 ◽  
Author(s):  
D. R. Wilson ◽  
T. R. Martinez

Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.
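As a concrete illustration, HVDM combines a normalized absolute difference for continuous attributes with a value difference metric for nominal ones, as summarized above. The sketch below follows that definition; the four-standard-deviation normalization is the usual HVDM convention, and the class-conditional tables are estimated directly from the training data.

```python
import numpy as np

class HVDM:
    def __init__(self, X, y, nominal):
        """X: 2-D training array, y: class labels, nominal: boolean mask over columns."""
        self.nominal = np.asarray(nominal)
        self.classes = np.unique(y)
        self.sigma = X.std(axis=0) + 1e-12            # for continuous normalization
        self.cond = {}                                # P(class | attribute a has value v)
        for a in np.where(self.nominal)[0]:
            for v in np.unique(X[:, a]):
                mask = X[:, a] == v
                self.cond[(a, v)] = np.array([np.mean(y[mask] == c) for c in self.classes])

    def attr_dist(self, a, xv, yv):
        if self.nominal[a]:
            px = self.cond.get((a, xv), np.zeros(len(self.classes)))
            py = self.cond.get((a, yv), np.zeros(len(self.classes)))
            return np.sqrt(np.sum((px - py) ** 2))    # value difference for nominal values
        return abs(xv - yv) / (4.0 * self.sigma[a])   # normalized diff for continuous values

    def distance(self, x, z):
        return np.sqrt(sum(self.attr_dist(a, x[a], z[a]) ** 2 for a in range(len(x))))
```

IVDM and WVDM differ in how they handle continuous attributes, interpolating or windowing the value-difference probabilities instead of discretizing them.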


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
D. Jalal Nouri ◽  
M. Saniee Abadeh ◽  
F. Ghareh Mohammadi

In recent years, the imperialist competitive algorithm (ICA), the genetic algorithm (GA), and hybrid fuzzy classification systems have been successfully and effectively employed for data mining classification tasks. To overcome the ineffectiveness of current algorithms in analysing high-dimensional independent datasets, a new hybrid approach named HYEI is presented in this paper for discovering generic rule-based systems. The proposed approach consists of three stages and combines an evolutionary fuzzy system with two ICA procedures to generate high-quality fuzzy classification rules. Initially, the best feature subset is selected using an embedded ICA feature selection stage, and these features are then used to generate basic fuzzy classification rules. Finally, all rules are optimized by an ICA to reduce their length or eliminate some of them. The performance of HYEI has been evaluated on several benchmark datasets from the UCI machine learning repository. The proposed algorithm attains the highest classification accuracy on 6 of the 7 dataset problems and is comparable to the best previously published results on the 5 other test problems.
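For orientation, the ICA procedures in the first and third stages iterate the same basic loop: rank candidate solutions, treat the strongest as imperialists, and move the remaining colonies toward them. The bare-bones sketch below shows only that loop over real-valued candidates; the fitness function, assimilation rate, and the omission of empire competition and revolution steps are simplifying assumptions.

```python
import numpy as np

def ica(fitness, dim, n_countries=50, n_imperialists=5, iters=100, beta=0.5, seed=0):
    """Minimize `fitness` over [0, 1]^dim with a simplified imperialist competitive loop."""
    rng = np.random.default_rng(seed)
    countries = rng.random((n_countries, dim))          # candidate solutions
    for _ in range(iters):
        cost = np.array([fitness(c) for c in countries])
        order = np.argsort(cost)                        # lower cost = stronger country
        imperialists = countries[order[:n_imperialists]]
        colonies = countries[order[n_imperialists:]]
        owners = rng.integers(0, n_imperialists, size=len(colonies))
        # assimilation: each colony moves toward its assigned imperialist
        colonies += beta * rng.random(colonies.shape) * (imperialists[owners] - colonies)
        countries = np.vstack([imperialists, np.clip(colonies, 0, 1)])
    cost = np.array([fitness(c) for c in countries])
    return countries[np.argmin(cost)]
```

In the feature-selection stage the candidate vector would be thresholded into a feature mask; in the rule-optimization stage it would encode rule lengths and memberships.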


2018 ◽  
Vol 70 ◽  
pp. 279-287 ◽  
Author(s):  
Lijun Yang ◽  
Qingsheng Zhu ◽  
Jinlong Huang ◽  
Dongdong Cheng ◽  
Quanwang Wu ◽  
...  

2020 ◽  
pp. 1458-1479
Author(s):  
Nabil M. Hewahi ◽  
Enas Abu Hamra

The Artificial Neural Network (ANN) has played a significant role in many areas because of its ability to solve complex problems that mathematical methods have failed to solve. However, it has shortcomings that can cause it to stop improving in some cases or decrease the accuracy of its results. In this research the authors propose a new approach that combines the particle swarm optimization algorithm (PSO) and the genetic algorithm (GA) to increase the classification accuracy of an ANN. The proposed approach utilizes the advantages of both PSO and GA to overcome the local minima problem of the ANN, which prevents the ANN from improving its classification accuracy. The algorithm starts with the backpropagation algorithm and then repeatedly applies GA followed by PSO until the optimum classification is reached. The proposed approach is domain independent and has been evaluated on nine datasets with various domains and characteristics. A comparative study between the proposed approach and previous approaches shows the superiority of the proposed approach.
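A schematic sketch of such a hybrid loop is given below: after an initial backpropagation phase, GA and PSO passes alternate over the flattened network weights. Here `loss` stands for any function returning the network's training error for a flat weight vector, and the operator settings, population size, and stopping rule are illustrative assumptions rather than the authors' configuration.

```python
import numpy as np

def ga_step(pop, fitness, rng, mutation=0.05):
    """One GA generation: keep the better half, recombine, and mutate."""
    order = np.argsort([fitness(p) for p in pop])
    parents = pop[order[: len(pop) // 2]]
    mates = parents[rng.permutation(len(parents))]
    children = (parents + mates) / 2.0 + mutation * rng.standard_normal(parents.shape)
    return np.vstack([parents, children])

def pso_step(pop, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One PSO update of positions and velocities."""
    r1, r2 = rng.random(pop.shape), rng.random(pop.shape)
    vel = w * vel + c1 * r1 * (pbest - pop) + c2 * r2 * (gbest - pop)
    return pop + vel, vel

def hybrid_optimize(loss, dim, pop_size=30, rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.standard_normal((pop_size, dim))   # in practice, seeded from backprop weights
    vel = np.zeros_like(pop)
    pbest = pop.copy()
    for _ in range(rounds):
        pop = ga_step(pop, loss, rng)                        # GA pass ...
        gbest = pop[np.argmin([loss(p) for p in pop])]
        pop, vel = pso_step(pop, vel, pbest, gbest, rng)     # ... followed by PSO pass
        better = np.array([loss(a) < loss(b) for a, b in zip(pop, pbest)])
        pbest[better] = pop[better]
    return pbest[np.argmin([loss(p) for p in pbest])]
```

Alternating the two operators is what lets the search jump out of the local minima where backpropagation alone would stall.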

