Natural Neighbor Reduction Algorithm for Instance-based Learning

Author(s):  
Lijun Yang ◽  
Qingsheng Zhu ◽  
Jinlong Huang ◽  
Dongdong Cheng ◽  
Cheng Zhang

Instance reduction aims to reduce the prohibitive computational cost and storage requirements of instance-based learning. The most frequently used methods are the condensation and edition approaches: condensation removes patterns that lie far from the decision boundary and do not contribute to classification accuracy, while edition removes noisy patterns to improve classification accuracy. In this paper, a new hybrid algorithm, an instance reduction algorithm based on natural neighbors and nearest enemies, is presented. First, an edition algorithm is proposed that filters noisy patterns and smooths the class boundaries using natural neighbors; its main advantage is that it requires no user-defined parameters. Then, a new condensation method based on the nearest enemy removes instances far from the decision boundary, discarding interior instances. Experiments show that the hybrid approach effectively reduces the number of instances while achieving higher classification accuracy compared with competing algorithms.
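The parameter-free edition step rests on the natural neighbor idea: two points are natural neighbors if each appears in the other's k-nearest neighborhood, with k grown until the number of isolated points stabilizes. Below is a minimal sketch of that search, assuming a standard Euclidean kNN query; the stopping rule and `max_r` cap are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbors(X, max_r=30):
    """Return, for each point, its set of mutual (natural) neighbors."""
    n = len(X)
    nn = NearestNeighbors(n_neighbors=min(max_r + 1, n)).fit(X)
    _, idx = nn.kneighbors(X)              # idx[:, 0] is the point itself
    neighbors = [set() for _ in range(n)]
    prev_isolated = n
    for r in range(1, idx.shape[1]):       # grow the neighborhood radius r
        for i in range(n):
            j = idx[i, r]
            if i in idx[j, 1:r + 1]:       # mutual neighborhood => natural neighbors
                neighbors[i].add(int(j))
                neighbors[j].add(i)
        isolated = sum(1 for s in neighbors if not s)
        if isolated == 0 or isolated == prev_isolated:
            break                          # stop once the isolated-point count stabilizes
        prev_isolated = isolated
    return neighbors
```

Noisy patterns can then be flagged, for example, as points whose natural neighbors mostly carry a different class label.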

Author(s):  
Manmohan Singh ◽  
Rajendra Pamula ◽  
Alok Kumar

Clustering has various applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (often called k-means) suffer from convergence to a local optimum and offer no approximation guarantee. To overcome these shortcomings, this paper presents an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering on data streams: each reduction step determines a coreset of its inputs and introduces some error, and by the nested property of coresets these errors accumulate over the levels of the construction, so even a small reduction in the per-step error makes the final coreset substantially more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, BigCross, and Tower datasets. It outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
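Streaming coreset constructions of this kind typically maintain a merge-and-reduce tree: full buckets of the stream are reduced to small weighted coresets, and coresets at the same level are merged and reduced again. The sketch below illustrates that bookkeeping only; the uniform weighted downsampling in `reduce_to_coreset` is a placeholder assumption, not the paper's reduction rule.

```python
import numpy as np

def reduce_to_coreset(points, weights, m, rng):
    """Downsample a weighted point set to m weighted points (placeholder sampling)."""
    if len(points) <= m:
        return points, weights
    probs = weights / weights.sum()
    idx = rng.choice(len(points), size=m, replace=False, p=probs)
    new_w = weights[idx] * weights.sum() / weights[idx].sum()   # preserve total weight
    return points[idx], new_w

class StreamingCoreset:
    """Merge-and-reduce tree over fixed-size buckets of the stream."""
    def __init__(self, bucket_size=1000, coreset_size=200, seed=0):
        self.bucket_size, self.m = bucket_size, coreset_size
        self.rng = np.random.default_rng(seed)
        self.buffer, self.levels = [], {}     # levels[i] holds one coreset per tree level

    def insert(self, x):
        self.buffer.append(x)
        if len(self.buffer) == self.bucket_size:
            pts = np.asarray(self.buffer); self.buffer = []
            self._merge(0, (pts, np.ones(len(pts))))

    def _merge(self, level, coreset):
        if level in self.levels:              # merge two coresets, reduce, push up a level
            other = self.levels.pop(level)
            pts = np.vstack([coreset[0], other[0]])
            w = np.concatenate([coreset[1], other[1]])
            self._merge(level + 1, reduce_to_coreset(pts, w, self.m, self.rng))
        else:
            self.levels[level] = coreset
```

Because every merge introduces error and the tree has logarithmically many levels, a more accurate per-step reduction directly tightens the error of the final coreset.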


Author(s):  
Divya Jain ◽  
Vijendra Singh

A two-phase diagnostic framework based on hybrid classification is proposed for the diagnosis of chronic disease. In the first phase, feature selection via the ReliefF method and feature extraction via PCA are incorporated. In the second phase, the SVM parameters are efficiently optimized via grid search. The proposed hybrid classification approach is then tested on seven popular chronic disease datasets using cross-validation. Experiments are conducted to evaluate the presented classification method against four other existing classifiers applied to the same chronic disease datasets. Results show that the presented approach removes approximately 40% of the extraneous and redundant features, with a substantial reduction in execution time for mining all datasets, and achieves a highest classification accuracy of 98.5%. It is concluded that the presented approach achieves excellent classification accuracy for each chronic disease dataset while eliminating irrelevant and redundant features, thereby substantially reducing the diagnostic complexity and the resulting computational time.
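A compact sketch of such a two-phase pipeline is shown below: ReliefF feature selection, PCA feature extraction, then an SVM tuned by grid search. The `skrebate` ReliefF implementation, the number of selected features, the retained variance, and the parameter grid are all illustrative assumptions rather than the paper's settings.

```python
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from skrebate import ReliefF   # a commonly used ReliefF filter implementation

pipe = Pipeline([
    ("relieff", ReliefF(n_features_to_select=10, n_neighbors=10)),  # phase 1: selection
    ("pca", PCA(n_components=0.95)),                                # phase 1: extraction
    ("svm", SVC(kernel="rbf")),                                     # phase 2: classifier
])

param_grid = {                       # phase 2: grid-searched SVM parameters
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": [1e-3, 1e-2, 1e-1, 1],
}
search = GridSearchCV(pipe, param_grid, cv=5)
# search.fit(X, y); search.best_score_ then gives the cross-validated accuracy
```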


2014 ◽  
Vol 556-562 ◽  
pp. 4820-4824
Author(s):  
Ying Xia ◽  
Le Mi ◽  
Hae Young Bae

In the study of image affective semantic classification, one problem is low classification accuracy caused by redundant low-level features. To eliminate this redundancy, a novel image affective classification method based on attribute reduction is proposed. In this method, a decision table is first built from the extracted image features. Valid low-level features are then determined through a feature selection process using a rough set attribute reduction algorithm. Finally, semantic recognition is performed with an SVM. Experimental results show that the proposed method significantly improves the accuracy of image affective semantic classification.
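One common way to realize rough set attribute reduction is a greedy search that adds the attribute raising the dependency degree (the size of the positive region) the most, until it matches that of the full attribute set. The sketch below illustrates that idea over a discretized decision table; it is a simplification for illustration, not the specific reduction algorithm used in the paper.

```python
import numpy as np

def dependency(table, decision, attrs):
    """Fraction of objects whose equivalence class (w.r.t. attrs) is pure in the decision."""
    if not attrs:
        return 0.0
    classes = {}
    for i, row in enumerate(table[:, attrs]):
        classes.setdefault(tuple(row), []).append(i)
    consistent = sum(len(ix) for ix in classes.values()
                     if len(set(decision[i] for i in ix)) == 1)
    return consistent / len(table)

def greedy_reduct(table, decision):
    """Add the attribute that raises dependency most until it matches the full set."""
    all_attrs = list(range(table.shape[1]))
    target = dependency(table, decision, all_attrs)
    reduct = []
    while dependency(table, decision, reduct) < target:
        best = max((a for a in all_attrs if a not in reduct),
                   key=lambda a: dependency(table, decision, reduct + [a]))
        reduct.append(best)
    return reduct
```

The columns kept in the reduct are the "valid" low-level features handed to the SVM.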


2011 ◽  
Vol 21 (04) ◽  
pp. 297-309 ◽  
Author(s):  
WEI-WEN WU

Numerous studies have contributed to efforts to boost the accuracy of credit scoring models. Especially interesting are recent studies that have successfully developed hybrid approaches, which advance classification accuracy by combining different machine learning techniques. However, achieving better credit decisions requires more than merely increasing the accuracy of the credit scoring model: it is necessary to conduct meaningful supplementary analyses in order to obtain knowledge of causal relations, particularly significant conceptual patterns or structures involving the attributes used in the credit scoring model. This paper proposes integrating data preprocessing strategies with a Bayesian network classifier built by the tree-augmented Naïve Bayes (TAN) search algorithm, in order to improve classification accuracy and obtain improved knowledge of causal patterns, thus enhancing the validity of credit decisions.
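The TAN search step can be summarized as follows: weight each attribute pair by its conditional mutual information given the class and keep a maximum spanning tree over the attributes, so each attribute gets at most one extra parent besides the class. The sketch below illustrates that step under the assumption of discretized inputs; it is a didactic simplification, not the paper's implementation.

```python
import numpy as np
from itertools import combinations

def cond_mutual_info(x, y, c):
    """I(X;Y|C) estimated from empirical frequencies of discrete variables."""
    mi = 0.0
    for cv in np.unique(c):
        m = c == cv
        pc = m.mean()
        xs, ys = x[m], y[m]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))
                px, py = np.mean(xs == xv), np.mean(ys == yv)
                if pxy > 0:
                    mi += pc * pxy * np.log(pxy / (px * py))
    return mi

def tan_tree(X, y):
    """Edges of the attribute tree used by a TAN classifier (maximum spanning tree)."""
    d = X.shape[1]
    w = np.zeros((d, d))
    for i, j in combinations(range(d), 2):
        w[i, j] = w[j, i] = cond_mutual_info(X[:, i], X[:, j], y)
    in_tree, edges = {0}, []
    while len(in_tree) < d:                    # Prim's algorithm on the CMI weights
        i, j = max(((a, b) for a in in_tree for b in range(d) if b not in in_tree),
                   key=lambda e: w[e[0], e[1]])
        edges.append((i, j))                   # attribute i becomes the extra parent of j
        in_tree.add(j)
    return edges
```

The resulting edges, read alongside the learned conditional probability tables, are what provide the interpretable causal-pattern view the paper argues credit decisions need.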


Author(s):  
Nadjla Elong ◽  
Sidi Ahmed Rahal

For deeper and richer analytic processing of medical datasets, feature selection aims to eliminate redundant and irrelevant features from the data. While the filter method has been touted as one of the simplest approaches to feature selection, its applications have generally failed to identify and deal with embedded similarities among features. In this research, a hybrid feature selection approach that combines the filter method with hierarchical agglomerative clustering is proposed to eliminate irrelevant and redundant features in four medical datasets. A formal evaluation of the proposed approach reveals major improvements in classification accuracy when the results are compared with those obtained using only filter methods and/or more classical feature selection approaches.
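A minimal sketch of this hybrid idea: cluster features by their mutual similarity with agglomerative clustering, then keep the single most relevant feature (by a filter score) from each cluster, so redundant features collapse into one representative. The correlation-based distance, mutual-information score, and cluster count below are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.feature_selection import mutual_info_classif

def hybrid_select(X, y, n_clusters=10):
    corr = np.corrcoef(X, rowvar=False)              # feature-feature correlation
    dist = 1.0 - np.abs(corr)                        # similar features -> small distance
    Z = linkage(dist[np.triu_indices_from(dist, k=1)], method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    scores = mutual_info_classif(X, y)               # filter relevance score per feature
    keep = [np.arange(X.shape[1])[labels == c][np.argmax(scores[labels == c])]
            for c in np.unique(labels)]              # best feature of each cluster
    return sorted(keep)
```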


1997 ◽  
Vol 6 ◽  
pp. 1-34 ◽  
Author(s):  
D. R. Wilson ◽  
T. R. Martinez

Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.
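As a concrete illustration, HVDM combines a normalized absolute difference for continuous attributes with a value difference metric for nominal ones, as summarized above. The sketch below follows that definition; the four-standard-deviation normalization is the usual HVDM convention, and the class-conditional tables are estimated directly from the training data.

```python
import numpy as np

class HVDM:
    def __init__(self, X, y, nominal):
        """X: 2-D training array, y: class labels, nominal: boolean mask over columns."""
        self.nominal = np.asarray(nominal)
        self.classes = np.unique(y)
        self.sigma = X.std(axis=0) + 1e-12            # for continuous normalization
        self.cond = {}                                # P(class | attribute a has value v)
        for a in np.where(self.nominal)[0]:
            for v in np.unique(X[:, a]):
                mask = X[:, a] == v
                self.cond[(a, v)] = np.array([np.mean(y[mask] == c) for c in self.classes])

    def attr_dist(self, a, xv, yv):
        if self.nominal[a]:
            px = self.cond.get((a, xv), np.zeros(len(self.classes)))
            py = self.cond.get((a, yv), np.zeros(len(self.classes)))
            return np.sqrt(np.sum((px - py) ** 2))    # value difference for nominal values
        return abs(xv - yv) / (4.0 * self.sigma[a])   # normalized diff for continuous values

    def distance(self, x, z):
        return np.sqrt(sum(self.attr_dist(a, x[a], z[a]) ** 2 for a in range(len(x))))
```

IVDM and WVDM differ in how they handle continuous attributes, interpolating or windowing the value-difference probabilities instead of discretizing them.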


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
D. Jalal Nouri ◽  
M. Saniee Abadeh ◽  
F. Ghareh Mohammadi

In recent years, the imperialist competitive algorithm (ICA), the genetic algorithm (GA), and hybrid fuzzy classification systems have been successfully and effectively employed for data mining classification tasks. To overcome the ineffectiveness of current algorithms in analysing high-dimensional independent datasets, a new hybrid approach named HYEI is presented in this paper for discovering generic rule-based systems. The proposed approach consists of three stages and combines an evolutionary fuzzy system with two ICA procedures to generate high-quality fuzzy classification rules. Initially, the best feature subset is selected using an embedded ICA feature selection stage, and these features are then used to generate basic fuzzy classification rules. Finally, all rules are optimized by an ICA to reduce their length or eliminate some of them. The performance of HYEI has been evaluated on several benchmark datasets from the UCI machine learning repository. The proposed algorithm attains the highest classification accuracy on 6 of the 7 dataset problems and is comparable to the best previously published results on the 5 other test problems.
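For orientation, the ICA procedures in the first and third stages iterate the same basic loop: rank candidate solutions, treat the strongest as imperialists, and move the remaining colonies toward them. The bare-bones sketch below shows only that loop over real-valued candidates; the fitness function, assimilation rate, and the omission of empire competition and revolution steps are simplifying assumptions.

```python
import numpy as np

def ica(fitness, dim, n_countries=50, n_imperialists=5, iters=100, beta=0.5, seed=0):
    """Minimize `fitness` over [0, 1]^dim with a simplified imperialist competitive loop."""
    rng = np.random.default_rng(seed)
    countries = rng.random((n_countries, dim))          # candidate solutions
    for _ in range(iters):
        cost = np.array([fitness(c) for c in countries])
        order = np.argsort(cost)                        # lower cost = stronger country
        imperialists = countries[order[:n_imperialists]]
        colonies = countries[order[n_imperialists:]]
        owners = rng.integers(0, n_imperialists, size=len(colonies))
        # assimilation: each colony moves toward its assigned imperialist
        colonies += beta * rng.random(colonies.shape) * (imperialists[owners] - colonies)
        countries = np.vstack([imperialists, np.clip(colonies, 0, 1)])
    cost = np.array([fitness(c) for c in countries])
    return countries[np.argmin(cost)]
```

In the feature-selection stage the candidate vector would be thresholded into a feature mask; in the rule-optimization stage it would encode rule lengths and memberships.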


2018 ◽  
Vol 70 ◽  
pp. 279-287 ◽  
Author(s):  
Lijun Yang ◽  
Qingsheng Zhu ◽  
Jinlong Huang ◽  
Dongdong Cheng ◽  
Quanwang Wu ◽  
...  

2020 ◽  
pp. 1458-1479
Author(s):  
Nabil M. Hewahi ◽  
Enas Abu Hamra

The Artificial Neural Network (ANN) has played a significant role in many areas because of its ability to solve complex problems that mathematical methods have failed to solve. However, it has shortcomings that can cause it to stop improving in some cases or decrease the accuracy of its results. In this research the authors propose a new approach that combines the particle swarm optimization algorithm (PSO) and the genetic algorithm (GA) to increase the classification accuracy of an ANN. The proposed approach utilizes the advantages of both PSO and GA to overcome the local minima problem of the ANN, which prevents the ANN from improving its classification accuracy. The algorithm starts with the backpropagation algorithm and then repeatedly applies GA followed by PSO until the optimum classification is reached. The proposed approach is domain independent and has been evaluated on nine datasets with various domains and characteristics. A comparative study between the proposed approach and previous approaches shows the superiority of the proposed approach.
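A schematic sketch of such a hybrid loop is given below: after an initial backpropagation phase, GA and PSO passes alternate over the flattened network weights. Here `loss` stands for any function returning the network's training error for a flat weight vector, and the operator settings, population size, and stopping rule are illustrative assumptions rather than the authors' configuration.

```python
import numpy as np

def ga_step(pop, fitness, rng, mutation=0.05):
    """One GA generation: keep the better half, recombine, and mutate."""
    order = np.argsort([fitness(p) for p in pop])
    parents = pop[order[: len(pop) // 2]]
    mates = parents[rng.permutation(len(parents))]
    children = (parents + mates) / 2.0 + mutation * rng.standard_normal(parents.shape)
    return np.vstack([parents, children])

def pso_step(pop, vel, pbest, gbest, rng, w=0.7, c1=1.5, c2=1.5):
    """One PSO update of positions and velocities."""
    r1, r2 = rng.random(pop.shape), rng.random(pop.shape)
    vel = w * vel + c1 * r1 * (pbest - pop) + c2 * r2 * (gbest - pop)
    return pop + vel, vel

def hybrid_optimize(loss, dim, pop_size=30, rounds=20, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.standard_normal((pop_size, dim))   # in practice, seeded from backprop weights
    vel = np.zeros_like(pop)
    pbest = pop.copy()
    for _ in range(rounds):
        pop = ga_step(pop, loss, rng)                        # GA pass ...
        gbest = pop[np.argmin([loss(p) for p in pop])]
        pop, vel = pso_step(pop, vel, pbest, gbest, rng)     # ... followed by PSO pass
        better = np.array([loss(a) < loss(b) for a, b in zip(pop, pbest)])
        pbest[better] = pop[better]
    return pbest[np.argmin([loss(p) for p in pbest])]
```

Alternating the two operators is what lets the search jump out of the local minima where backpropagation alone would stall.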

