An Intelligent Fusion Algorithm and Its Application Based on Subgroup Migration and Adaptive Boosting

Imbalanced data and feature redundancy are common problems in many fields, especially in software defect prediction, data mining, machine learning, and industrial big data applications. To resolve these problems, we propose an intelligent fusion algorithm, SMPSO-HS-AdaBoost, which combines particle swarm optimization based on subgroup migration with adaptive boosting based on hybrid sampling. In this paper, we apply the proposed algorithm to software defect prediction to improve prediction efficiency and accuracy by addressing the issues caused by imbalanced data and feature redundancy. The results show that the proposed algorithm resolves the coexisting problems of imbalanced data and feature redundancy, and ensures the efficiency and accuracy of software defect prediction.
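
The abstract does not spell out the hybrid-sampling step, so the following is only a minimal sketch of the general idea: undersample the majority class and oversample the minority class until both reach a common size. The function name `hybrid_sample`, the `target_per_class` parameter, and the toy data are assumptions made for illustration. This is not the authors' SMPSO-HS-AdaBoost procedure, and it assumes the target lies between the minority and majority class sizes.

```python
import random


def hybrid_sample(majority, minority, target_per_class, seed=0):
    """Return (majority_subset, minority_superset), each of size target_per_class.

    Assumes len(minority) <= target_per_class <= len(majority).
    """
    rng = random.Random(seed)
    # Undersampling: draw rows from the majority class without replacement.
    kept_majority = rng.sample(majority, target_per_class)
    # Oversampling: keep every minority row, then add random duplicates.
    extra = target_per_class - len(minority)
    grown_minority = list(minority) + [rng.choice(minority) for _ in range(extra)]
    return kept_majority, grown_minority


# Toy data set: 90 clean modules and 10 defective ones (hypothetical).
clean = [("clean", i) for i in range(90)]
defective = [("defective", i) for i in range(10)]

maj, mino = hybrid_sample(clean, defective, target_per_class=50)
print(len(maj), len(mino))  # 50 50
```

A boosting learner would then be trained on the rebalanced rows. Duplicating minority rows adds no new information, which is one reason the literature listed here also studies synthetic oversampling.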

Impact of imbalanced data on the performance of software defect prediction classifiers

Journal of Physics Conference Series ◽

10.1088/1742-6596/1345/2/022026 ◽

2019 ◽

Vol 1345 ◽

pp. 022026

Author(s):

Lichao Wang ◽

Wei Wang ◽

Bingyou Liu ◽

Shuqiao Geng

Keyword(s):

Imbalanced Data ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect

Handling Imbalanced Data using Ensemble Learning in Software Defect Prediction

2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence) ◽

10.1109/confluence47617.2020.9058124 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ruchika Malhotra ◽

Juhi Jain

Keyword(s):

Ensemble Learning ◽

Imbalanced Data ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect

Applying Weighted Particle Swarm Optimization to Imbalanced Data in Software Defect Prediction

Lecture Notes in Networks and Systems - New Technologies, Development and Application ◽

10.1007/978-3-319-90893-9_35 ◽

2018 ◽

pp. 289-296 ◽

Cited By ~ 2

Author(s):

Lucija Brezočnik ◽

Vili Podgorelec

Keyword(s):

Particle Swarm Optimization ◽

Particle Swarm ◽

Imbalanced Data ◽

Defect Prediction ◽

Software Defect Prediction ◽

Swarm Optimization ◽

Software Defect

An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data

Neurocomputing ◽

10.1016/j.neucom.2018.04.090 ◽

2019 ◽

Vol 343 ◽

pp. 120-140 ◽

Cited By ~ 11

Author(s):

Ruchika Malhotra ◽

Shine Kamal

Keyword(s):

Empirical Study ◽

Imbalanced Data ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect

Imbalanced Data Processing Model for Software Defect Prediction

Wireless Personal Communications ◽

10.1007/s11277-017-5117-z ◽

2017 ◽

Vol 102 (2) ◽

pp. 937-950 ◽

Cited By ~ 2

Author(s):

Lijuan Zhou ◽

Ran Li ◽

Shudong Zhang ◽

Hua Wang

Keyword(s):

Data Processing ◽

Imbalanced Data ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect

Attribute Selection and Imbalanced Data: Problems in Software Defect Prediction

2010 22nd IEEE International Conference on Tools with Artificial Intelligence ◽

10.1109/ictai.2010.27 ◽

2010 ◽

Cited By ~ 53

Author(s):

Taghi M. Khoshgoftaar ◽

Kehan Gao ◽

Naeem Seliya

Keyword(s):

Imbalanced Data ◽

Attribute Selection ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect

Software Defect Prediction Using Ensemble Learning: An ANP Based Evaluation Method

FUOYE Journal of Engineering and Technology ◽

10.46792/fuoyejet.v3i2.200 ◽

2018 ◽

Vol 3 (2) ◽

Cited By ~ 7

Author(s):

Abdullateef O Balogun ◽

Amos O Bajeh ◽

Victor A Orie ◽

Ayisat W Yusuf-Asaju

Keyword(s):

Decision Tree ◽

Evaluation Method ◽

Performance Metrics ◽

Ensemble Methods ◽

Analytic Network Process ◽

Defect Prediction ◽

Mining Machine ◽

Software Defect Prediction ◽

Software Defect ◽

Boosting Method

Software defect prediction (SDP) is the process of predicting defects in software modules; it identifies the modules that are defective and require extensive testing. Classification algorithms that help to predict software defects play a major role in the software engineering process. Some studies have shown that ensembles are often more accurate than single classifiers. However, findings vary across studies, some of which posit that the efficiency of learning algorithms may vary depending on the performance measure used. This is because most studies on SDP consider the accuracy of the model or classifier above other performance metrics. This paper evaluated the performance of single classifiers (SMO, MLP, kNN and Decision Tree) and ensembles (Bagging, Boosting, Stacking and Voting) in SDP, considering major performance metrics using the Analytic Network Process (ANP) multi-criteria decision method. The experiment was based on 11 performance metrics over 11 software defect datasets. Boosted SMO, Voting and Stacking ensemble methods ranked highest, with priority levels of 0.0493, 0.0493 and 0.0445 respectively. Decision Tree ranked highest among single classifiers with 0.0410. These results show that ensemble methods can give better classification results in SDP, and that the Boosting method gave the best result. In essence, all performance metrics should be considered before deciding which model or classifier is better for software defect prediction.

Keywords: Data mining, Machine Learning, Multi Criteria Decision Making, Software Defect Prediction
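
The point about not ranking classifiers by accuracy alone can be illustrated with a small worked example. The sketch below is not the ANP procedure from the paper. It computes four standard metrics from a confusion matrix for two hypothetical classifiers on a made-up data set of 100 modules, 10 of them defective. The function name `metrics` and all the data are assumptions.

```python
def metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall and F1 for a binary classification."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a classifier never predicts positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 10 defective modules (label 1) followed by 90 clean modules (label 0).
actual = [1] * 10 + [0] * 90

# Classifier A labels every module as clean.
always_clean = [0] * 100
# Classifier B finds 7 of the 10 defects but raises 10 false alarms.
finds_defects = [1] * 7 + [0] * 3 + [1] * 10 + [0] * 80

result_a = metrics(actual, always_clean)
result_b = metrics(actual, finds_defects)
print(result_a)
print(result_b)
```

Classifier A has the higher accuracy (0.90 against 0.87) although it never detects a defect, so its recall is 0. Classifier B recovers 70% of the defective modules. A ranking based on accuracy alone would prefer the less useful model.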

Feature Selection with Imbalanced Data for Software Defect Prediction

2009 International Conference on Machine Learning and Applications ◽

10.1109/icmla.2009.18 ◽

2009 ◽

Cited By ~ 21

Author(s):

Taghi M. Khoshgoftaar ◽

Kehan Gao

Keyword(s):

Feature Selection ◽

Imbalanced Data ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect

Software Defect Prediction Based on Cost-Sensitive Dictionary Learning

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194019500384 ◽

2019 ◽

Vol 29 (09) ◽

pp. 1219-1243 ◽

Cited By ~ 1

Author(s):

Hongyan Wan ◽

Guoqing Wu ◽

Mali Yu ◽

Mengting Yuan

Keyword(s):

Sparse Representation ◽

Dictionary Learning ◽

Class Imbalance ◽

Imbalanced Data ◽

Prediction Method ◽

Elastic Net ◽

Defect Prediction ◽

Software Defect Prediction ◽

Software Defect ◽

Software Modules

Software defect prediction technology has been widely used to improve the quality of software systems. Most real software defect datasets have fewer defective modules than defect-free modules. Highly class-imbalanced data typically make accurate predictions difficult: the imbalance makes a prediction model prone to classifying a defective module as a defect-free one. Because different software modules are similar to one another, a module can be represented by sparse representation coefficients over a pre-defined dictionary that consists of historical software defect datasets. In this study, we use a dictionary learning method to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively, to ensure that the extracted features (sparse representation) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, and the regularity of the elastic net solution. Because misclassifying a defective module generally incurs a much higher cost than misclassifying a defect-free one, we take the different misclassification costs into account, increasing the penalty for misclassifying defective modules during dictionary learning, so that the classifier is inclined to label a module as defective. Thus, we propose a cost-sensitive software defect prediction method using dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
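
The cost-sensitive idea can be sketched independently of dictionary learning. Assuming some classifier already outputs a probability that a module is defective, the label with the lower expected misclassification cost is chosen. The function name `cost_sensitive_label` and the cost values are hypothetical. This is the generic expected-cost decision rule, not the CSDL method described in the abstract.

```python
def cost_sensitive_label(p_defective, cost_fn, cost_fp):
    """Pick the label with the lower expected misclassification cost.

    cost_fn: cost of calling a defective module clean (false negative).
    cost_fp: cost of calling a clean module defective (false positive).
    """
    # Predicting "clean" is wrong with probability p_defective.
    expected_cost_if_clean = p_defective * cost_fn
    # Predicting "defective" is wrong with probability 1 - p_defective.
    expected_cost_if_defective = (1 - p_defective) * cost_fp
    if expected_cost_if_defective < expected_cost_if_clean:
        return "defective"
    return "clean"


# With equal costs, a 30% defect probability is labeled clean.
print(cost_sensitive_label(0.3, cost_fn=1, cost_fp=1))  # clean
# If a missed defect costs five times a false alarm, the label flips.
print(cost_sensitive_label(0.3, cost_fn=5, cost_fp=1))  # defective
```

The rule is equivalent to lowering the decision threshold from 0.5 to cost_fp / (cost_fp + cost_fn), which is 1/6 in the second call. Raising the cost of missed defects therefore makes the classifier more inclined to flag a module.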

Solving the Imbalanced Class Problem in Software Defect Prediction Using GANS

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a2165.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 8683-8687

Keyword(s):

Data Distribution ◽

Imbalanced Data ◽

Local Information ◽

Machine Learning Techniques ◽

Defect Prediction ◽

Software Defect Prediction ◽

Minority Class ◽

Software Defect ◽

Learning Techniques ◽

Conventional Machine

Prediction of software defects is a highly researched and important domain because of its cost-saving advantage in software development. Different classification methods using static code attributes have been used to predict defects in software. However, the number of defective instances is very small compared to the number of non-defective instances, which leads to imbalanced data, where the class ratio is not equal. For such data, conventional machine learning techniques give poor results. While there are different strategies to address this issue, the usual oversampling methods are variants of the SMOTE algorithm. These approaches are based on local information instead of the complete distribution of the minority class. GANs are used to approximate the true data distribution of the minority class data used for software defect prediction.
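
To make the phrase "based on local information" concrete, the sketch below shows the interpolation step at the core of SMOTE-style oversampling: each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbors. It is a simplified illustration, not a full SMOTE implementation and not the GAN approach of the paper. The function names, the two-feature toy data, and the choice k=2 are assumptions.

```python
import math
import random


def nearest_neighbors(point, others, k):
    """Return the k points in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda other: math.dist(point, other))[:k]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by local interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Hypothetical defective modules described by two static code metrics.
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5), (8.0, 9.0)]
new_points = smote_like(minority, n_new=5)
print(new_points)
```

Every synthetic point depends only on one sample and its immediate neighbors, so the generator never looks at the overall shape of the minority class. That limitation is what motivates modeling the full minority distribution instead.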
