An Intelligent Fusion Algorithm and Its Application Based on Subgroup Migration and Adaptive Boosting

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 569
Author(s):  
Timing Li ◽  
Lei Yang ◽  
Kewen Li ◽  
Jiannan Zhai

Imbalanced data and feature redundancies are common problems in many fields, especially in software defect prediction, data mining, machine learning, and industrial big data applications. To resolve these problems, we propose an intelligent fusion algorithm, SMPSO-HS-AdaBoost, which combines particle swarm optimization based on subgroup migration with adaptive boosting based on hybrid sampling. In this paper, we apply the proposed algorithm to software defect prediction to improve prediction efficiency and accuracy by addressing the issues caused by imbalanced data and feature redundancies. The results show that the proposed algorithm resolves the coexisting problems of imbalanced data and feature redundancies, and ensures the efficiency and accuracy of software defect prediction.
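The abstract does not describe how the hybrid-sampling step works, so the following is only a minimal sketch of one common reading of "hybrid sampling": oversample the minority class and undersample the majority class until both reach the same size. The function name `hybrid_resample`, the midpoint target size, and the toy data are assumptions for illustration, not the authors' SMPSO-HS-AdaBoost procedure.

```python
import random


def hybrid_resample(X, y, minority_label=1, seed=0):
    """Balance a binary dataset by moving both classes to the midpoint size.

    Hypothetical helper, not the paper's procedure: the minority class is
    randomly oversampled (with replacement) and the majority class is
    randomly undersampled (without replacement).
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    if not minority or len(minority) > len(majority):
        raise ValueError("minority_label must mark the smaller, non-empty class")

    # Both classes end up with `target` rows.
    target = (len(minority) + len(majority)) // 2
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    kept = rng.sample(majority, target)

    idx = minority + extra + kept
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy data (assumed): 2 defective modules (label 1) out of 10.
X = [[float(i)] for i in range(10)]
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = hybrid_resample(X, y)
```

A balanced set like `X_bal, y_bal` would then be passed to a boosting learner; that step is left out here because the abstract gives no details of it.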

2017 ◽  
Vol 102 (2) ◽  
pp. 937-950 ◽  
Author(s):  
Lijuan Zhou ◽  
Ran Li ◽  
Shudong Zhang ◽  
Hua Wang

Author(s):  
Abdullateef O Balogun ◽  
Amos O Bajeh ◽  
Victor A Orie ◽  
Ayisat W Yusuf-Asaju

Software defect prediction (SDP) is the process of predicting defects in software modules; it identifies the modules that are defective and require extensive testing. Classification algorithms that help to predict software defects play a major role in the software engineering process. Some studies have shown that ensembles are often more accurate than single classifiers. However, other studies have posited that the efficiency of learning algorithms may vary across performance measures. This is because most studies on SDP consider the accuracy of the model or classifier above other performance metrics. This paper evaluated the performance of single classifiers (SMO, MLP, kNN and Decision Tree) and ensembles (Bagging, Boosting, Stacking and Voting) in SDP, considering major performance metrics and using the Analytic Network Process (ANP) multi-criteria decision method. The experiment was based on 11 performance metrics over 11 software defect datasets. The Boosted SMO, Voting and Stacking ensemble methods ranked highest, with priority levels of 0.0493, 0.0493 and 0.0445 respectively. Decision Tree ranked highest among the single classifiers, with 0.0410. These results show that ensemble methods can give better classification results in SDP and that the Boosting method gave the best result. In essence, all performance metrics should be considered before deciding which model or classifier is better for software defect prediction.

Keywords: Data mining, Machine Learning, Multi Criteria Decision Making, Software Defect Prediction
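To illustrate why accuracy alone can mislead on defect data, here is a small sketch that computes several metrics from one set of predictions. It is not the ANP ranking used in the paper; the function name `binary_metrics`, the choice of four metrics, and the toy labels are assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    if not pairs:
        raise ValueError("need at least one prediction")

    # Confusion-matrix counts, treating `positive` as the defective class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Undefined ratios (zero denominators) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy case (assumed): 1 defective module in 10, and a model that
# labels every module as clean.
y_true = [1] + [0] * 9
always_clean = [0] * 10
scores = binary_metrics(y_true, always_clean)
```

On this toy input the accuracy is 0.9 while recall and F1 are both 0.0, which is the kind of disagreement between metrics that motivates a multi-criteria comparison.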


Author(s):  
Hongyan Wan ◽  
Guoqing Wu ◽  
Mali Yu ◽  
Mengting Yuan

Software defect prediction technology has been widely used to improve the quality of software systems. Most real software defect datasets have fewer defective modules than defect-free modules. Highly class-imbalanced data typically make accurate prediction difficult: a prediction model trained on such data easily classifies a defective module as a defect-free one. Because different software modules are similar to one another, a module can be represented by sparse representation coefficients over a pre-defined dictionary built from historical software defect datasets. In this study, we use a dictionary learning method to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively, to ensure that the extracted features (sparse representations) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, and the regularity of the elastic net solution. Because misclassifying a defective module generally incurs a much higher cost than misclassifying a defect-free one, we take the different misclassification costs into account, increasing the penalty for misclassifying defective modules during dictionary learning so that the classifier is inclined to label a module as defective. Thus, we propose a cost-sensitive software defect prediction method using dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
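The cost-sensitive idea can be shown independently of dictionary learning. The sketch below is a generic expected-cost decision rule, not the CSDL method: given a predicted probability that a module is defective, it picks the label with the lower expected misclassification cost. The function name and the cost values (5.0 for a missed defect, 1.0 for a false alarm) are assumptions chosen for illustration.

```python
def cost_sensitive_label(p_defective, cost_fn=5.0, cost_fp=1.0):
    """Return 1 (defective) or 0 (clean), minimizing expected cost.

    cost_fn: assumed cost of labelling a defective module as clean.
    cost_fp: assumed cost of labelling a clean module as defective.
    """
    if not 0.0 <= p_defective <= 1.0:
        raise ValueError("p_defective must be a probability")

    # Predicting "clean" is only wrong when the module is defective.
    cost_if_predict_clean = p_defective * cost_fn
    # Predicting "defective" is only wrong when the module is clean.
    cost_if_predict_defective = (1.0 - p_defective) * cost_fp

    return 1 if cost_if_predict_defective < cost_if_predict_clean else 0


# With a 5:1 cost ratio, a module with a 30% defect probability is
# flagged; with equal costs it is not.
flagged = cost_sensitive_label(0.3)
not_flagged = cost_sensitive_label(0.3, cost_fn=1.0)
```

Under this rule the decision threshold on the probability is cost_fp / (cost_fp + cost_fn), so raising the cost of a missed defect lowers the threshold and biases the classifier toward the defective label, which matches the inclination the abstract describes.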


2019 ◽  
Vol 8 (3) ◽  
pp. 8683-8687

Prediction of software defects is a highly researched and important domain because of its cost-saving advantage in software development. Various classification methods using static code attributes have been used to predict defects in software. However, the number of defective instances is very small compared with the number of non-defective instances, which leads to imbalanced data, where the class ratio is not equal. For such data, conventional machine learning techniques give poor results. While there are different strategies to address this issue, the usual oversampling methods are variants of the SMOTE algorithm. These approaches are based on local information rather than the complete distribution of the minority class. GANs are used to approximate the true data distribution of the minority class data used for software defect prediction.
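The remark that SMOTE-style methods rely on local information can be made concrete with a simplified sketch. Each synthetic point below is an interpolation between one minority sample and one of its nearest minority neighbours, so it never leaves the neighbourhood of existing samples. This is a reduced illustration rather than a faithful SMOTE implementation or the GAN approach from the abstract; `smote_like`, its parameters, and the toy points are assumptions.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by local interpolation.

    Simplified SMOTE-style sketch: pick a minority sample, pick one of
    its k nearest minority neighbours, and take a random point on the
    line segment between them.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest other minority samples (Euclidean).
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy minority class (assumed): four points on the unit square.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(points, n_new=5)
```

Because every synthetic point is a convex combination of two existing samples, the generated data stay inside the region already covered by the minority class, which is the limitation a model of the full minority distribution aims to avoid.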

