The Effect of the Dataset Size on the Accuracy of Software Defect Prediction Models: An Empirical Study
The ongoing development of computer systems requires massive software projects. Running the components of such large projects for testing purposes can be costly; therefore, parameter estimation can be used instead. Software defect prediction models are crucial for software quality assurance. This study investigates the impact of dataset size and feature selection algorithms on software defect prediction models. We used two approaches to build the prediction models: a statistical approach and a machine learning approach based on support vector machines (SVMs). The fault prediction models were built on four datasets of different sizes, and four feature selection algorithms were applied. We found that applying the SVM defect prediction model to datasets with a reduced set of metrics as features may improve the accuracy of the fault prediction model; it also directs the testing effort toward maintaining the most influential set of metrics. We also found that the running time of the SVM fault prediction model does not scale consistently with dataset size, so having fewer metrics does not guarantee a shorter execution time. The experiments show that dataset size has a direct influence on the SVM fault prediction model, although the reduced datasets performed the same as, or slightly worse than, the original datasets.
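The abstract does not specify the tooling used in the study. As a rough illustration of the workflow it describes, the sketch below builds an SVM defect prediction model with one candidate feature selection step using scikit-learn. The file name `defects.csv`, the label column `defective`, and the choice of `SelectKBest` with an ANOVA F-test are assumptions for illustration only, not the study's actual datasets or feature selection algorithms.

```python
# Minimal sketch of an SVM defect prediction pipeline with feature selection.
# Assumptions (not from the paper): scikit-learn is available, and a CSV file
# "defects.csv" holds software metrics as columns plus a binary "defective" label.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Load a defect dataset: rows are software modules, columns are static code
# metrics, and the label marks whether the module contained a defect.
data = pd.read_csv("defects.csv")          # hypothetical file name
X = data.drop(columns=["defective"])       # hypothetical label column name
y = data["defective"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# Pipeline: scale the metrics, keep the k most informative features
# (one of several possible feature selection algorithms), then fit an SVM.
model = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("svm", SVC(kernel="rbf", C=1.0)),
])

model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

Comparing the accuracy and training time of this pipeline with and without the feature selection step, and across datasets of different sizes, mirrors the kind of comparison the study reports.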