An Ensemble Model for Software Defect Prediction
Amad Rizwan Ali
UserMaven INC
amad.ali@usermaven.com
Tahir Muhammad Ali
Department of Computer Science
College of Science and Arts
GULF University for Science and Technology
Kuwait
ali.t@gust.edu.kw
Attique Ur Rehman
Department of Software Engineering
University Of Sialkot
Sialkot, Pakistan
attique.urrehman@uskt.edu.pk
Muhammad Abbas
College of Electrical
and Mechanical Engineering
National University of Science
and Technology
Rawalpindi, Pakistan
Ali Nawaz
College of Electrical
and Mechanical Engineering
National University of Science
and Technology
Rawalpindi, Pakistan
anawazcse19ceme.ce.ceme.edu.pk
Abstract—Software testing is one of the most important ways to
ensure software quality, yet it accounts for more than 50% of
overall project cost. Effective and efficient testing should
consume as few project resources as possible, so it is important
to construct a procedure that performs testing efficiently while
minimizing resource use. The goal of software testing is to find
as many defects as possible in the software system. As the world
moves toward data-driven decision making, this paper applies
machine learning to publicly available datasets with the aim of
maximizing defect prediction accuracy. The main focus is to apply
different machine learning techniques to the datasets and
determine which technique performs best. In particular, we
propose an ensemble learning model and compare it with KNN,
decision tree, SVM and Naïve Bayes on different datasets, showing
that the ensemble method outperforms the other methods in terms
of accuracy, precision, recall and F1-score. The classification
accuracy of the ensemble model is 98.56% on CM1, 98.18% on KC2
and 99.27% on PC1. This indicates that ensemble learning is a
more effective method for defect prediction than the other
techniques.
Index Terms—Software Quality Engineering, Software testing,
Machine learning, Supervised learning, Software Defects
I. INTRODUCTION
The success of any software depends on a sound software
development process, and testing is an important phase of the
software development life cycle. It is an important step for
ensuring software quality, because even a minor defect can
severely affect the later stages of development. Software
testing consumes more than 50% of the overall development cost
[1], and the effort it requires is approximately 40-60% of the
overall development process [2]. Software testing therefore
needs to be managed efficiently so that resources are used
effectively. Testing can be performed either manually or
automatically. Manual testing is time-consuming and error-prone
because it relies on human effort, whereas automated testing is
more accurate and time-efficient.
The goal of software testing is to find the maximum number of
defects in the software. Software defect prediction is the
process of identifying defects in the software. Finding defects
early not only improves software quality but also helps use
resources effectively. Machine learning is a widely used
technique for predicting or finding defects in software [8],
[9]. Over the last few decades, the applicability of machine
learning to real-world problems has grown with the availability
of large amounts of labelled data. Machine learning is the
ability of a computer to learn from data [2]. It is classified
into three categories: 1) supervised learning, 2) unsupervised
learning and 3) semi-supervised learning. In supervised
learning, both features and labels are available; in
unsupervised learning, only features are available; and in
semi-supervised learning, there is a small amount of labelled
data and a large amount of unlabelled data. Supervised learning
is applied more often than unsupervised and semi-supervised
learning, although unsupervised learning is also widely used for
software defect detection [3], [4], [5], [6]. Supervised
learning is further categorized into classification and
regression. In regression, the labels are continuous variables,
while in classification they are discrete. This paper deals with
a classification task, as the datasets are labelled with
discrete variables. We propose an ensemble learning technique
for increasing the accuracy of defect prediction. The datasets
used in the experiments are the open-source CM1, KC2 and PC1
datasets from the PROMISE repository [7]. The results of the
proposed technique are compared with several prominent machine
learning algorithms, such as K-nearest neighbors (KNN), support
vector machine (SVM) and decision tree (DT), and the proposed
ensemble learning method is shown to be effective and sound.
The evaluation metrics used for comparison are precision,
recall, accuracy, F1-score and the ROC curve.
978-1-6654-9891-7/22/$31.00 ©2022 IEEE
2022 2nd International Conference on Digital Futures and Transformative Technologies (ICoDT2) | 978-1-6654-9819-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/ICoDT255437.2022.9787439
Authorized licensed use limited to: NUST School of Electrical Engineering and Computer Science (SEECS). Downloaded on June 09,2022 at 07:23:26 UTC from IEEE Xplore. Restrictions apply.
Fig. 1: Research Paper Overview
The main contributions of the paper are summarized as follows:
• A proposed ensemble learning model for defect prediction.
• A detailed comparison of the proposed model with prominent
machine learning models.
• A maximum defect prediction accuracy of 99.27%, achieved on PC1.
The rest of the paper is organized as follows. Section II
reviews the related literature. The proposed methodology is
described in Section III and the corresponding experimental
results are presented in Section IV. Section V discusses the
findings and Section VI concludes the paper. An overview of the
paper is shown in Fig. 1.
II. LITERATURE REVIEW
According to [10], software testing is an integral part of the
software development life cycle, and the overall success of
software relies on the testing phase. A large number of machine
learning techniques have been proposed for efficient defect
detection in software. C. Manjula et al. [11] proposed a hybrid
machine learning approach in which a genetic algorithm is used
to improve the fitness function and to optimize the features;
the optimized features are then processed by a decision tree
(DT). The performance was compared with an ID3-based decision
tree, and the hybrid approach achieved better results, which
addresses the performance challenge. I. Laradji et al. [12]
proposed an ensemble learning technique with an emphasis on
feature engineering. The method combines ensemble learning with
efficient feature selection to address the robustness problem of
previous defect prediction techniques. It also presents the
average probability ensemble (APE), an ensemble of seven machine
learning models, and shows that APE performs better than other
machine learning models such as weighted SVM and random forests.
APE combined with greedy forward selection produced better
results on the PC2, PC4 and MC1 datasets. O. Arar et al. [13]
proposed a hybrid technique for defect prediction in which an
artificial neural network (ANN) makes the prediction and the
weights of the ANN are optimized by the Artificial Bee Colony
(ABC) algorithm. The performance of the method was compared with
the Naïve Bayes, Random Forest, C4.5, Immunos and AIRS
algorithms: it equalled random forest on the KC1 dataset and
achieved high accuracy on the KC2 and CM1 datasets, but
performed worse than the other techniques on the PC1 and JM1
datasets. The authors suggest that more focus on feature
engineering may yield better results. M. Siers et al. [14]
proposed an ensemble method for defect classification. The
method is an ensemble of decision trees called CSForest. To
minimize the classification cost, a cost-sensitive voting
technique called CSVoting is proposed. The technique was
evaluated against six prominent classifiers (C4.5, SVM,
SysFor+Voting1, SysFor+Voting2, CSC+C4.5 and CSTree) on six
publicly available datasets. The results show that a lower
prediction cost is achieved by combining CSForest and CSVoting.
P. Singh et al. [15] analyzed prominent machine learning
algorithms, namely ANN, PSO (Particle Swarm Optimization), DT,
NB and LC (linear classifier), and evaluated them on seven
PROMISE [16] datasets. The algorithms were analyzed using the
KEEL tool and validated using k-fold cross-validation. The
analysis shows that LC has the highest defect prediction
accuracy on four of the seven datasets, followed by Naïve Bayes,
decision tree and neural network, which have the second-highest
accuracy on one dataset; LC is therefore dominant over the other
classifiers. Manjula et al. [17] proposed a hybrid machine
learning approach in which a genetic algorithm is used to
improve the fitness function and to optimize the features; the
optimized features are then processed by a deep neural network
(DNN). The performance was compared with Naïve Bayes, SVM,
decision tree, KNN and several other ML algorithms, and the
hybrid approach achieved better results, which addresses the
performance challenge. The classification accuracy of the
approach is 97.82% on KC1, 97.59% on CM1, 97.96% on PC3 and
98.0% on PC4, which is better than previous techniques.
III. METHODOLOGY
In this section, the proposed methodology is described in
detail.
A. Feature Engineering
This section presents a detailed analysis of the datasets and
the basic feature processing applied to them.
1) Feature Description: The datasets used in this paper are
open-source data created and distributed by the NASA Data Metric
Program [16]. The datasets are available in different versions;
our experiments use only CM1, KC2 and PC1. CM1 comprises 498
instances and 21 attributes, KC2 comprises 522 instances and 21
attributes, and PC1 comprises 1109 instances and 21 attributes.
2) Feature Selection: The main step of feature engineering is to
select the features relevant [18] to the problem domain and
ignore the irrelevant ones. For this task, a statistics-based
filter ranking method is used, which applies a statistical
measure to assign a score to each feature and presents the user
with a ranked list of features. Specifically, the chi-squared
statistic is applied to evaluate the importance of each feature
for selection.
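The filter-based ranking described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each metric has already been discretized into bins (the chi-squared test works on counts), and the feature names `loc_bin` and `noise_bin` and the toy data are made up for the example.

```python
from collections import Counter


def chi2_score(feature_values, labels):
    """Chi-squared statistic of the contingency table between one
    categorical (pre-binned) feature and the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    label_totals = Counter(labels)
    score = 0.0
    for f, f_count in feature_totals.items():
        for c, c_count in label_totals.items():
            # Count expected if the feature and the label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


def rank_features(columns, labels):
    """Return (name, score) pairs sorted from most to least relevant."""
    scores = {name: chi2_score(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: "loc_bin" follows the label exactly, "noise_bin" does not.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "loc_bin": ["high", "high", "high", "low", "low", "low", "low", "low"],
    "noise_bin": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A feature that is independent of the label scores close to zero, so keeping only the top-ranked features discards those that carry little information about defectiveness.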
3) Handling Class Imbalance: Analysis of the datasets shows that
the data are not balanced: some classes have many more instances
than others. CM1 has approximately 90% false (non-defect) values
and 10% true (defect) values. Similarly, KC2 has 20% true
(defect) values and 80% false (non-defect) values, and PC1 has
7% true (defect) values and 93% false (non-defect) values. The
class imbalance problem is addressed by bootstrap sampling [19],
with the number of observations used for sampling set to 7.
Bootstrapping, the sampling step that underlies bootstrap
aggregation, is random sampling with replacement used to create
random samples of the data.
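One way sampling with replacement can be used to balance the classes is sketched below. The paper does not spell out how the bootstrap samples are combined, so this is an assumption: each minority class is oversampled until it matches the largest class. The function name `bootstrap_balance` and the toy data are hypothetical, and the sampling parameter of 7 mentioned above is not modeled.

```python
import random
from collections import Counter


def bootstrap_balance(rows, labels, seed=0):
    """Oversample every minority class by drawing rows with replacement
    until each class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # random.choices samples with replacement; k may be 0 for the majority class.
        extra = rng.choices(group, k=target - len(group))
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data with a 90/10 split, similar in spirit to the imbalance reported for CM1.
rows = [[i] for i in range(10)]
labels = [False] * 9 + [True]
balanced_rows, balanced_labels = bootstrap_balance(rows, labels)
counts = Counter(balanced_labels)
print(counts)
```

Resampling should be applied to the training split only; duplicating minority rows before the train/test split would place copies of the same module on both sides and inflate the measured accuracy.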
B. Proposed Learning Model
The proposed ensemble learning model trained on CM1 is shown in
Fig. 2. After basic feature preprocessing, an ensemble is
applied that combines Classification by Regression [20] and a
KNN model [21]. After the ensemble method, KNN is applied again
to achieve better classification accuracy. The Classification by
Regression model has multiple subprocesses whose operators are
trained as regression models; the regression models are then
used to build the classification model. KNN, short for K-nearest
neighbors, is a prominent classification model. It computes the
distance between a test point and every point in the training
data, selects the k nearest training points and assigns the
majority class among them.
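The KNN step described above can be sketched in a few lines. This is an illustrative stand-alone version using Euclidean distance and a simple majority vote; the value k=3, the two-feature points and the class names are assumptions for the example, and it does not reproduce the RapidMiner operators used in the paper.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point, paired with that point's label.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature modules forming two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["clean", "clean", "clean", "defect", "defect", "defect"]

print(knn_predict(train_X, train_y, (5.1, 5.0)))
print(knn_predict(train_X, train_y, (1.0, 0.9)))
```

Because the vote depends on raw distances, features measured on very different scales are usually normalized before KNN is applied.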
IV. RESULTS AND DISCUSSION
In this section, the results of the proposed ensemble learning
model are presented.
Fig. 2: Proposed learning model
A. Evaluation Metrics
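The most important evaluation metrics [22] used in the
experiments are defined in the equations below, where TP, FP, TN
and FN denote the numbers of true positives, false positives,
true negatives and false negatives.
• Precision is the ratio of TP to the sum of TP and FP, as given
in equation (1).
$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
• Recall is the ratio of TP to the sum of TP and FN, as given in
equation (2).
$$\text{Recall} = \frac{TP}{TP + FN} \tag{2}$$
• Accuracy is the ratio of the sum of TP and TN to the total
number of positive (P) and negative (N) instances, as given in
equation (3).
$$\text{Accuracy} = \frac{TP + TN}{P + N} \tag{3}$$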
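As a worked example of equations (1) to (3), the sketch below computes the metrics from confusion-matrix counts. The counts are made up for illustration and are not taken from the paper's experiments. The F1-score, which the paper reports but does not define, is assumed here to be the standard harmonic mean of precision and recall.

```python
def precision(tp, fp):
    """Equation (1): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Equation (2): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(tp, tn, fp, fn):
    """Equation (3): (TP + TN) / (P + N), where P + N is the total number of instances."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall (standard definition, assumed here)."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion-matrix counts for 100 test instances.
tp, fp, tn, fn = 90, 5, 3, 2
print(f"precision = {precision(tp, fp):.4f}")
print(f"recall    = {recall(tp, fn):.4f}")
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.4f}")
print(f"f1        = {f1_score(tp, fp, fn):.4f}")
```

On imbalanced data these values depend strongly on which class is treated as positive, so the positive class should be stated alongside any reported precision, recall or F1-score.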
B. Accuracy on Test Set
The accuracy of the ensemble learning model trained on CM1 is
98.56%; similarly, the accuracy of the model trained on KC2 is
98.17%, and that of the model trained on PC1 is 99.27%. The
highest accuracy previously reported on CM1 was 97.59% [17]. Our
experimental results demonstrate that ensemble learning produces
better results than simple learning models [23].
TABLE I: Comparative analysis of prominent ML models w.r.t. CM1 Dataset
Model | Accuracy | Precision | Recall | F-score
SVM | 89.87% | 89.95% | 99.89% | Unknown
KNN | 94.41% | 99.36% | 99.57% | 95.41%
Decision Tree | 94.26% | 94.31% | 99.68% | 57.75%
Random Forest | 92.54% | 92.34% | 100% | 40.91%
Proposed Ensemble | 98.56% | 98.8% | 99.62% | 99.62%
TABLE II: Comparative analysis of prominent ML models w.r.t. KC2 Dataset
Model | Accuracy | Precision | Recall | F-score
SVM | 83.85% | 84.41% | 97.7% | 43.81%
KNN | 96.41% | 95.51% | 100% | 99.93%
Decision Tree | 88.59% | 92.69% | 93% | 71.91%
Random Forest | 92.43% | 91.26% | 100% | 77.87%
Proposed Ensemble | 98.17% | 97.75% | 100% | 95.22%
TABLE III: Comparative analysis of prominent ML models w.r.t. PC1 Dataset
Model | Accuracy | Precision | Recall | F-score
SVM | 92.66% | 92.66% | 100% | Unknown
KNN | 98.5% | 98.68% | 99.72% | 88.22%
Decision Tree | 94.89% | 94.86% | 99.91% | 45.16%
Random Forest | 95.36% | 95.22% | 100% | 56.10%
Proposed Ensemble | 99.27% | 99.86% | 99.35% | 95.13%
C. Accuracy of Train and Test Datasets
The accuracy of the ensemble learning model trained on CM1 is
98.56%; similarly, the accuracy of the model trained on KC2 is
98.17%, and that of the model trained on PC1 is 99.27%. The
highest accuracy previously reported on CM1 was 97.59% [17]. Our
experimental results demonstrate that ensemble learning produces
better results than simple learning models [24].
D. Comparison of Precision, Recall, F1-Score of Proposed
Learning Model
The formulae for precision, recall and accuracy are given in
equations (1), (2) and (3), respectively. Experimental results
demonstrate that the proposed ensemble learning method achieves
higher accuracy than the previous methods. The precision, recall
and F1-score results for CM1, KC2 and PC1 are compared in TABLE
I, TABLE II and TABLE III, respectively.
E. ROC curve
The receiver operating characteristic (ROC) curve is a machine
learning evaluation tool for analyzing the behavior of
classifiers at different thresholds [25]. It plots the True
Positive Rate (TPR) against the False Positive Rate (FPR). The
ROC curve shows that the ensemble learning model achieves higher
accuracy than the previously proposed models.
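To make the threshold sweep concrete, the sketch below builds the (FPR, TPR) points of a ROC curve from predicted scores and computes the area under it with the trapezoidal rule. The scores and labels are made up for illustration; the helper names `roc_points` and `auc` are hypothetical and unrelated to the RapidMiner performance operator cited above.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold
    over every distinct score, starting from the point (0, 0)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # An instance is predicted positive when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical classifier scores (higher means "more likely defective") and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print(points)
print(f"AUC = {auc(points):.4f}")
```

A curve that rises toward the top-left corner, with an area close to 1, indicates that defective modules are consistently scored above non-defective ones across thresholds.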
F. Comparison of Proposed Learning Model with Prominent
Machine Learning Model
In this section, the proposed ensemble learning model is
compared with prominent machine learning models such as SVM,
KNN, Decision Tree and Random Forest [26], [27], [28], [29]. The
comparison is based on train and test accuracy, precision,
recall and F-score. On CM1, the accuracy is 89.87% for SVM,
95.41% for KNN, 94.26% for DT and 92.54% for Random Forest.
Similarly, on KC2, the accuracy is 83.85% for SVM, 89.92% for
KNN, 88.59% for DT and 92.43% for Random Forest. On PC1, the
accuracy is 92.66% for SVM, 98.5% for KNN, 94.89% for DT and
95.36% for Random Forest. The comparison demonstrates that the
ensemble learning model is the most suitable model for software
defect prediction on this type of dataset.
V. DISCUSSION
Software testing consumes more than 50% of the resources of the
overall software development process. Defect detection is one of
the important activities of software testing, and early
detection of defects reduces resource consumption. Many
techniques have been proposed for defect prediction. Machine
learning is a widely used technique for detecting software
defects and achieves good accuracy. In particular, machine
learning algorithms such as SVM, KNN, DT, RF, NN, DNN and GA,
and their variants, are widely used for defect prediction. In
this paper, we proposed an ensemble learning model and trained
it on three different datasets: CM1, KC2 and PC1. The datasets
are publicly available in the PROMISE repository, were created
by the NASA Data Metric Program (DMP), and comprise 21
continuous variables and one label with two classes, Defect or
Not Defect. Before applying the machine learning model, the
class imbalance problem, which is common in machine learning, is
handled using a sampling-with-replacement technique called
bootstrapping. The ensemble learning model trained on CM1
achieves 98.56% accuracy, the model trained on KC2 achieves
98.18% accuracy and the model trained on PC1 achieves 99.27%
accuracy. The proposed model is compared with prominent machine
learning models, namely SVM, KNN, DT and RF, and it outperforms
all of them. The comparison is based on the evaluation metrics
precision, recall, F1-score and accuracy. The experimental
results demonstrate that ensemble learning models are more
effective for software defect prediction than other machine
learning techniques.
VI. CONCLUSION
It is concluded that software testing is the most important way
to ensure quality in a software system. Therefore, in this
paper, we proposed an ensemble learning model for predicting
defects in software. The proposed model is trained on the
PROMISE datasets. Its performance in terms of accuracy,
precision, recall and F1-score is compared with other prominent
machine learning algorithms, namely SVM, KNN, Decision Tree and
Random Forest, and the proposed ensemble learning model produces
better results. The classification accuracy of the ensemble
model is 98.56% on CM1, 98.18% on KC2 and 99.27% on PC1. The
proposed ensemble learning model can help software engineers
detect and predict software defects earlier and more
efficiently.
ACKNOWLEDGMENT
This work was supported by the Gulf University for Science and
Technology (GUST) under grant number 223565. The preprint of
this paper is also available online [29].
REFERENCES
[1] Felderer, M., Ramler, R. “Risk Orientation in Software Testing
Processes of Small and Medium Enterprises: An Exploratory and
Comparative Study”. Software Quality Journal 24, 519–548 (2016).
https://doi.org/10.1007/s11219-015-9289-z.
[2] Kassab, Mohamad, Joanna F. DeFranco, and Phillip A. Laplante “Soft-
ware Testing: The State of the Practice” IEEE Software 34.5 (2017):
46-52.
[3] Bishnu, Partha S., and Vandana Bhattacherjee. “Software Fault
Prediction Using Quad Tree-Based K-Means Clustering Algorithm” IEEE
Transactions on Knowledge and Data Engineering 24.6 (2011):
1146-1150.
[4] Park, Mikyeong, and Euyseok Hong. “Software Fault Prediction Model
Using Clustering Algorithms Determining the Number of Clusters
Automatically” International Journal of Software Engineering and Its
Applications 8.7 (2014): 199-204.
[5] Hong, Euyseok, and Mikyeong Park. “Unsupervised Learning Model
for Fault Prediction Using Representative Clustering Algorithms” KIPS
Transactions on Software and Data Engineering 3.2 (2014): 57-64.
[6] Hong, Euyseok. “Severity-based Fault Prediction Using
Unsupervised Learning” The Journal of The Institute of Internet,
Broadcasting and Communication 18.3 (2018): 151-157. doi:
10.1109/TSE.2014.2322358.
[7] Chen, Mingming, and Yutao Ma. “An Empirical Study on Predicting
Defect Numbers” In SEKE, pp. 397-402. 2015.
[8] Shepperd, Martin, David Bowes, and Tracy Hall. “Researcher bias:
The Use of Machine Learning in Software Defect Prediction” IEEE
Transactions on Software Engineering 40, no. 6 (2014): 603-616.
[9] Z. Tian, J. Xiang, S. Zhenxiao, Z. Yi and Y. Yunqiang, “Soft-
ware Defect Prediction Based on Machine Learning Algorithms”
2019 IEEE 5th International Conference on Computer and Com-
munications (ICCC), Chengdu, China, 2019, pp. 520-525, doi:
10.1109/ICCC47050.2019.9064412.
[10] V. Garousi, R. Özkan, and A. Betin-Can, “Multi-Objective
Regression Test Selection in Practice: An Empirical Study in the
Defense Software Industry” Information and Software Technology,
vol. 103, pp. 40-54, 2018.
[11] Manjula, C., and Lilly Florence. “Hybrid Approach for Software Defect
Prediction Using Machine Learning with Optimization Technique” Inter-
national Journal of Computer and Information Engineering 12.1 (2018):
28-32.
[12] Laradji, Issam H., Mohammad Alshayeb, and Lahouari Ghouti. “Soft-
ware Defect Prediction Using Ensemble Learning on Selected Features”
Information and Software Technology 58 (2015): 388-402.
[13] Arar, Ömer Faruk, and Kürşat Ayan. “Software Defect Prediction Using
Cost-Sensitive Neural Network” Applied Soft Computing 33 (2015):
263-277.
[14] Siers, Michael J., and Md Zahidul Islam. “Software Defect Prediction
Using a Cost Sensitive Decision Forest and Voting, and a Potential
Solution to the Class Imbalance Problem” Information Systems 51
(2015): 62-71.
[15] P. Deep Singh and A. Chug, “Software Defect Prediction Analysis Using
Machine Learning Algorithms” 7th International Conference on Cloud
Computing, Data Science & Engineering - Confluence, Noida, 2017, pp.
775-781, doi: 10.1109/CONFLUENCE.2017.7943255.
[16] Sayyad Shirabad, J. and Menzies, T.J. (2005) “The PROMISE Reposi-
tory of Software Engineering Databases” School of Information Tech-
nology and Engineering, University of Ottawa, Canada. Available:
http://promise.site.uottawa.ca/SERepository
[17] Manjula, C., and Lilly Florence. “Deep Neural Network-Based Hybrid
Approach for Software Defect Prediction Using Software Metrics”
Cluster Computing 22.4 (2019): 9847-9863.
[18] Ni, C., Chen, X., Wu, F., Shen, Y., Gu, Q. “An Empirical Study
on Pareto Based Multiobjective Feature Selection for Software Defect
Prediction” Journal of Systems and Software, 152, (2019), 215-238.
[19] Huda, S., Liu, K., Abdelrazek, M., Ibrahim, A., Alyahya, S., Al-Dossari,
H., Ahmad, S. “An Ensemble Oversampling Model for Class Imbalance
Problem in Software Defect Prediction” IEEE access, 6, 24184-24195.
(2018).
[20] “RapidMiner Documentation” Last Accessed (June 2020)
https://docs.rapidminer.com/latest/studio/operators/modeling/predictive/ensembles/classification_by_regression.html
[21] J. Friedman, T. Hastie, R. Tibshirani, “The Elements of Statistical
Learning” volume 1, Springer series in statistics New York, 2001.
[22] Müller, Andreas C., and Sarah Guido. “Introduction to Machine
Learning with Python: A Guide for Data Scientists” O’Reilly Media, Inc., 2016.
[23] Wang, Tiejian, Zhiwu Zhang, Xiaoyuan Jing, and Liqiang Zhang.
“Multiple Kernel Ensemble Learning for Software Defect Prediction”
Automated Software Engineering 23, no. 4 (2016): 569-590.
[24] Khan, Shahzad Ali, and Zeeshan Ali Rana. “Evaluating Performance
of Software Defect Prediction Models Using Area Under Precision-
Recall Curve (AUC-PR)” In 2019 2nd International Conference on
Advancements in Computational Sciences (ICACS), pp. 1-6. IEEE,
2019.
[25] “Rapid Miner Documentation” Last Accessed (June 2020)
https://docs.rapidminer.com/latest/studio/operators/validation/performance
[26] Reshi, Junaid Ali and Satwinder Singh. “Predicting Software Defects
through SVM: An Empirical Approach” ArXiv abs/1803.03220 (2018):
n. pag.
[27] Gong, Lina, Shujuan Jiang, Rongcun Wang, and Li Jiang. “Empirical
Evaluation of the Impact of Class Overlap on Software Defect
Prediction” In 2019 34th IEEE/ACM International Conference on Automated
Software Engineering (ASE), pp. 698-709. IEEE, 2019.
[28] Ge, Jianxin, Jiaomin Liu, and Wenyuan Liu. “Comparative Study on
Defect Prediction Algorithms of Supervised Learning Software Based
on Imbalanced Classification Data Sets” In 2018 19th IEEE/ACIS
International Conference on Software Engineering, Artificial Intelligence,
Networking and Parallel/Distributed Computing (SNPD), pp. 399-406.
IEEE, 2018.
[29] Nawaz, Ali, Attique Ur Rehman, and Muhammad Abbas. “A Novel
Multiple Ensemble Learning Models Based on Different Datasets for
Software Defect Prediction” arXiv preprint arXiv:2008.13114 (2020).
Authorized licensed use limited to: NUST School of Electrical Engineering and Computer Science (SEECS). Downloaded on June 09,2022 at 07:23:26 UTC from IEEE Xplore. Restrictions apply.
... Bugs have been a matter of research interest in software engineering for a long time, primarily due to their role as a critical indicator of software quality (Ma et al, 2020;Jahanshahi et al, 2020). The ever-evolving challenges in Software Engineering enhance allows ensemble approaches to provide more accurate and robust results for bug prediction because ensemble models leverage the collective strengths of several algorithms to improve the overall prediction capability (Ali et al, 2022). ...
Preprint
Full-text available
Choosing the most appropriate machine learning model for bug prediction tasks is critical. This paper primarily compares the predictive power of individual models versus ensemble models. We begin by experimenting with popular single-machine learning models commonly used in bug prediction, like neural networks and support vector machines. Additionally, we test with ensemble models that combine individual models' unique strengths, aiming to maximize each's benefits. Our evaluation is based on datasets containing historical development data from well-known open-source software projects. We rely on various metrics when assessing the models, encompassing accuracy, precision, recall, and F1 score. Based on our research findings, it has been observed that ensemble models tend to outperform single models, particularly when it comes to maintaining resilience across various datasets.Nevertheless, factors like the project's complexity, data availability, and computational resources all play a role in determining whether to use single or ensemble models. This paper offers a thorough analysis of the factors to consider when selecting machine learning models and approaches for bug prediction, providing valuable insights into the field. Furthermore, it offers practical advice for professionals, enabling them to make informed choices.
... In paper [60], an ensemble learning approach for SDP using various ML methods is presented. The aim is to increase accuracy and reliability of defect prediction by leveraging the strengths of different models of machine learning. ...
Article
Full-text available
Software defect prediction is a crucial area of study focused on enhancing software quality and cutting down on software upkeep expenses. Cross Project Defect Prediction (CPDP) is a method meant to use information from different source projects to spot software issues in a specific project. CPDP comes in handy when the project being analyzed lacks enough or any data about defects for creating a dependable defect prediction model. Machine learning that is a part of artificial intelligence learns from data and then makes forecasts or choices. Machine learning (ML) is a key component of CPDP because it can learn from heterogeneous and imbalanced data sources. However, there are many challenges and open issues in applying machine learning to CPDP, such as data selection, feature extraction, model selection, evaluation metrics, and transfer learning. In this study, we provide a complete review of existing literature from 2018 to 2023 on Defect Prediction using Machine Learning, covering the main methods, applications, and limitations. We also use ML to identify current research gaps and future directions for CPDP. This paper will serve as a useful reference for researchers interested in using ML for CPDP.
Article
Full-text available
Context: Executing an entire regression test suite after every code change is often costly in large software projects. To cope with this challenge, researchers have proposed various regression test-selection techniques. Objective: This paper was motivated by a real industrial need to improve regression-testing practices in the context of safety-critical industrial software in the defence domain in Turkey. To address our objective, we set up and conducted an "action-research" collaborative project between industry and academia. Method: After a careful literature review, we selected a conceptual multi-objective regression-test selection framework (called MORTO) and adapted it to our industrial context by developing a custom-built genetic algorithm (GA) based on that conceptual framework. The GA is able to provide full coverage of the affected (changed) requirements while considering multiple cost and benefit factors of regression testing, e.g., minimizing the number of test cases and maximizing the cumulative number of faults detected by each test suite. Results: The empirical results of applying the approach to the Software Under Test (SUT) demonstrate that it yields a more efficient test suite (in terms of costs and benefits) than both the old (manual) test-selection approach used in the company and another applicable approach chosen from the literature. With this new approach, the regression selection process in the project under study is no longer ad hoc. Furthermore, we have been able to eliminate the subjectivity of regression testing and its dependency on expert opinions. Conclusion: Since the proposed approach has been beneficial in saving the costs of regression testing, it is currently in active use in the company. We believe that other practitioners can apply our approach in their regression-testing contexts too, when applicable. Furthermore, this paper contributes to the body of evidence in regression testing by offering a success story of the implementation and application of multi-objective regression testing in practice.
Article
Software systems are now ubiquitous and are used every day for automation purposes in personal and enterprise applications; they are also essential to many safety-critical and mission-critical systems, e.g., air traffic control systems, autonomous cars, and SCADA systems. With the availability of massive storage capabilities, high-speed Internet, and the advent of Internet of Things (IoT) devices, modern software systems are growing in both size and complexity. Maintaining a high quality of such complex systems while manually keeping the error rate at a minimum is a challenge. Therefore, automated detection of faulty components in a software system is important during software development and also post-delivery. Fault detection models usually need to be trained on a labeled, balanced dataset with both faulty and non-faulty samples. Earlier work, e.g. Mohsin et al. (2016), showed that most real fault detection training datasets are imbalanced. As a result, the trained model overfits and classifies faulty components as non-faulty. The consequence of a high false negative rate is cumulative: the model generates more errors when used on previously unseen software systems, which is very expensive. In this paper, we propose a software defect prediction ensemble model which considers the class imbalance problem in real software datasets. We use different oversampling techniques to build an ensemble classifier that can reduce the effect of the small number of minority samples in the defective data. The proposed approach is verified using the PROMISE software engineering dataset. The results show that our ensemble oversampling technique reduces the false negative rate more than the standard classification techniques do and identifies the faulty components more accurately, resulting in a less expensive detection system (lowering the rate of non-faulty predictions of faulty modules).
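The class-imbalance idea in the abstract above can be illustrated with a minimal sketch. It shows plain random oversampling only, which is an assumption made for illustration; the abstract does not name the specific oversampling techniques or the ensemble construction it uses. The function name `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.

    Illustrative only: this is generic random oversampling, not the
    method of the paper summarized above.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: 8 non-faulty (0) modules and 2 faulty (1) modules.
X = [[i, i * 2] for i in range(10)]
y = [0] * 8 + [1] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have 8 samples
```

A classifier trained on `X_bal` and `y_bal` sees faulty and non-faulty modules equally often, which is one common way to reduce the bias toward predicting "non-faulty".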
Article
In the field of early prediction of software defects, various techniques have been developed, such as data mining and machine learning techniques. Still, early prediction of defects remains a challenging task, and it can be improved by achieving a higher classification rate of defect prediction. With the aim of addressing this issue, we introduce a hybrid approach that combines a genetic algorithm (GA) for feature optimization with a deep neural network (DNN) for classification. An improved version of GA is incorporated, which includes a new technique for chromosome design and fitness function computation. The DNN technique is also improved using an adaptive auto-encoder, which provides a better representation of the selected software features. The improved efficiency of the proposed hybrid approach due to the deployment of the optimization technique is demonstrated through case studies. An experimental study is carried out for software defect prediction on the PROMISE dataset using the MATLAB tool. In this study, we have used the proposed novel method for classification and defect prediction. A comparative study shows that the proposed approach performs better than other techniques: 97.82% accuracy is obtained for the KC1 dataset, 97.59% for the CM1 dataset, 97.96% for the PC3 dataset, and 98.00% for the PC4 dataset.
Article
A Web-based survey examined how software professionals used testing. The results offer opportunities for further interpretation and comparison to software testers, project managers, and researchers. The data includes characteristics of practitioners, organizations, projects, and practices.
Article
Software defect prediction is an important aspect of preventive maintenance of software. Many techniques have been employed to improve software quality through defect prediction. This paper introduces an approach to defect prediction that applies a machine learning algorithm, support vector machines (SVM), with code smells as the predictive factor. A smell prediction model based on support vector machines was used to predict defects in subsequent releases of the Eclipse software. The results signify the role of smells in predicting software defects and can be used as a baseline for investigating that role further.
Article
The performance of software defect prediction (SDP) models depends on the quality of the software features considered. Redundant and irrelevant features may reduce the performance of the constructed models, so feature selection methods are required to identify and remove them. Previous studies mostly treat feature selection as a single-objective optimization problem, and multi-objective feature selection for SDP has not been thoroughly investigated. In this paper, we propose a novel method, MOFES (Multi-Objective FEature Selection), which takes two optimization objectives into account. One objective is to minimize the number of selected features; this objective is related to the cost analysis of the problem. The other objective is to maximize the performance of the constructed SDP models; this objective is related to the benefit analysis of the problem. MOFES utilizes Pareto-based multi-objective optimization algorithms (PMAs) to solve this problem. In our empirical study, we design and conduct experiments on the RELINK and PROMISE datasets, which are gathered from real open source projects. First, we analyze the influence of different PMAs on MOFES and find that NSGA-II achieves the best performance on both datasets. Then, we compare MOFES with 22 state-of-the-art filter-based and wrapper-based feature selection methods and find that MOFES can effectively select fewer but closely related features to construct high-quality models. Moreover, we analyze the features frequently selected by MOFES, and these findings can be used to provide guidelines on gathering high-quality SDP datasets. Finally, we analyze the computational cost of MOFES and find that MOFES only needs 107 seconds on average.
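The two objectives described above (fewer features, higher model performance) can be made concrete with a minimal sketch of Pareto dominance. This is not MOFES or NSGA-II itself; it only shows how non-dominated candidates are identified. The names `dominates` and `pareto_front` and the candidate values are hypothetical.

```python
def dominates(a, b):
    """Return True if candidate a dominates candidate b.

    Each candidate is a (num_features, score) pair. Fewer features
    is better (cost), a higher score is better (benefit).
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical feature subsets: (number of features, model score).
candidates = [(3, 0.70), (5, 0.78), (5, 0.74), (8, 0.78), (10, 0.81)]

front = pareto_front(candidates)
print(front)  # [(3, 0.7), (5, 0.78), (10, 0.81)]
```

Here `(5, 0.74)` and `(8, 0.78)` are dropped because `(5, 0.78)` is at least as good on both objectives and strictly better on one; the remaining three are trade-offs that a Pareto-based optimizer would present to the user.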