Current Software Defect Prediction: A Systematic
Review
Yahaya Zakariyau Bala
Software Engineering
Universiti Putra Malaysia
Selangor, Malaysia
gs61002@student.upm.my
Pathiah Abdul Samat
Software Engineering
Universiti Putra Malaysia
Selangor, Malaysia
pathiah@upm.edu.my
Khaironi Yatim Sharif
Software Engineering
Universiti Putra Malaysia
Selangor, Malaysia
khaironi@upm.edu.my
Noridayu Manshor
Software Engineering
Universiti Putra Malaysia
Selangor, Malaysia
ayu@upm.edu.my
Abstract—Detecting defects in a software product prior to
testing reduces the cost of testing and improves the quality of the
software product. Various methods for enhancing the accuracy
of defect prediction models have been published. The goal of this
review is to identify and analyze the datasets, models, frameworks,
and performance of software defect prediction models. The
IEEE Xplore, Science Direct, Scopus, and Google Scholar
databases were used to search for and download the relevant
papers. Sixty-eight (68) papers published between 2017 and 2021
were selected based on inclusion and exclusion criteria. Analysis
of the selected studies revealed that 100% of the selected articles
used publicly available datasets from NASA, PROMISE, and
other repositories. The most frequently used models in software
defect prediction studies were identified. The analysis also revealed
that IEEE Transactions on Software Engineering is the most
significant journal with respect to software defect prediction
studies, using the Scimago Journal Ranking as the criterion. The
review also identified studies on enhancing the predictive
performance of defect prediction models. Software defect
prediction remains an active research area and needs more
research, especially on enhancement methods.
Keywords—Software Defect Prediction, Naïve Bayes, Support
Vector Machine, Software
I. INTRODUCTION
Identifying defects in a software product in the early
phases of the software development life cycle is very
important. Using software defect prediction (SDP) to
identify and remove software defects contributes
greatly to producing cost-effective, high-quality software
products, because the defect prediction process is less
expensive than software testing and reviews. As reported in
some studies, software defect prediction models outperform
the review methods currently utilized by industry [1].
Furthermore, SDP helps software testing teams focus on
defect-prone components, thus minimizing testing effort and
enhancing software quality [2], [3]. For these reasons, SDP
has become an important and attractive research topic in the
software engineering domain. Most existing reviews of SDP
studies have not captured a comprehensive picture of current
approaches, such as the ensemble techniques and frameworks
proposed in studies on software defect prediction. Therefore,
in this review, the datasets, models, ensemble techniques, and
frameworks employed in studies on software defect prediction
models published between 2017 and 2021 were identified and
analyzed.
This systematic review is organized as follows: the
methodology is explained in Section II, the results of the
review are presented in Sections III and IV, and the
conclusion is given in Section V.
A. Research Questions
The research questions addressed by this work are:
RQ1. Which of the journals is the most significant in the SDP
area?
RQ2. Who are the most influential and active researchers in
SDP?
RQ3. What are the research topics in the SDP area?
RQ4. What are the datasets mostly used in SDP?
RQ5. What are the models used for SDP?
RQ6. What are the most commonly used models for SDP?
RQ7. What are the best performing models for SDP?
RQ8. What are the improvement methods proposed for SDP?
RQ9. What are the frameworks proposed for SDP?
RQ10. What are the feature selection methods proposed in
SDP?
RQ11. What are the ensemble methods proposed in SDP?
II. METHOD
This work followed the guidelines proposed by [4], as
shown in Fig. 1. The planning stage includes identifying the
reasons for the review, formulating the research questions,
and designing a suitable method for searching relevant
materials and for extracting and synthesizing the information
needed to perform and evaluate the review. The next step is
conducting the review, in which the search method, data
extraction, and data synthesis are carried out.
Fig. 1. Systematic literature review process
A. Search Process
The search process began by defining search strings,
selecting databases, and finally stating the inclusion and
exclusion criteria. Alternative spellings, synonyms, and
Boolean AND/OR operators were considered when constructing
the search strings. Digital libraries such as IEEE Xplore, Science
Direct, Scopus, and Google Scholar were searched.
B. Inclusion and Exclusion Criteria
Peer-reviewed articles on SDP published between 2017 and
2021 were included.
Inclusion criteria:
Empirical studies on SDP, studies comparing the performance
of models in SDP, studies that used feature selection
techniques in SDP, and studies that used ensemble methods for
SDP were all considered.
Exclusion criteria:
Studies without an experiment or empirical analysis, and studies
in any language other than English, were excluded.
C. Assessment of the Quality
Each article was evaluated against the quality assessment
questions shown in Table I. The articles were
scored as follows: for each question, a score of 1 was given
for Yes, 0.5 for Partial, and 0 for No.
D. Collection of Data
The extracted data from each article were:
Authors
Titles
Publishers
Independent variable
Datasets
Evaluation metrics
Models
E. Data analysis
MS Excel was used to tabulate the data extracted from each
article to show:
Number of articles published per year
Number of articles published per author
Number of articles per model
Quality score for each article
TABLE I. QUALITY ASSESSMENT (each question answered Yes, Partially, or No)
1. Are the study's objectives clearly stated?
2. Is the study scope clearly stated?
3. Is relevant literature reviewed in the study?
4. Are the dependent variables (metrics) clearly stated?
5. Are the independent variables clearly stated?
6. Are the datasets used clearly presented?
7. Has the study been cited?
8. Are the performance evaluation techniques properly described?
III. RESULTS
A. Search Result
A total of 2609 articles were identified initially. After
applying the exclusion criteria (some articles excluded based
on the title, some based on the abstract, and finally some based
on the full content) and removing duplicates, 68 articles were
retained.
B. Quality Evaluation of Selected Articles
Articles that scored 5 or above on the quality assessment in
Table I were selected; the rest were rejected. A label was
assigned to each retained study, i.e., PS (Primary Study). After
quality assessment, 68 studies were retained and designated as
primary studies.
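As a minimal illustration of the scoring scheme described in Section II-C (the per-question answers shown are hypothetical, not taken from any particular article), the following Python sketch computes an article's quality score and applies the selection threshold of 5:

```python
# Minimal sketch of the quality-assessment scoring described in Table I.
# The answers below are hypothetical; each of the eight questions is
# answered "yes", "partial", or "no" and scored 1, 0.5, or 0.
SCORES = {"yes": 1.0, "partial": 0.5, "no": 0.0}
THRESHOLD = 5.0  # articles scoring 5 or above are retained as primary studies

def quality_score(answers):
    """Sum the per-question scores for one article."""
    return sum(SCORES[a] for a in answers)

example_answers = ["yes", "yes", "partial", "yes", "no", "yes", "partial", "yes"]
score = quality_score(example_answers)
print(f"score = {score}, selected = {score >= THRESHOLD}")
```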
IV. DISCUSSION
The responses to the research questions are discussed in this section.
A. Significant Journal Publications
Sixty-eight (68) studies on software defect prediction are
considered in this review. Fig. 2 shows the distribution of
selected studies per year. As can be seen from Fig. 2, most
of the studies were published in 2020. Fig. 2 also shows that
research on SDP remains relevant and active today. To address
research question RQ1 (the most significant journal among the
selected primary studies), Scimago ranking metrics, namely the
SJR value and the quartile categories Q1 to Q4, were used, as
shown in Table II. IEEE Transactions on Software
Engineering emerged as the most significant journal.
Fig. 2. Distribution of studies per year (2017-2022)
B. Most Active and Influential Researchers
The researchers who contributed most to the selected
papers in the area of SDP were identified based on the number
of studies in which they appeared as the first author. Balogun
A.O. contributed as the first author in four studies, followed by
Yu Q. and Zhang N., who each contributed to three (3) studies
and appeared as the first author in two (2) of them, as shown
in Fig. 3.
Fig. 4. Distribution of the number of studies per model (ANFIS, ANN, C4.5, CNN, DT, EML, GA, J48, KNN, LDA, LR, LSTM, MLP, NB, NN, RF, RIPPER, RNN)
C. Research Topics in the Area of Software Defect
Prediction
Analysis of the selected studies showed that recent research in
software defect prediction concentrates on four topics:
Just-in-time SDP: the major aim of this topic is the
identification of defect-inducing changes.
Defect analysis: this topic involves classifying
defects into different categories and identifying their possible
causes so as to prevent them from reoccurring.
Cross-version defect prediction (CVDP): the defect labels of
the current version of a software system are predicted using a
classification algorithm trained on the historical dataset of a
previous version of the same software.
Cross-project defect prediction (CPDP): the defect labels of
one project are predicted using a classification model trained
on the historical data of another project (a minimal sketch of
this setting is given after this list).
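As an illustration of the CVDP and CPDP settings, the sketch below trains a classifier on historical data from one version (or project) and evaluates it on another. The file names and the "defective" label column are hypothetical placeholders, not datasets used in the reviewed studies.

```python
# Sketch of cross-version / cross-project defect prediction:
# train on historical data (a previous version or another project),
# then predict defect labels for the target version or project.
# File names and the "defective" column are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

source = pd.read_csv("project_v1.csv")   # previous version (or another project)
target = pd.read_csv("project_v2.csv")   # current version (or the target project)

# assumes both files share the same metric columns
features = [c for c in source.columns if c != "defective"]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(source[features], source["defective"])

probs = clf.predict_proba(target[features])[:, 1]
print("cross-version/cross-project AUC:", roc_auc_score(target["defective"], probs))
```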
D. Datasets Used for SDP Studies
A dataset is a collection of data used for training and
evaluating an SDP model, and it is the major component of any
SDP study. Public datasets are obtained from repositories such
as NASA, PROMISE, AEEEM, ReLink, and SOFTLAB. Most of the
selected studies used a combination of multiple datasets for
their analysis.
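To make the typical workflow concrete, the sketch below loads one public defect dataset and prepares a stratified train/test split. The file name and label column are hypothetical placeholders, since the repositories distribute their data in their own formats (e.g., ARFF files in PROMISE).

```python
# Sketch: loading a public defect dataset (e.g., a NASA/PROMISE table
# exported to CSV) and preparing a train/test split.
# "cm1.csv" and the "defective" label column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("cm1.csv")
X = data.drop(columns=["defective"])   # static code metrics (LOC, complexity, ...)
y = data["defective"].astype(int)      # 1 = defect-prone module, 0 = clean

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)
```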
Fig. 3. Distribution of Influential Researchers and Number of Studies
TABLE II. RANKING OF SELECTED JOURNALS BASED ON SCIMAGO JOURNAL RANKING (SJR)
Journal Publication | SJR | Quartile
IEEE Transactions on Soft. Eng. | 857 | Software (Q1)
Journal of King Saud University – Comp. and Infor. Sci. | 617 | Comp. Sci. (miscellaneous) (Q1)
Information and Software Technology | 606 | Software (Q2)
International Journal of Computational Intelligence Systems | 385 | Computer Science (miscellaneous) (Q2)
Symmetry | 385 | Computer Science (miscellaneous) (Q2)
IET Software | 305 | Computer Graphics and Computer-Aided Design (Q3)
Journal of Systems Engineering and Electronics | 301 | Computer Science Applications (Q3)
International Journal of Electrical and Computer Engineering | 277 | Computer Science (miscellaneous) (Q2)
E. Methods (Models) Used in SDP Studies
As shown in Fig. 4, the most widely used models are NB and
SVM.
F. Most Frequently Used Methods (Models) in SDP Studies
From the methods used in software defect prediction
identified in Section E, the six most frequently used models in
SDP were identified based on the number of studies in which
they were employed. Naïve Bayes (NB) and Support Vector
Machine (SVM) emerged as the most frequently used methods,
followed by Convolutional Neural Network (CNN).
G. Best Performing Models for SDP
Two of the selected studies, [5] and [6], reported Random
Forest (RF) and SVM as the better-performing models.
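A minimal way to run this kind of comparison on a given dataset is cross-validated evaluation of both models, as sketched below. This is an illustrative setup, not the exact protocol of [5] or [6]; X and y are assumed to be a feature matrix and defect labels from one of the public datasets described in Section D.

```python
# Sketch: comparing Random Forest and SVM with stratified cross-validation.
# X, y are assumed to be a feature matrix and defect labels loaded elsewhere.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f}")
```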
H. Proposed Improvement Methods for SDP Models
To improve the predictive performance of SDP models,
researchers focus on improving the quality of the
training and testing datasets (using feature selection
techniques), on optimizing classifiers (using ensemble
techniques), or on a combination of both.
1) Feature Selection
Several researchers directed their studies toward
improving SDP models through feature selection techniques,
for example the studies performed in [7], [8], [9], [10], [11],
[12] and [13].
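As a hedged illustration of filter-based feature selection of the kind used in these studies (not the specific methods of [7]-[13]), the sketch below ranks features with a univariate statistic and keeps only the top-ranked ones before training a classifier; the train/test variables are assumed to be prepared elsewhere.

```python
# Sketch: filter feature selection (rank features, keep top-k) before training.
# X_train, X_test, y_train, y_test are assumed to come from a defect dataset.
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

selector = SelectKBest(score_func=mutual_info_classif, k=10)  # keep 10 best features
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

nb = GaussianNB().fit(X_train_sel, y_train)
print("F-measure with selected features:",
      f1_score(y_test, nb.predict(X_test_sel)))
```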
2) Ensemble Techniques
On the other hand, some studies concentrate on using
ensemble methods to improve software defect prediction. An
ensemble method is a procedure in which multiple classifiers
are combined in order to achieve better performance.
In addition to improving classification accuracy, ensemble
methods have been used to mitigate the class imbalance
problem in software defect prediction. Examples of such
studies are [14], [15], [16], [17] and [18].
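As a generic illustration of the idea (not the specific methods of [14]-[18]), the sketch below combines three base classifiers by majority voting; the train/test variables are assumed to come from a defect dataset prepared as in Section D.

```python
# Sketch: combining multiple classifiers by majority voting.
# X_train, X_test, y_train, y_test are assumed to be prepared elsewhere.
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

ensemble = VotingClassifier(estimators=[
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
], voting="hard")

ensemble.fit(X_train, y_train)
print("ensemble accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```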
I. Proposed Frameworks for Software Defect Prediction
Various studies proposed different frameworks to improve
SDP models. The frameworks proposed for software defect
prediction are summarized below.
[19] proposed a framework consisting of three layers. The
first layer is the pre-processing layer, where missing values
and outliers in the dataset are handled and the dataset is split
into training and testing data. Classification using four
different back-propagation training techniques (BR, LM, SCG
and BFGS-QN) is performed in the second layer. The third layer
includes a fuzzy technique in which the best training algorithm
is selected based on performance and stored in the cloud so
that the development team can access the model anywhere,
thus reducing the cost of testing. The framework was
evaluated on NASA datasets using various metrics such as Recall,
Precision, Accuracy, F-measure, AUC, R2 and MSE. It was
observed that the BR training function outperformed the
other training functions.
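Since BR, LM, SCG and BFGS-QN are MATLAB-style neural-network training functions that are not available in scikit-learn, the following Python sketch is only a loose analogue of the "train with several algorithms and retain the best" step, using the solvers that MLPClassifier does provide; X and y are assumed to be a defect dataset prepared elsewhere.

```python
# Loose analogue of layers 2-3 of the framework in [19]: train a neural
# network with several training algorithms and keep the best performer.
# scikit-learn's MLP solvers ("lbfgs", "adam", "sgd") stand in for the
# MATLAB training functions (BR, LM, SCG, BFGS-QN) used in the study.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

results = {}
for solver in ["lbfgs", "adam", "sgd"]:
    mlp = MLPClassifier(hidden_layer_sizes=(16,), solver=solver,
                        max_iter=2000, random_state=0)
    results[solver] = cross_val_score(mlp, X, y, cv=5, scoring="f1").mean()

best = max(results, key=results.get)
print("best training algorithm:", best, "F-measure:", round(results[best], 3))
```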
[20] proposed an integrated machine learning framework
for Infrastructure as Code (IaC) defect prediction, which
supports repository crawling, metric extraction, model
building, and evaluation. Five machine learning
algorithms and 104 projects were used to evaluate the
performance of the framework in identifying defective IaC
scripts. The experimental results show that Random
Forest outperformed the other classification algorithms,
especially on the AUC and MCC evaluation metrics.
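The evaluation step can be illustrated with the two metrics highlighted above. The sketch below computes AUC and MCC for a Random Forest on a held-out test set; it does not reproduce the framework's repository-crawling or metric-extraction components, and the train/test variables are assumed to be prepared elsewhere.

```python
# Sketch: evaluating a Random Forest with AUC and MCC, the two metrics
# emphasised for the IaC defect-prediction framework in [20].
# X_train, X_test, y_train, y_test are assumed to be prepared elsewhere.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, matthews_corrcoef

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

pred = rf.predict(X_test)
prob = rf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, prob))
print("MCC:", matthews_corrcoef(y_test, pred))
```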
[21] proposed a framework to address the class imbalance
problem in SDP models. In this approach, a balance between
the defective and non-defective instances in an imbalanced
dataset is created using the distribution properties of the
dataset. The experimental results show that the proposed
framework outperformed SMOTE and K-Means SMOTE.
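The CIR framework itself is not reproduced here; the sketch below only shows the resampling baselines it was compared against (SMOTE and K-Means SMOTE, as implemented in the imbalanced-learn package) and where such resampling sits in an SDP pipeline. X_train and y_train are assumed to be a training split of a defect dataset.

```python
# Sketch: oversampling baselines (SMOTE, K-Means SMOTE) that [21]'s CIR
# framework is compared against; resampling is applied to training data only.
# X_train, y_train are assumed to be prepared elsewhere.
from collections import Counter
from imblearn.over_sampling import SMOTE, KMeansSMOTE

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
print("class counts after SMOTE:", Counter(y_bal))

X_bal2, y_bal2 = KMeansSMOTE(random_state=0).fit_resample(X_train, y_train)
print("class counts after K-Means SMOTE:", Counter(y_bal2))
```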
To improve the performance of SDP, [22] proposed a
framework in which a variant selection activity, used to
identify the best-optimized versions of classification
techniques, is combined with a feature selection technique.
Based on the evaluation, the framework outperformed several
commonly used classification techniques.
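A hedged sketch of the variant-selection idea is given below: filter feature selection followed by a grid search that picks the best-performing variant (hyperparameter setting) of a classifier. The parameter grid is illustrative and is not the exact configuration used in [22]; X and y are assumed.

```python
# Sketch of variant selection combined with feature selection, loosely
# following the idea in [22]: feature selection first, then pick the
# best-performing variant (hyperparameter setting) of a classifier.
# X, y are assumed to be a defect dataset loaded elsewhere.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("svm", SVC()),
])
grid = {
    "select__k": [5, 10, 15],
    "svm__C": [0.1, 1, 10],        # each setting is one "variant"
    "svm__kernel": ["rbf", "linear"],
}
search = GridSearchCV(pipe, grid, cv=5, scoring="f1").fit(X, y)
print("best variant:", search.best_params_,
      "F-measure:", round(search.best_score_, 3))
```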
[23] proposed a framework that improves the
performance of the Extreme Learning Machine (ELM). The
framework comprises three components: in component 1,
KPCA leverages a nonlinear mapping function to extract
optimal representative features; component 2 leverages an
adaptive genetic algorithm to enhance the prediction
performance of the ELM; and in component 3, the multiple ELMs
optimized by the adaptive genetic algorithm are integrated using
the AdaBoost algorithm. The framework was evaluated on eleven
open-source datasets and compared with several machine
learning techniques, including the ELM and three other variants.
The results show that the proposed framework outperformed
the baseline models.
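Because ELM base learners and the adaptive genetic algorithm are not available in scikit-learn, the sketch below reproduces only the overall pipeline shape (kernel PCA feature extraction followed by an AdaBoost ensemble); it is an assumption-laden stand-in, not the KAEA model of [23].

```python
# Rough stand-in for the pipeline shape of [23]: kernel PCA feature
# extraction followed by an AdaBoost ensemble. ELM base learners and the
# adaptive genetic algorithm are replaced by standard scikit-learn parts.
# X, y are assumed to be a defect dataset loaded elsewhere.
from sklearn.pipeline import Pipeline
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

model = Pipeline([
    ("kpca", KernelPCA(n_components=10, kernel="rbf")),
    ("boost", AdaBoostClassifier(n_estimators=100, random_state=0)),
])
print("AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```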
[24] proposed a framework consisting of three stages. In
stage 1, the impact of class imbalance in the dataset is reduced
using the Sampling WIth the Majority (SWIM) method based on
the Mahalanobis distance. In stage 2, the most relevant
representative features shared between the training and the
testing dataset are selected. Stage 3 performs transfer
learning from the training to the testing dataset on the
Grassmann manifold. The experimental results show that the
proposed framework is superior in terms of AUC.
[25] proposed a framework that identifies defects using
features extracted automatically from commit messages and
code changes. The experimental results show that the approach
achieves a 10.36% to 11.02% improvement over
state-of-the-art methods.
J. Limitations of This Study
Only open-access papers were downloaded, out of which
68 studies were selected for this review; as such, some
relevant software defect prediction studies from journals or
conference proceedings may have been excluded.
V. CONCLUSION
The aim of this systematic review was to identify and analyze
the current models used, datasets, best-performing models,
research topics, improvement methods, and frameworks proposed
in SDP. The analysis performed on the selected primary studies
revealed that 100% of the selected papers utilized public
datasets for their analysis. RF and SVM performed better than
most of the other classification algorithms used in SDP. Feature
selection techniques play a vital role in improving the quality
of the datasets used for SDP; likewise, ensemble techniques
have been used to enhance the predictive performance of
SDP models, especially weak classifiers.
ACKNOWLEDGMENT
This work has been supported by Universiti Putra
Malaysia and TedFund Nigeria.
TABLE III. LIST OF SELECTED PRIMARY STUDIES
REFERENCES
[1] T. Menzies, Z. Milton, B. Turhan, B. Cukic, Y. Jiang, and A. Bener, "Defect prediction from static code features: current results, limitations, new approaches," Automated Soft. Eng., 17(4), 375-407, 2010.
[2] C. Catal, "Software fault prediction: A literature review and current trends," Expert Systems with Applications, 38(4), 4626-4636, 2011.
[3] T. Hall and D. Bowes, "The state of machine learning methodology in software fault prediction," in 2012 11th International Conference on Machine Learning and Applications, vol. 2, pp. 308-313, IEEE, December 2012.
[4] B. Kitchenham, O. P. Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, "Systematic literature reviews in software engineering – a systematic literature review," Information and Soft. Technology, 51(1), 7-15, 2009.
[5] A. Iqbal, S. Aftab, U. Ali, Z. Nawaz, L. Sana, M. Ahmad, and A. Husen, "Performance analysis of machine learning techniques on software defect prediction using NASA datasets," Int. J. Adv. Comp. Sci. Appl., 10(5), 300-308, 2019.
[6] B. Khan, R. Naseem, M. A. Shah, K. Wakil, A. Khan, M. I. Uddin, and M. Mahmoud, "Software defect prediction for healthcare big data: An empirical evaluation of machine learning techniques," Journal of Healthcare Eng., 2021.
[7] K. Bashir, T. Li, and M. Yahaya, "A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction," Int. Arab J. Inf. Technol., 17(5), 721-730, 2020.
[8] J. Lin and L. Lu, "Semantic feature learning via dual sequences for defect prediction," IEEE Access, 9, 13112-13124, 2021.
[9] A. O. Balogun, S. Basri, S. Mahamad, S. J. Abdulkadir, L. F. Capretz, A. A. Imam, and G. Kumar, "Empirical analysis of rank aggregation-based multi-filter feature selection methods in software defect prediction," Electronics, 10(2), 179, 2021.
[10] M. Mustaqeem and M. Saqib, "Principal component-based support vector machine (PC-SVM): a hybrid technique for software defect detection," Cluster Computing, 1-15, 2021.
[11] X. Xu, W. Chen, and X. Wang, "RFC: a feature selection algorithm for software defect prediction," Journal of Systems Eng. and Electronics, 32(2), 389-398, 2021.
[12] A. O. Balogun, S. Basri, L. F. Capretz, S. Mahamad, A. A. Imam, M. A. Almomani, and G. Kumar, "An adaptive rank aggregation-based ensemble multi-filter feature selection method in software defect prediction," Entropy, 23(10), 1274, 2021.
[13] B. Mumtaz, S. Kanwal, S. Alamri, and F. Khan, "Feature selection using artificial immune network: An approach for software defect prediction," Intelligent Automation and Soft Computing, 29(3), 669-684, 2021.
[14] S. S. Maddipati and M. Srinivas, "A hybrid approach for cost effective prediction of software defects," Int. J. Adv. Comp. Sci. and App., 12(2), 145-152, 2021.
[15] H. Alsawalqah, N. Hijazi, M. Eshtay, H. Faris, A. Radaideh, I. Aljarah, and Y. Alshamaileh, "Software defect prediction using heterogeneous ensemble classification based on segmented patterns," Applied Sci., 10(5), 1745, 2020.
[16] T. T. Khuat and M. H. Le, "Ensemble learning for software fault prediction problem with imbalanced data," International Journal of Electrical and Comp. Eng., 9(4), 3241, 2019.
[17] H. He, X. Zhang, Q. Wang, J. Ren, J. Liu, X. Zhao, and Y. Cheng, "Ensemble multiboost based on RIPPER classifier for prediction of imbalanced software defect data," IEEE Access, 7, 110333-110343, 2019.
[18] X. Yang, D. Lo, X. Xia, and J. Sun, "TLEL: A two-layer ensemble learning approach for just-in-time defect prediction," Information and Soft. Technology, 87, 206-220, 2017.
[19] M. S. Daoud, S. Aftab, M. Ahmad, M. A. Khan, A. Iqbal, S. Abbas, and B. Ihnaini, "Machine learning empowered software defect prediction system," 2022.
[20] S. Dalla Palma, D. Di Nucci, F. Palomba, and D. A. Tamburri, "Within-project defect prediction of infrastructure-as-code using product and process metrics," IEEE Transactions on Soft. Eng., 2021.
[21] K. K. Bejjanki, J. Gyani, and N. Gugulothu, "Class imbalance reduction (CIR): a novel approach to software defect prediction in the presence of class imbalance," Symmetry, 12(3), 407, 2020.
[22] U. Ali, S. Aftab, A. Iqbal, Z. Nawaz, M. S. Bashir, and M. A. Saeed, "Software defect prediction using variant based ensemble learning and feature selection techniques," International Journal of Modern Education & Comp. Sci., 12(5), 2020.
[23] N. Zhang, K. Zhu, S. Ying, and X. Wang, "KAEA: A novel three-stage ensemble model for software defect prediction," Computers, Materials and Continua, 64(1), 471-499, 2020.
[24] K. Jiang, Y. Zhang, H. Wu, A. Wang, and Y. Iwahori, "Heterogeneous defect prediction based on transfer learning to handle extreme imbalance," Applied Sci., 10(1), 396, 2020.
[25] T. Hoang, H. K. Dam, Y. Kamei, D. Lo, and N. Ubayashi, "DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction," in 2019 IEEE/ACM 16th International Conference on Mining Soft. Repositories (MSR), pp. 34-45, IEEE, May 2019.
Identifying software defects during early stages of Software Development life cycle reduces the project effort and cost. Hence there is a lot of research done in finding defective proneness of a software module using machine learning approaches. The main problems with software defect data are cost effective and imbalance. Cost effective problem refers to predicting defective module as non defective induces high penalty compared to predicting non defective module as defective. In our work, we are proposing a hybrid approach to address cost effective problem in Software defect data. To address cost effective problem, we used bagging technique with Artificial Neuro Fuzzy Inference system as base classifier. In addition to that, we also addressed Class Imbalance & High dimensionality problems using Artificial Neuro Fuzzy inference system & principle component analysis respectively. We conducted experiments on software defect datasets, downloaded from NASA dataset repository using our proposed approach and compared with approaches mentioned in literature survey. We observed Area under ROC curve (AuC) for proposed approach was improved approximately 15% compared with highly efficient approach mentioned in literature survey.