Current Software Defect Prediction: A Systematic
Review
Yahaya Zakariyau Bala
Software Engineering
Universiti Putra Malaysia
Selangor, Malaysia
gs61002@student.upm.my
Pathiah Abdul Samat
Software Engineering
Universiti Putra Malaysia
Selangor, Malaysia
pathiah@upm.edu.my
Khaironi Yatim Sharif
Software Engineering
Universiti Putra Malaysia
Selangor, Malaysia
khaironi@upm.edu.my
Noridayu Manshor
Software Engineering
Universiti Putra Malaysia
Selangor, Malaysia
ayu@upm.edu.my
Abstract—Detecting defects in a software product prior to
testing reduces the cost of testing and improves the quality of the
software product. Various methods for enhancing the accuracy
of defect prediction models have been published. The goal of this
review is to identify and analyze the datasets, models, frameworks,
and performance of software defect prediction models. The
IEEE Xplore, Science Direct, Scopus, and Google Scholar
databases were used to search for and download the relevant
papers. Sixty-eight (68) papers published between 2017 and 2021
were selected based on inclusion and exclusion criteria. Analysis
of the selected studies revealed that 100% of the selected articles
used publicly available datasets from NASA, PROMISE, and
other repositories. The most frequently used models in software
defect prediction studies were identified. The analysis also revealed
that IEEE Transactions on Software Engineering is the most
significant journal with respect to software defect prediction
studies, using the Scimago Journal Ranking as the criterion. The
review also identified studies on enhancing the predictive
performance of defect prediction models. Software defect
prediction remains an active research area and needs more
research, especially on enhancement methods.
Keywords—Software Defect Prediction, Naïve Bayes, Support
Vector Machine, Software
I. INTRODUCTION
Identifying defects in a software product in the early
phases of the software development life cycle is very
important. Using software defect prediction (SDP) to
identify and remove software defects contributes
greatly to producing cost-effective, high-quality software
products, because the defect prediction process is less
expensive than software testing and reviews. As reported in
some studies, software defect prediction models outperform
the review methods currently utilized by industry [1].
Furthermore, SDP helps software testing teams focus on
defect-prone components, thus minimizing testing effort and
enhancing software quality [2], [3]. For these reasons, SDP
has become an important and attractive research topic in the
software engineering domain. Most existing reviews of SDP
studies have not captured a comprehensive picture of current
approaches, such as the ensemble techniques and frameworks
proposed in studies on software defect prediction. Therefore,
in this review, the datasets, models, ensemble techniques, and
frameworks employed in studies on software defect prediction
models published between 2017 and 2021 were identified and
analyzed.
This systematic review is organized as follows: the
methodology is explained in Section II, the results of the
review are presented in Sections III and IV, and the
conclusion is given in Section V.
A. Research Questions
The research questions addressed by this work are:
RQ1. Which of the journals is the most significant in the SDP
area?
RQ2. Who are the most influential and active researchers in
SDP?
RQ3. What are the research topics in the SDP area?
RQ4. What are the datasets mostly used in SDP?
RQ5. What are the models used for SDP?
RQ6. What are the most commonly used models for SDP?
RQ7. What are the best performing models for SDP?
RQ8. What are the improvement methods proposed for SDP?
RQ9. What are the frameworks proposed for SDP?
RQ10. What are the feature selection methods proposed in
SDP?
RQ11. What are the ensemble methods proposed in SDP?
II. METHOD
This work followed the guidelines proposed by [4], as
shown in Fig. 1. The planning stage includes identifying the
reasons for the review, formulating the research questions,
and designing a suitable method for searching relevant
materials and for extracting and synthesizing the information
needed to perform and evaluate the review. The next step is
conducting the review, in which the search method, data
extraction, and data synthesis are carried out.
Fig. 1. Systematic literature review process
A. Search Process
The search process began by defining search strings,
selecting databases, and finally stating the inclusion and
exclusion criteria. Alternative spellings, synonyms, and
Boolean AND/OR operators were considered when constructing
the search strings. Digital libraries such as IEEE Xplore, Science
Direct, Scopus, and Google Scholar were searched.
B. Inclusion and Exclusion Criteria
Peer-reviewed articles on SDP published between 2017 and
2021 were included.
Inclusion criteria:
Empirical studies on SDP, studies comparing the performance
of models in SDP, studies that used feature selection
techniques in SDP, and studies that used ensemble methods for
SDP were all considered.
Exclusion criteria:
Studies without an experiment or empirical analysis, and studies
in any language other than English, were excluded.
C. Assessment of the Quality
Each article was evaluated against the quality assessment
questions shown in Table I. The articles were
scored as follows: for each question, a score of 1 was given
for Yes, 0.5 for Partial, and 0 for No.
D. Collection of Data
The extracted data from each article were:
Authors
Titles
Publishers
Independent variable
Datasets
Evaluation metrics
Models
E. Data analysis
MS Excel was used to tabulate the data extracted from each
article to show:
Number of articles published per year
Number of articles published per author
Number of articles per model
Quality score for each article
TABLE I. QUALITY ASSESSMENT (each question answered Yes, Partially, or No)
1. Are the study's objectives clearly stated?
2. Is the study scope clearly stated?
3. Is relevant literature reviewed in the study?
4. Are the dependent variables (metrics) clearly stated?
5. Are the independent variables clearly stated?
6. Are the datasets used clearly presented?
7. Has the study been cited?
8. Are the performance evaluation techniques properly described?
III. RESULTS
A. Search Result
A total of 2609 articles were identified initially. After
applying the exclusion criteria (some articles excluded based
on the title, some based on the abstract, and finally some based
on the full content) and removing duplicates, 68 articles were
retained.
B. Quality Evaluation of Selected Articles
Articles that scored 5 or above on the quality assessment in
Table I were selected; the rest were rejected. A label was
assigned to each retained study, i.e., PS (Primary Study). After
quality assessment, 68 studies were retained and designated as
primary studies.
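As a minimal illustration of the scoring scheme described in Section II-C (the per-question answers shown are hypothetical, not taken from any particular article), the following Python sketch computes an article's quality score and applies the selection threshold of 5:

```python
# Minimal sketch of the quality-assessment scoring described in Table I.
# The answers below are hypothetical; each of the eight questions is
# answered "yes", "partial", or "no" and scored 1, 0.5, or 0.
SCORES = {"yes": 1.0, "partial": 0.5, "no": 0.0}
THRESHOLD = 5.0  # articles scoring 5 or above are retained as primary studies

def quality_score(answers):
    """Sum the per-question scores for one article."""
    return sum(SCORES[a] for a in answers)

example_answers = ["yes", "yes", "partial", "yes", "no", "yes", "partial", "yes"]
score = quality_score(example_answers)
print(f"score = {score}, selected = {score >= THRESHOLD}")
```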
IV. DISCUSSION
The responses to the research questions are discussed in this section.
A. Significant Journal Publications
Sixty-eight (68) studies on software defect prediction are
considered in this review. Fig. 2 shows the distribution of
selected studies per year. As can be seen from Fig. 2, most
of the studies were published in 2020. Fig. 2 also shows that
research on SDP remains relevant and active today. To address
research question RQ1 (the most significant journal among the
selected primary studies), Scimago ranking metrics, namely the
SJR value and the quartile categories Q1 to Q4, were used, as
shown in Table II. IEEE Transactions on Software
Engineering emerged as the most significant journal.
Fig. 2. Distribution of studies per year (2017-2022)
B. Most Active and Influential Researchers
The researchers who contributed most to the selected
papers in the area of SDP were identified based on the number
of studies in which they appeared as the first author. Balogun
A.O. contributed as the first author in four studies, followed by
Yu Q. and Zhang N., who each contributed to three (3) studies
and appeared as the first author in two (2) of them, as shown
in Fig. 3.
Fig. 4. Distribution of the number of studies per model (ANFIS, ANN, C4.5, CNN, DT, EML, GA, J48, KNN, LDA, LR, LSTM, MLP, NB, NN, RF, RIPPER, RNN)
C. Research Topics in the Area of Software Defect
Prediction
Analysis of the selected studies showed that recent research in
software defect prediction concentrates on four topics:
Just-in-time SDP: the major aim of this topic is the
identification of defect-inducing changes.
Defect analysis: this topic involves classifying
defects into different categories and identifying their possible
causes so as to prevent them from reoccurring.
Cross-version defect prediction (CVDP): the defect labels of
the current version of a software system are predicted using a
classification algorithm trained on the historical dataset of a
previous version of the same software.
Cross-project defect prediction (CPDP): the defect labels of
one project are predicted using a classification model trained
on the historical data of another project (a minimal sketch of
this setting is given after this list).
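As an illustration of the CVDP and CPDP settings, the sketch below trains a classifier on historical data from one version (or project) and evaluates it on another. The file names and the "defective" label column are hypothetical placeholders, not datasets used in the reviewed studies.

```python
# Sketch of cross-version / cross-project defect prediction:
# train on historical data (a previous version or another project),
# then predict defect labels for the target version or project.
# File names and the "defective" column are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

source = pd.read_csv("project_v1.csv")   # previous version (or another project)
target = pd.read_csv("project_v2.csv")   # current version (or the target project)

# assumes both files share the same metric columns
features = [c for c in source.columns if c != "defective"]
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(source[features], source["defective"])

probs = clf.predict_proba(target[features])[:, 1]
print("cross-version/cross-project AUC:", roc_auc_score(target["defective"], probs))
```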
D. Datasets Used for SDP Studies
A dataset is a collection of data used for training and
evaluating an SDP model, and it is the major component of any
SDP study. Public datasets are obtained from repositories such
as NASA, PROMISE, AEEEM, ReLink, and SOFTLAB. Most of the
selected studies used a combination of multiple datasets for
their analysis.
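To make the typical workflow concrete, the sketch below loads one public defect dataset and prepares a stratified train/test split. The file name and label column are hypothetical placeholders, since the repositories distribute their data in their own formats (e.g., ARFF files in PROMISE).

```python
# Sketch: loading a public defect dataset (e.g., a NASA/PROMISE table
# exported to CSV) and preparing a train/test split.
# "cm1.csv" and the "defective" label column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("cm1.csv")
X = data.drop(columns=["defective"])   # static code metrics (LOC, complexity, ...)
y = data["defective"].astype(int)      # 1 = defect-prone module, 0 = clean

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)
print(X_train.shape, X_test.shape)
```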
Fig. 3. Distribution of Influential Researchers and Number of Studies
TABLE II. RANKING OF SELECTED JOURNALS BASED ON SCIMAGO JOURNAL RANKING (SJR)
Journal Publication | SJR | Quartile
IEEE Transactions on Soft. Eng. | 857 | Software (Q1)
Journal of King Saud University – Comp. and Infor. Sci. | 617 | Comp. Sci. (miscellaneous) (Q1)
Information and Software Technology | 606 | Software (Q2)
International Journal of Computational Intelligence Systems | 385 | Computer Science (miscellaneous) (Q2)
Symmetry | 385 | Computer Science (miscellaneous) (Q2)
IET Software | 305 | Computer Graphics and Computer-Aided Design (Q3)
Journal of Systems Engineering and Electronics | 301 | Computer Science Applications (Q3)
International Journal of Electrical and Computer Engineering | 277 | Computer Science (miscellaneous) (Q2)
E. Methods (Models) Used in SDP Studies
As shown in Fig. 4, the most widely used models are NB and
SVM.
F. Most Frequently Used Methods (Models) in SDP Studies
From the methods used in software defect prediction
identified in Section E, the six most frequently used models in
SDP were identified based on the number of studies in which
they were employed. Naïve Bayes (NB) and Support Vector
Machine (SVM) emerged as the most frequently used methods,
followed by Convolutional Neural Network (CNN).
G. Best Performing Models for SDP
Two of the selected studies, [5] and [6], reported Random
Forest (RF) and SVM as the better-performing models.
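A minimal way to run this kind of comparison on a given dataset is cross-validated evaluation of both models, as sketched below. This is an illustrative setup, not the exact protocol of [5] or [6]; X and y are assumed to be a feature matrix and defect labels from one of the public datasets described in Section D.

```python
# Sketch: comparing Random Forest and SVM with stratified cross-validation.
# X, y are assumed to be a feature matrix and defect labels loaded elsewhere.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {auc.mean():.3f}")
```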
H. Proposed Improvement Methods for SDP Models
To improve the predictive performance of SDP models,
researchers focus on improving the quality of the
training and testing datasets (using feature selection
techniques), on optimizing classifiers (using ensemble
techniques), or on a combination of both.
1) Feature Selection
Several researchers directed their studies toward
improving SDP models through feature selection techniques,
for example the studies performed in [7], [8], [9], [10], [11],
[12] and [13].
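As a hedged illustration of filter-based feature selection of the kind used in these studies (not the specific methods of [7]-[13]), the sketch below ranks features with a univariate statistic and keeps only the top-ranked ones before training a classifier; the train/test variables are assumed to be prepared elsewhere.

```python
# Sketch: filter feature selection (rank features, keep top-k) before training.
# X_train, X_test, y_train, y_test are assumed to come from a defect dataset.
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score

selector = SelectKBest(score_func=mutual_info_classif, k=10)  # keep 10 best features
X_train_sel = selector.fit_transform(X_train, y_train)
X_test_sel = selector.transform(X_test)

nb = GaussianNB().fit(X_train_sel, y_train)
print("F-measure with selected features:",
      f1_score(y_test, nb.predict(X_test_sel)))
```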
2) Ensemble Techniques
On the other hand, some studies concentrate on using
ensemble methods to improve software defect prediction. An
ensemble method is a procedure in which multiple classifiers
are combined in order to achieve better performance.
In addition to improving classification accuracy, ensemble
methods have been used to mitigate the class imbalance
problem in software defect prediction. Examples of such
studies are [14], [15], [16], [17] and [18].
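As a generic illustration of the idea (not the specific methods of [14]-[18]), the sketch below combines three base classifiers by majority voting; the train/test variables are assumed to come from a defect dataset prepared as in Section D.

```python
# Sketch: combining multiple classifiers by majority voting.
# X_train, X_test, y_train, y_test are assumed to be prepared elsewhere.
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

ensemble = VotingClassifier(estimators=[
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
], voting="hard")

ensemble.fit(X_train, y_train)
print("ensemble accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```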
I. Proposed Frameworks for Software Defect Prediction
Various studies proposed different frameworks to improve
SDP models. The frameworks proposed for software defect
prediction are summarized below.
[19] proposed a framework consisting of three layers. The
first layer is the pre-processing layer, where missing values
and outliers in the dataset are handled and the dataset is split
into training and testing data. Classification using four
different back-propagation training techniques (BR, LM, SCG
and BFGS-QN) is performed in the second layer. The third layer
includes a fuzzy technique in which the best training algorithm
is selected based on performance and stored in the cloud so
that the development team can access the model anywhere,
thus reducing the cost of testing. The framework was
evaluated on NASA datasets using various metrics such as Recall,
Precision, Accuracy, F-measure, AUC, R2 and MSE. It was
observed that the BR training function outperformed the
other training functions.
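Since BR, LM, SCG and BFGS-QN are MATLAB-style neural-network training functions that are not available in scikit-learn, the following Python sketch is only a loose analogue of the "train with several algorithms and retain the best" step, using the solvers that MLPClassifier does provide; X and y are assumed to be a defect dataset prepared elsewhere.

```python
# Loose analogue of layers 2-3 of the framework in [19]: train a neural
# network with several training algorithms and keep the best performer.
# scikit-learn's MLP solvers ("lbfgs", "adam", "sgd") stand in for the
# MATLAB training functions (BR, LM, SCG, BFGS-QN) used in the study.
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

results = {}
for solver in ["lbfgs", "adam", "sgd"]:
    mlp = MLPClassifier(hidden_layer_sizes=(16,), solver=solver,
                        max_iter=2000, random_state=0)
    results[solver] = cross_val_score(mlp, X, y, cv=5, scoring="f1").mean()

best = max(results, key=results.get)
print("best training algorithm:", best, "F-measure:", round(results[best], 3))
```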
[20] proposed an integrated machine learning framework
for Infrastructure as Code (IaC) defect prediction, which
supports repository crawling, metric extraction, model
building, and evaluation. Five machine learning
algorithms and 104 projects were used to evaluate the
performance of the framework in identifying defective IaC
scripts. The experimental results show that Random
Forest outperformed the other classification algorithms,
especially on the AUC and MCC evaluation metrics.
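The evaluation step can be illustrated with the two metrics highlighted above. The sketch below computes AUC and MCC for a Random Forest on a held-out test set; it does not reproduce the framework's repository-crawling or metric-extraction components, and the train/test variables are assumed to be prepared elsewhere.

```python
# Sketch: evaluating a Random Forest with AUC and MCC, the two metrics
# emphasised for the IaC defect-prediction framework in [20].
# X_train, X_test, y_train, y_test are assumed to be prepared elsewhere.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, matthews_corrcoef

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)

pred = rf.predict(X_test)
prob = rf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, prob))
print("MCC:", matthews_corrcoef(y_test, pred))
```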
[21] proposed a framework to address the class imbalance
problem in SDP models. In this approach, a balance between
the defective and non-defective instances in an imbalanced
dataset is created using the distribution properties of the
dataset. The experimental results show that the proposed
framework outperformed SMOTE and K-Means SMOTE.
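The CIR framework itself is not reproduced here; the sketch below only shows the resampling baselines it was compared against (SMOTE and K-Means SMOTE, as implemented in the imbalanced-learn package) and where such resampling sits in an SDP pipeline. X_train and y_train are assumed to be a training split of a defect dataset.

```python
# Sketch: oversampling baselines (SMOTE, K-Means SMOTE) that [21]'s CIR
# framework is compared against; resampling is applied to training data only.
# X_train, y_train are assumed to be prepared elsewhere.
from collections import Counter
from imblearn.over_sampling import SMOTE, KMeansSMOTE

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)
print("class counts after SMOTE:", Counter(y_bal))

X_bal2, y_bal2 = KMeansSMOTE(random_state=0).fit_resample(X_train, y_train)
print("class counts after K-Means SMOTE:", Counter(y_bal2))
```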
To improve the performance of SDP, [22] proposed a
framework in which a variant selection activity, used to
identify the best-optimized versions of classification
techniques, is combined with a feature selection technique.
Based on the evaluation, the framework outperformed several
commonly used classification techniques.
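A hedged sketch of the variant-selection idea is given below: filter feature selection followed by a grid search that picks the best-performing variant (hyperparameter setting) of a classifier. The parameter grid is illustrative and is not the exact configuration used in [22]; X and y are assumed.

```python
# Sketch of variant selection combined with feature selection, loosely
# following the idea in [22]: feature selection first, then pick the
# best-performing variant (hyperparameter setting) of a classifier.
# X, y are assumed to be a defect dataset loaded elsewhere.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("svm", SVC()),
])
grid = {
    "select__k": [5, 10, 15],
    "svm__C": [0.1, 1, 10],        # each setting is one "variant"
    "svm__kernel": ["rbf", "linear"],
}
search = GridSearchCV(pipe, grid, cv=5, scoring="f1").fit(X, y)
print("best variant:", search.best_params_,
      "F-measure:", round(search.best_score_, 3))
```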
[23] proposed a framework that improves the
performance of the Extreme Learning Machine (ELM). The
framework comprises three components: in component 1,
KPCA leverages a nonlinear mapping function to extract
optimal representative features; component 2 leverages an
adaptive genetic algorithm to enhance the prediction
performance of the ELM; and in component 3, the multiple ELMs
optimized by the adaptive genetic algorithm are integrated using
the AdaBoost algorithm. The framework was evaluated on eleven
open-source datasets and compared with several machine
learning techniques, including the ELM and three other variants.
The results show that the proposed framework outperformed
the baseline models.
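Because ELM base learners and the adaptive genetic algorithm are not available in scikit-learn, the sketch below reproduces only the overall pipeline shape (kernel PCA feature extraction followed by an AdaBoost ensemble); it is an assumption-laden stand-in, not the KAEA model of [23].

```python
# Rough stand-in for the pipeline shape of [23]: kernel PCA feature
# extraction followed by an AdaBoost ensemble. ELM base learners and the
# adaptive genetic algorithm are replaced by standard scikit-learn parts.
# X, y are assumed to be a defect dataset loaded elsewhere.
from sklearn.pipeline import Pipeline
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

model = Pipeline([
    ("kpca", KernelPCA(n_components=10, kernel="rbf")),
    ("boost", AdaBoostClassifier(n_estimators=100, random_state=0)),
])
print("AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```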
[24] proposed a framework consisting of three stages. In
stage 1, the impact of class imbalance in the dataset is reduced
using the Sampling WIth the Majority (SWIM) method based on
the Mahalanobis distance. In stage 2, the most relevant
representative features shared between the training and the
testing dataset are selected. Stage 3 performs transfer
learning from the training to the testing dataset on the
Grassmann manifold. The experimental results show that the
proposed framework is superior in terms of AUC.
[25] proposed a framework that identifies defects using
features extracted automatically from commit messages and
code changes. The experimental results show that the approach
achieves a 10.36% to 11.02% improvement over
state-of-the-art methods.
J. Limitations of This Study
Only open-access papers were downloaded, out of which
68 studies were selected for this review; as such, some
relevant software defect prediction studies from journals or
conference proceedings may have been excluded.
V. CONCLUSION
The aim of this systematic review was to identify and analyze
the current models used, datasets, best-performing models,
research topics, improvement methods, and frameworks proposed
in SDP. The analysis performed on the selected primary studies
revealed that 100% of the selected papers utilized public
datasets for their analysis. RF and SVM performed better than
most of the other classification algorithms used in SDP. Feature
selection techniques play a vital role in improving the quality
of the datasets used for SDP; likewise, ensemble techniques
have been used to enhance the predictive performance of
SDP models, especially weak classifiers.
ACKNOWLEDGMENT
This work has been supported by Universiti Putra
Malaysia and TedFund Nigeria.
TABLE III. LIST OF SELECTED PRIMARY STUDIES
REFERENCES
[1] T. Menzies, Z. Milton, B. Turhan, B. Cukic, Y. Jiang, and A. Bener, "Defect prediction from static code features: current results, limitations, new approaches," Automated Soft. Eng., 17(4), 375-407, 2010.
[2] C. Catal, "Software fault prediction: A literature review and current trends," Expert Systems with Applications, 38(4), 4626-4636, 2011.
[3] T. Hall and D. Bowes, "The state of machine learning methodology in software fault prediction," in 2012 11th International Conference on Machine Learning and Applications, vol. 2, pp. 308-313, IEEE, December 2012.
[4] B. Kitchenham, O. P. Brereton, D. Budgen, M. Turner, J. Bailey, and S. Linkman, "Systematic literature reviews in software engineering – a systematic literature review," Information and Soft. Technology, 51(1), 7-15, 2009.
[5] A. Iqbal, S. Aftab, U. Ali, Z. Nawaz, L. Sana, M. Ahmad, and A. Husen, "Performance analysis of machine learning techniques on software defect prediction using NASA datasets," Int. J. Adv. Comp. Sci. Appl., 10(5), 300-308, 2019.
[6] B. Khan, R. Naseem, M. A. Shah, K. Wakil, A. Khan, M. I. Uddin, and M. Mahmoud, "Software defect prediction for healthcare big data: An empirical evaluation of machine learning techniques," Journal of Healthcare Eng., 2021.
[7] K. Bashir, T. Li, and M. Yahaya, "A novel feature selection method based on maximum likelihood logistic regression for imbalanced learning in software defect prediction," Int. Arab J. Inf. Technol., 17(5), 721-730, 2020.
[8] J. Lin and L. Lu, "Semantic feature learning via dual sequences for defect prediction," IEEE Access, 9, 13112-13124, 2021.
[9] A. O. Balogun, S. Basri, S. Mahamad, S. J. Abdulkadir, L. F. Capretz, A. A. Imam, and G. Kumar, "Empirical analysis of rank aggregation-based multi-filter feature selection methods in software defect prediction," Electronics, 10(2), 179, 2021.
[10] M. Mustaqeem and M. Saqib, "Principal component-based support vector machine (PC-SVM): a hybrid technique for software defect detection," Cluster Computing, 1-15, 2021.
[11] X. Xu, W. Chen, and X. Wang, "RFC: a feature selection algorithm for software defect prediction," Journal of Systems Eng. and Electronics, 32(2), 389-398, 2021.
[12] A. O. Balogun, S. Basri, L. F. Capretz, S. Mahamad, A. A. Imam, M. A. Almomani, and G. Kumar, "An adaptive rank aggregation-based ensemble multi-filter feature selection method in software defect prediction," Entropy, 23(10), 1274, 2021.
[13] B. Mumtaz, S. Kanwal, S. Alamri, and F. Khan, "Feature selection using artificial immune network: An approach for software defect prediction," Intelligent Automation and Soft Computing, 29(3), 669-684, 2021.
[14] S. S. Maddipati and M. Srinivas, "A hybrid approach for cost effective prediction of software defects," Int. J. Adv. Comp. Sci. and App., 12(2), 145-152, 2021.
[15] H. Alsawalqah, N. Hijazi, M. Eshtay, H. Faris, A. Radaideh, I. Aljarah, and Y. Alshamaileh, "Software defect prediction using heterogeneous ensemble classification based on segmented patterns," Applied Sci., 10(5), 1745, 2020.
[16] T. T. Khuat and M. H. Le, "Ensemble learning for software fault prediction problem with imbalanced data," International Journal of Electrical and Comp. Eng., 9(4), 3241, 2019.
[17] H. He, X. Zhang, Q. Wang, J. Ren, J. Liu, X. Zhao, and Y. Cheng, "Ensemble multiboost based on RIPPER classifier for prediction of imbalanced software defect data," IEEE Access, 7, 110333-110343, 2019.
[18] X. Yang, D. Lo, X. Xia, and J. Sun, "TLEL: A two-layer ensemble learning approach for just-in-time defect prediction," Information and Soft. Technology, 87, 206-220, 2017.
[19] M. S. Daoud, S. Aftab, M. Ahmad, M. A. Khan, A. Iqbal, S. Abbas, and B. Ihnaini, "Machine learning empowered software defect prediction system," 2022.
[20] S. Dalla Palma, D. Di Nucci, F. Palomba, and D. A. Tamburri, "Within-project defect prediction of infrastructure-as-code using product and process metrics," IEEE Transactions on Soft. Eng., 2021.
[21] K. K. Bejjanki, J. Gyani, and N. Gugulothu, "Class imbalance reduction (CIR): a novel approach to software defect prediction in the presence of class imbalance," Symmetry, 12(3), 407, 2020.
[22] U. Ali, S. Aftab, A. Iqbal, Z. Nawaz, M. S. Bashir, and M. A. Saeed, "Software defect prediction using variant based ensemble learning and feature selection techniques," International Journal of Modern Education & Comp. Sci., 12(5), 2020.
[23] N. Zhang, K. Zhu, S. Ying, and X. Wang, "KAEA: A novel three-stage ensemble model for software defect prediction," Computers, Materials and Continua, 64(1), 471-499, 2020.
[24] K. Jiang, Y. Zhang, H. Wu, A. Wang, and Y. Iwahori, "Heterogeneous defect prediction based on transfer learning to handle extreme imbalance," Applied Sci., 10(1), 396, 2020.
[25] T. Hoang, H. K. Dam, Y. Kamei, D. Lo, and N. Ubayashi, "DeepJIT: an end-to-end deep learning framework for just-in-time defect prediction," in 2019 IEEE/ACM 16th International Conference on Mining Soft. Repositories (MSR), pp. 34-45, IEEE, May 2019.
Identifying software defects during early stages of Software Development life cycle reduces the project effort and cost. Hence there is a lot of research done in finding defective proneness of a software module using machine learning approaches. The main problems with software defect data are cost effective and imbalance. Cost effective problem refers to predicting defective module as non defective induces high penalty compared to predicting non defective module as defective. In our work, we are proposing a hybrid approach to address cost effective problem in Software defect data. To address cost effective problem, we used bagging technique with Artificial Neuro Fuzzy Inference system as base classifier. In addition to that, we also addressed Class Imbalance & High dimensionality problems using Artificial Neuro Fuzzy inference system & principle component analysis respectively. We conducted experiments on software defect datasets, downloaded from NASA dataset repository using our proposed approach and compared with approaches mentioned in literature survey. We observed Area under ROC curve (AuC) for proposed approach was improved approximately 15% compared with highly efficient approach mentioned in literature survey.