
Data and Ensemble Machine Learning Fusion Based Intelligent Software Defect Prediction System

Computers, Materials & Continua

May 2023
75(3):6083-6100

DOI:10.32604/cmc.2023.037933

License
CC BY 4.0

Authors:

Sagheer Abbas

Bahria University Lahore

Shabib Aftab

Virtual University of Pakistan

Muhammad Adnan Khan

Gachon University

Taher M. Ghazal

Khalifa University




This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


DOI: 10.32604/cmc.2023.037933

Article

Data and Ensemble Machine Learning Fusion Based Intelligent Software Defect Prediction System

Sagheer Abbas1, Shabib Aftab1,2, Muhammad Adnan Khan3,4, Taher M. Ghazal5,6, Hussam Al Hamadi7 and Chan Yeob Yeun8,*

1School of Computer Science, National College of Business Administration & Economics, Lahore, 54000, Pakistan

2Department of Computer Science, Virtual University of Pakistan, Lahore, 54000, Pakistan

3Department of Software, Faculty of Artificial Intelligence and Software, Gachon University, Seongnam, 13120, Korea

4Riphah School of Computing & Innovation, Faculty of Computing, Riphah International University, Lahore Campus, Lahore, 54000, Pakistan

5School of Information Technology, Skyline University College, University City Sharjah, Sharjah, UAE

6Center for Cyber Security, Faculty of Information Science and Technology, UKM, Bangi, Selangor, 43600, Malaysia

7College of Engineering and IT, University of Dubai, 14143, UAE

8EECS Department, Center for Cyber Physical Systems, Khalifa University, Abu Dhabi, 127788, UAE

*Corresponding Author: Chan Yeob Yeun. Email: chan.yeun@ku.ac.ae

Received: 22 November 2022; Accepted: 16 March 2023

Abstract: The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development can enable quality assurance engineers to concentrate on problematic modules rather than all the modules. This approach can enhance the quality of the final product while lowering development costs. Identifying defective modules early on can allow for early corrections and ensure the timely delivery of a high-quality product that satisfies customers and instills greater confidence in the development team. This process is known as software defect prediction, and it can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that utilizes data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules to achieve high accuracy. In the first step, three supervised machine learning techniques, including Decision Tree, Support Vector Machines, and Naïve Bayes, are used to detect faulty modules. The second step involves integrating the predictive accuracy of these classification techniques through three ensemble machine-learning methods: Bagging, Voting, and Stacking. Finally, in the third step, a fuzzy logic technique is employed to integrate the predictive accuracy of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model can perform effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system exhibited superior performance to other advanced techniques for predicting software defects, achieving a remarkable accuracy rate of 92.08%.

Keywords: Ensemble machine learning fusion; software defect prediction; fuzzy logic

1 Introduction

Most of the researchers from the software engineering domain have been working to minimize the

cost of the Software Development Life Cycle (SDLC) without compromising on the quality [1,2]. The

activity of software testing aims to ensure the high quality of the end product [3–5]. A minor defect

in software can lead to system failure and a catastrophic event in the case of a critical system [1–3].

The importance of identifying and removing the defects can be reflected by an example of “NASA’s

$125 million Mars Climate Orbiter”, which was lost because of a minor data conversion defect [1].

Software defects can be of different types, including wrong program statements, syntax errors, and

design or specification errors [1,2]. In SDLC, the testing process plays a crucial role in achieving

a high-quality end product by eliminating defects [6,7]. However, it has been proved that software

testing is the most expensive activity as it takes most of the resources as compared to other tasks

of SDLC [3,8–10]. Identifying and fixing the defects before testing would be less costly compared

to the cost of repairing the defects at later stages, especially after integration [11,12]. This objective

can be achieved by incorporating an effective Software Defect Prediction (SDP) system [13,14], which

can identify faulty software modules before the testing stage, allowing for focused testing efforts on

those particular modules [1]. This approach can guarantee the delivery of high-quality end products

with limited resources [3]. Many supervised machine learning-based defect prediction techniques and

frameworks have been proposed by researchers for the effective and efficient detection of defect-prone

software modules [1,2]. In the supervised machine learning technique, a classifier is trained by using a

pre-labeled dataset. The dataset which is used to train the classifier includes multiple independent

features and at least one dependent feature. The dependent feature is known as the output class,

which is classified or predicted by exploring the hidden pattern and relationship between independent

attributes and dependent attributes. That hidden pattern and relationship are learned by the supervised

classifier, which is further used for prediction on the unseen dataset (testing data) [6–8]. The SDP

dataset typically pertains to a specific software component, with independent attributes represented

by software quality metrics collected during development. The dependent feature, on the other hand,

is the predictable class which reflects whether the particular module is defective or not. Instances

in the SDP dataset reflect the modules, and by classifying a specific instance as defective or

non-defective, we are in effect classifying the module that the instance represents. This study presents

the Intelligent Software Defect Prediction System (ISDPS), which utilizes data and decision-level

ensemble machine learning fusion, along with a novel filter-based ensemble feature selection technique

to improve accuracy and reduce costs. The ISDPS follows a three-step process for software defect

prediction, beginning with the use of three supervised classification techniques—Decision Tree (DT),

Support Vector Machines (SVM), and Naïve Bayes (NB)—to build classification models for SDP.

The second step employs ensemble techniques such as Bagging, Voting, and Stacking to merge the

predictive accuracy of the classification models. The third step uses fuzzy logic to fuse the predictive

accuracy of the ensemble models. The proposed system was evaluated by combining five datasets

from NASA’s repository and achieved a high accuracy rate of 92.08%, outperforming other published

techniques.

2 Related Work

Researchers have presented various models and frameworks to detect faulty software modules

before the testing stage. Several studies have been conducted on this topic, and some are discussed here.

In [14], researchers applied a metric selection algorithm and ensemble machine learning approach to

detect defective software modules. The research was conducted on six different software defect datasets

taken from NASA’s software repository. The performance of the proposed method was evaluated using

statistical measures such as Receiver Operating Characteristic (ROC) value, Matthews Correlation

Coefficient (MCC), F-measure, and Accuracy. The study compared various search methods for metric

selection using ensemble machine learning. The results demonstrated that the proposed method

outperformed all other techniques, as evidenced by its superior performance compared to various

supervised classifiers. Researchers in [15] proposed an Artificial Neural Network (ANN) based system

for detecting faulty software modules, along with a metric selection algorithm that uses a

multi-filter-based approach. The proposed system uses oversampling to handle class imbalance and performs

classification in two dimensions, one with oversampling and one without. NASA’s cleaned software

defect datasets were used for implementation, and the system’s performance was evaluated using

various statistical metrics such as MCC, ROC, Accuracy, and F-measure. The study compared the

proposed SDP technique with other methods, and the results showed that the proposed technique

outperformed other techniques in terms of accuracy and other statistical measures. In their study

[16], researchers introduced a framework for predicting faulty software modules based on

Multi-Layer Perceptron (MLP) with bagging and boosting as ensemble machine learning techniques. The

framework employs three approaches for predicting defective modules, including tuning the MLP for

classification, creating an ensemble of tuned MLP using bagging, and developing an ensemble of tuned

MLP using boosting. The study utilized four cleaned datasets from NASA’s repository to implement

the proposed framework and evaluated its performance using statistical metrics such as Accuracy,

MCC, F-measure, and ROC. The results indicated that the proposed technique outperformed various

classifiers from published research, as determined by the Scott-Knott Effect Size Difference (ESD) test.

According to [17], researchers conducted a thorough comparative analysis of four commonly used

training techniques for back-propagation algorithms in ANN to predict defective software modules.

They also proposed a fuzzy logic-based layer to determine the most effective training technique. The

researchers utilized cleaned versions of NASA software defect datasets and assessed the performance

of the proposed framework using several metrics, such as Accuracy, Specificity, F-measure, Recall,

Precision, ROC, and Mean Square Error. The study compared the results with those of various

classifiers, and the proposed framework outperformed other methods. In [18], a machine

learning-based framework using a variant ensemble technique is presented for predicting faulty software

modules. The proposed framework also incorporates a metric selection method to optimize the

performance of the classification technique. The variant selection process involves identifying and

selecting the best version of the classification technique to achieve high performance. The ensemble

technique integrates the predictive accuracy of the optimized variants to further increase the accuracy

of the proposed framework. The proposed framework was tested using four cleaned software defect

datasets from NASA’s repository, and its performance was evaluated using three statistical measures:

MCC, Accuracy, and F-measure. The results of the framework were compared with various other

classifiers to assess its effectiveness. The analysis showed that the proposed framework outperformed

all other classifiers, indicating its superiority in predicting faulty software modules. The authors of

[19] performed a thorough comparative analysis of multiple supervised machine learning techniques

for software defect prediction. They used twelve cleaned software defect datasets from NASA and

evaluated the performance of the models using various statistical measures such as F-measure,

Precision, Accuracy, Recall, ROC, and MCC. The authors concluded that the results obtained from

their study could serve as a benchmark for future comparisons between different SDP techniques.

In [20], researchers designed an Artificial Neural Network (ANN) tool to detect defective software

modules. They compared different training functions of ANN and identified that the Bayesian

Regularization training function outperformed others. The objective of this study was to decrease

the cost of software development by detecting faulty modules before they reach the testing stage, thus

reducing the burden on the quality assurance team.

3 Materials and Methods

The study introduces an intelligent system that employs data fusion to detect defective software

modules. The system integrates feature selection and decision-level ensemble machine learning fusion

techniques to improve the accuracy and efficiency of identifying faulty software modules. The ISDPS

proposed in this study can be viewed from two perspectives: external and internal. The external view,

as shown in Fig. 1, outlines the workflow surrounding the defect prediction system. The development

team initiates the workflow during the development stage of the software development life cycle

(SDLC). Understanding the surrounding scenario is crucial to comprehend the importance of the

proposed system. A software metric dataset is prepared during the process of software development.

The dataset consists of various quality metrics captured automatically or manually during

development. Every dataset reflects a particular Software Component (SC), which contains many

modules. Each module in a specific SC dataset is reflected by an instance, and the values of quality

attributes/features of that instance are populated during development. Each SC dataset consists of

various quality attributes (independent features) and one dependent feature (also known as output

class). Initially, the developed software components in SDLC are stored in Software Metric Dataset

Repository (SMDR). Each component has its Quality Metrics Dataset (QMD), which contains the

values of various quality attributes recorded during the development stage. The SMDR consists of two

further sub-repositories: the Untested Software Components Repository (USCR) and the Tested Software

Components Repository (TSCR). Initially, the developed components are not tested and are stored in

USCR. Some selected components from USCR are tested by the Quality Assurance (QA) team, and

an additional attribute is added to the QMD, known as “Result”. This column will reflect the nominal

value of “yes” if the particular module is defective and “no” if the module is non-defective. The tested

component, along with its QMD and the “Result” attribute, is stored in the TSCR sub-repository. Once the

TSCR has been populated with tested components, the proposed ISDPS can start operating, because

it needs a pre-labeled dataset for training.

Two or more datasets from TSCR are extracted into the training layer of the proposed defect

prediction system, where data fusion will be performed, and the prediction model is developed, which

will be stored in the cloud for later use. The testing layer of the proposed system will receive QMD as

input from USCR to perform real-time defect predictions. The “Result” attribute in QMD is populated

by the testing layer after prediction and then will be sent to the QA team. The QA team will add

the QMD to its related SC, and if the module is predicted as defective, then thorough testing of that

particular module will be performed, and the detail of identified defect will be sent to the development

team in SDLC. The development team will correct the defective module, and then it will be again

transferred to USCR for the next iteration. If the module is predicted as non-defective by the proposed

ISDPS, then it would be considered as good to go to the integration stage of SDLC.

Figure 1: External view of proposed ISDPS

The internal view (Fig. 2) of the proposed ISDPS contains two layers: training and testing. These

layers are comprised of several stages and activities. The workflow in the training layer initiates with

the extraction of pre-labeled QMDs from SMDR. Data fusion is the first stage of the training layer

in the proposed system, in which datasets of multiple software components will be extracted, and

instance-level fusion will be performed. The main objective of the data fusion process is to develop an

effective and efficient classification model, which can be used for prediction on diverse test datasets

from multiple sources. No doubt, training the model with higher accuracy on the fused dataset is

challenging but eventually fruitful for later use, especially when prediction has to be performed on

multiple datasets from multiple sources. However, it should be ensured that the nature of the test data

would be the same as that of the training dataset. For this study, five cleaned datasets were chosen

from NASA’s repository, including MW1, PC1, PC3, PC4, and CM1. The details of the used datasets

and their attributes are available in [21]. After extraction, all five datasets are fused. The fused dataset

consists of 3579 instances and 38 attributes. Of 38 attributes, 37 attributes are independent, whereas

one attribute named “Defective” is dependent, which aims to determine whether a module is defective.

The dependent attribute, also known as the output class, can contain two values, “Yes” for defective

modules and “No” for non-defective modules.
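As a rough illustration of the instance-level fusion step, the sketch below stacks the rows of several component datasets that share one attribute schema. The function name `fuse_datasets`, the toy rows, and the strict schema check are assumptions made for this example and are not details taken from the study.

```python
from typing import Dict, List

Row = Dict[str, object]


def fuse_datasets(datasets: Dict[str, List[Row]], label: str = "Defective") -> List[Row]:
    """Instance-level fusion: stack the rows of datasets that share one schema."""
    fused: List[Row] = []
    schema = None
    for name, rows in datasets.items():
        for row in rows:
            if schema is None:
                schema = set(row)  # the first row seen defines the reference schema
                if label not in schema:
                    raise ValueError(f"label attribute {label!r} is missing")
            if set(row) != schema:
                raise ValueError(f"{name}: attributes differ from the reference schema")
            fused.append(dict(row))  # copy so the source datasets stay untouched
    return fused


# Toy stand-ins for two component datasets (hypothetical values, two attributes only).
toy = {
    "component_a": [{"LOC_BLANK": 4.0, "Defective": "No"}],
    "component_b": [{"LOC_BLANK": 9.0, "Defective": "Yes"},
                    {"LOC_BLANK": 1.0, "Defective": "No"}],
}
fused = fuse_datasets(toy)
print(len(fused))
```

Rejecting mismatched schemas is one possible design choice; it mirrors the requirement stated above that test data must have the same nature as the training data.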

Pre-processing is the second stage of the training layer, which will deal with four data

pre-processing activities, including 1) cleaning, 2) normalization, 3) splitting of data for training and

testing, and 4) feature selection for efficient and effective prediction. The cleaning process of the

pre-processing stage will be responsible for handling the missing values in the dataset. The missing

values will be replaced by using the technique of mean imputation. Missing values in any attribute of

the dataset can misguide the classification model, which may result in low accuracy of the proposed

framework. The second activity deals with the normalization process, which scales the values of all

independent attributes to a range of 0 to 1. It has been observed that cleaning and normalization

activities aid the classification techniques to work efficiently and effectively. The third activity will

deal with the splitting of the dataset into the groups of training and testing datasets with a ratio of

70:30 by using the split class rule. The fourth and final activity of pre-processing stage will deal with

the selection of optimum features [14,15] from training and test sets by using a novel feature selection

(FS) technique. A novel filter-based ensemble feature selection (FEFS) technique is proposed in which

feature selection is performed three times in a nested way. The proposed FEFS technique consists

of four steps. First, the complete feature set of training data is given as input to the

Correlation-based FS technique with the Genetic Search (GS) method. In the second step, Consistency-based FS

is performed on complete training data with Best First (BF) search method. In the third step, an

intersection is performed among both of the resultant feature sets from step 1 and step 2 (Correlation

FS and Consistency FS). Finally, in step four, the feature set generated from the intersection operation

from step 3 is given as input again to the correlation-based FS technique with the GS method, and

the resultant feature set is selected from training and test datasets. The detailed steps of the proposed

FEFS technique are given below (Table 1).
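The cleaning, normalization, and splitting activities described above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: missing values are represented as `None`, and the "split class rule" is interpreted here as a per-class (stratified) 70:30 split, which is a guess rather than something the text specifies. The function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def impute_mean(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace missing values (None) with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column to [0, 1]; a constant column is mapped to 0.0."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    return [[(r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
             for j in range(n_cols)] for r in rows]


def stratified_split(labels: Sequence[str], train_fraction: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class: Dict[str, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))  # per-class rounding
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Class sizes of the fused dataset as reported in this paper: 428 defective, 3151 non-defective.
labels = ["Yes"] * 428 + ["No"] * 3151
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With per-class rounding, the split sizes come out as 2506 training and 1073 test instances, which agrees with the counts reported in Section 4; this agreement is suggestive but does not confirm that the study used this exact procedure.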

Figure 2: Internal view of proposed ISDPS

Table 1: Proposed FEFS technique

Input:

Training Dataset: Dataset A

Test Dataset: Dataset B

Attribute Evaluators: Correlation FS, Consistency FS

Search Methods: GS, BF

Output:

n = number of features

Steps:

1 Dataset A → Correlation FS-GS → Subset 1: a1, a2, ..., an;

2 Dataset A → Consistency FS-BF → Subset 2: b1, b2, ..., bn;

3 Subset 1 Intersection Subset 2 → Subset 3: c1, c2, ..., cn;

4 Subset 3 → Correlation FS-GS → Subset 4: d1, d2, ..., dn;

5 Select Subset 4 as Feature Set from Dataset A and Dataset B.
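The nesting in Table 1 can be sketched as a small pipeline that takes the two selectors as pluggable functions. The sketch below does not implement Correlation-based FS with Genetic Search or Consistency-based FS with Best First; `score_filter` is a hypothetical stand-in that thresholds made-up scores, used only so the control flow can be run end to end.

```python
from typing import Callable, Dict, List, Sequence, Set

Selector = Callable[[Sequence[str]], Set[str]]


def fefs(features: Sequence[str], correlation_fs_gs: Selector,
         consistency_fs_bf: Selector) -> List[str]:
    """Nested filter-based ensemble feature selection, following the steps of Table 1."""
    subset1 = correlation_fs_gs(features)                              # step 1
    subset2 = consistency_fs_bf(features)                              # step 2
    subset3 = [f for f in features if f in subset1 and f in subset2]   # step 3: intersection
    subset4 = correlation_fs_gs(subset3)                               # step 4: re-run on intersection
    return [f for f in subset3 if f in subset4]                        # step 5: final feature set


def score_filter(scores: Dict[str, float], threshold: float) -> Selector:
    """Hypothetical stand-in selector: keep candidates whose score reaches the threshold."""
    def select(candidates: Sequence[str]) -> Set[str]:
        return {f for f in candidates if scores.get(f, 0.0) >= threshold}
    return select


# Toy scores (made up for the example) for four placeholder features.
correlation_like = score_filter({"f1": 0.9, "f2": 0.8, "f3": 0.1, "f4": 0.7}, 0.5)
consistency_like = score_filter({"f1": 0.9, "f2": 0.2, "f3": 0.9, "f4": 0.6}, 0.5)
selected = fefs(["f1", "f2", "f3", "f4"], correlation_like, consistency_like)
print(selected)
```

With a per-feature threshold the fourth step cannot remove anything further. A subset evaluator that scores features jointly could shrink the set again when re-run on the intersection, which is presumably the purpose of that step.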

Classification is the third stage which deals with the development of classification models to

identify defective and non-defective modules. Selected features of pre-processed datasets (training and

testing) are used as input to the classification stage. The study employed three supervised machine

learning techniques, namely NB, SVM, and DT, for classification. These classifiers were fine-tuned

iteratively to achieve the highest possible accuracy on the testing dataset. During the tuning process

on training data, default parameters are used in NB as the performance decreases after optimization.

In SVM, the polynomial kernel is selected, and the value of the complexity parameter (C) is set to 1.

In DT, the confidence factor has been set to 0.3. The classification stage will end by producing the

optimized prediction models of the used supervised machine learning techniques.

Ensemble modeling is the fourth stage in the training layer. It develops ensemble models by

integrating the optimized classifiers (NB, SVM, and DT), which are

given as input to this stage. Ensemble machine learning approaches can achieve

higher prediction accuracy than the individual optimized classifiers [3,7,14,22,23].

Three ensemble techniques will be used for the development of ensemble models, including Bagging,

Voting, and Stacking. One by one, all of the optimized classification models are used as input to the

ensemble techniques for the development of ensemble models. Three ensemble models are developed

in the proposed system: Bagging with DT, Voting with NB, SVM, and DT, and Stacking with NB

and SVM along with DT as a Meta classifier. All three developed ensemble models have shown better

accuracy on test data than each of the optimized individual classifiers.
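Two of the building blocks named above can be illustrated without any library: the bootstrap resampling that Bagging applies to the training data, and the combination of base-classifier outputs by Voting. The sketch assumes hard (majority) voting over class labels, which the text does not specify, and the prediction lists are toy values rather than results from the study.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_sample(rows: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(rows) items with replacement, as Bagging does for each base model."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Combine per-classifier labels (1 = defective, 0 = non-defective) by hard voting."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions.values())]


# Toy predictions for four modules from the three base classifiers (hypothetical values).
toy_predictions = {
    "NB":  [1, 0, 1, 0],
    "SVM": [0, 0, 1, 1],
    "DT":  [1, 0, 0, 0],
}
voted = majority_vote(toy_predictions)
sample = bootstrap_sample(list(range(10)), random.Random(0))
print(voted, len(sample))
```

With three binary voters a tie cannot occur, so no tie-breaking rule is needed in this setting.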

The fifth and final stage of the training layer deals with the fusion of ensemble machine-learning

techniques. This stage is responsible for decision-level fusion by integrating the predictive accuracy

of optimized ensemble models [24]. Fuzzy logic is used for decision-level fusion, where membership

functions are developed using if-then rules (as shown in Table 2). These rules form the basis of the

final prediction and enhance the accuracy of the defect prediction system. The fused ensemble model

is then stored in cloud storage for real-time predictions.

Table 2: Membership functions of proposed fusion method

Input/output | Membership function
Bagging (bg) | $\max\left(\min\left(1, \frac{0.5 - bg}{0.05}\right), 0\right)$
Bagging (bg) | $\max\left(\min\left(\frac{bg - 0.45}{0.05}, 1\right), 0\right)$
Voting (vt) | $VY(vt) = \max\left(\min\left(1, \frac{0.5 - vt}{0.05}\right), 0\right)$
Voting (vt) | $VN(vt) = \max\left(\min\left(\frac{vt - 0.45}{0.05}, 1\right), 0\right)$
Stacking (sk) | $sky(sk) = \max\left(\min\left(1, \frac{0.5 - sk}{0.05}\right), 0\right)$
Stacking (sk) | $skn(sk) = \max\left(\min\left(\frac{sk - 0.45}{0.05}, 1\right), 0\right)$
Output | $\check{D}(\cdot) = \max\left(\min\left(1, \frac{0.5 - (\cdot)}{0.05}\right), 0\right)$
Output | $\check{D}(\cdot) = \max\left(\min\left(\frac{(\cdot) - 0.45}{0.05}, 1\right), 0\right)$


If-Then conditions that are used to develop membership functions are given below:

IF (Bagging is yes and Voting is yes and Stacking is yes) THEN (Module is defective).

IF (Bagging is yes and Voting is yes and Stacking is no) THEN (Module is defective).

IF (Bagging is yes and Voting is no and Stacking is yes) THEN (Module is defective).

IF (Bagging is no and Voting is yes and Stacking is yes) THEN (Module is defective).

IF (Bagging is no and Voting is no and Stacking is also no) THEN (Module is not defective).

IF (Bagging is yes and Voting is no and Stacking is no) THEN (Module is not defective).

IF (Bagging is no and Voting is no and Stacking is yes) THEN (Module is not defective).

IF (Bagging is no and Voting is yes and Stacking is no) THEN (Module is not defective).
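The eight rules above and the two membership-function shapes of Table 2 can be written out directly, as in the sketch below. The names `low`, `high`, `RULES`, and `fused_decision` are hypothetical labels chosen for this example. The sketch covers only the crisp rule table and the piecewise-linear shapes; the fuzzy inference and defuzzification steps are not specified in the text and are not reproduced here.

```python
from itertools import product


def low(x: float) -> float:
    """max(min(1, (0.5 - x) / 0.05), 0): full membership below 0.45, none from 0.5 upward."""
    return max(min(1.0, (0.5 - x) / 0.05), 0.0)


def high(x: float) -> float:
    """max(min((x - 0.45) / 0.05, 1), 0): no membership up to 0.45, full above 0.5."""
    return max(min((x - 0.45) / 0.05, 1.0), 0.0)


# The eight IF-THEN rules, keyed by (bagging, voting, stacking).
RULES = {
    ("yes", "yes", "yes"): "defective",
    ("yes", "yes", "no"): "defective",
    ("yes", "no", "yes"): "defective",
    ("no", "yes", "yes"): "defective",
    ("no", "no", "no"): "not defective",
    ("yes", "no", "no"): "not defective",
    ("no", "no", "yes"): "not defective",
    ("no", "yes", "no"): "not defective",
}


def fused_decision(bagging: str, voting: str, stacking: str) -> str:
    """Look up the fused prediction for one module."""
    return RULES[(bagging, voting, stacking)]


def rules_match_majority() -> bool:
    """Check that the rule table is equivalent to a two-out-of-three majority."""
    return all(
        fused_decision(*combo) == ("defective" if combo.count("yes") >= 2 else "not defective")
        for combo in product(("yes", "no"), repeat=3)
    )


print(fused_decision("yes", "no", "yes"), rules_match_majority())
```

Written this way, the rule base is visibly a two-out-of-three majority decision over the ensemble outputs.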

Fig. 3 depicts the rule surface of the proposed fuzzy logic-based fusion method for the final prediction as a function of the bagging and voting ensemble outputs. In cases where both techniques predict that the software module is not defective, the fused model makes the same prediction. Likewise, if both techniques predict that the module is defective, the fused model also predicts that the module is defective.

Figure 3: Rule surface of fused ensemble method with bagging and voting

Fig. 4 shows the graphical representation of the fuzzy logic-based if-then rule for the scenario in which bagging and stacking both predict that the particular module is defective, whereas voting predicts the opposite (non-defective). In this case, the proposed technique follows the majority decision (defective).

Fig. 5 illustrates that if bagging and stacking both predict that the module is non-defective, the

proposed technique will also predict that the module is non-defective.

The testing layer is the implementation layer of the proposed ISDPS. In this layer, three activities

are performed. The first activity deals with the extraction of unlabeled QMD from USCR for the

prediction of defective software modules. The second activity involves extracting the fused model

from the cloud for prediction. The third activity of the testing layer deals with real-time prediction

in which unlabeled QMD is given as input to the fused model, which is then labeled after prediction.

The labeled QMD is attached to its related SC in TSCR and then sent back to the development life

cycle. If the software is predicted as defective, then the particular defect is rectified by the development

team; otherwise, the particular software component would be considered good to go for integration.

Figure 4: Result of proposed fused ensemble method with defective module (1)

Figure 5: Result of proposed fused ensemble method with non-defective module (0)

4 Results and Discussion

An empirical analysis was conducted to assess the effectiveness of the proposed ISDPS using

a fused software defect dataset. The dataset was created by combining five of NASA’s cleaned

datasets. The fused dataset contains 3579 instances, with 428 defective and 3151 non-defective. In

the pre-processing stage of the training layer, the dataset underwent cleaning and normalization

processes, followed by the splitting process, where the dataset was divided into training and test subsets

with a 70:30 ratio. The training subset comprised 2506 instances, while the test subset contained

1073 instances. A novel FEFS technique is proposed for effective and efficient prediction, which

is implemented on the complete feature set of training data to select the optimum feature set. The

proposed method chose 7 out of 37 independent features. The detail of the full features of the fused

dataset is available at [21], whereas the feature set selected by the proposed FEFS technique is shown

in Table 3.

Table 3: Selected features using the proposed FEFS technique

No. Selected features

1 LOC_BLANK

2 LOC_CODE_AND_COMMENT

3 CYCLOMATIC_DENSITY

4 PARAMETER_COUNT

5 HALSTEAD_CONTENT

6 NUM_OPERATORS

7 PERCENT_COMMENTS

In the proposed ISDPS, prediction is performed in three steps. Initially, three supervised machine

learning techniques (NB, SVM, and DT) are iteratively optimized for classification in the first step until

the highest possible accuracy is attained for each model. The optimized classification models created

from these classifiers are given to the second step of prediction, where three ensemble techniques

(Bagging, Voting, and Stacking) are used to integrate the predictive accuracy of used classifiers.

The classifiers are integrated by ensemble methods with all possible combinations until we get three

ensemble classification models, one from each ensemble technique that performed higher than the base

classifier. The results of ensemble techniques are given as input to the final prediction step, which is

empowered by fuzzy logic.

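The accuracy measures used for the performance analysis of the proposed ISDPS are defined below. Here $AOR_i/EOR_j$ denotes the number of instances with output class $i$ (column $AOR_i$ of the result tables) and expected class $j$ (row $EOR_j$), and $EOR_0$ and $EOR_1$ denote the total numbers of expected negative and expected positive instances.

$$\text{Miss rate} = \frac{AOR_1/EOR_0 + AOR_0/EOR_1}{EOR_0 + EOR_1} \tag{1}$$

$$\text{Accuracy} = \frac{AOR_0/EOR_0 + AOR_1/EOR_1}{EOR_0 + EOR_1} \tag{2}$$

$$\text{Positive Prediction Value} = \frac{AOR_1/EOR_1}{AOR_1/EOR_1 + AOR_1/EOR_0} \tag{3}$$

$$\text{Negative Prediction Value} = \frac{AOR_0/EOR_0}{AOR_0/EOR_0 + AOR_0/EOR_1} \tag{4}$$

$$\text{Specificity} = \frac{AOR_0/EOR_0}{AOR_0/EOR_0 + AOR_1/EOR_0} \tag{5}$$

$$\text{Sensitivity} = \frac{AOR_1/EOR_1}{AOR_1/EOR_1 + AOR_0/EOR_1} \tag{6}$$

$$\text{False Positive Ratio} = 1 - \text{Specificity} \tag{7}$$

$$\text{False Negative Ratio} = 1 - \text{Sensitivity} \tag{8}$$

$$\text{Likelihood Ratio Positive} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} \tag{9}$$

$$\text{Likelihood Ratio Negative} = \frac{1 - \text{Sensitivity}}{\text{Specificity}} \tag{10}$$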
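The ten measures can be computed directly from the four cells of a confusion matrix. The sketch below uses the standard confusion-matrix definitions, which reproduce the values reported in Table 11; the `Confusion` class, the `measures` function, and the field names `tn`, `fp`, `fn`, `tp` are illustrative names rather than anything defined in the paper. The likelihood ratios are undefined when specificity is exactly 0 or 1, a case the sketch does not guard against.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tn: int  # AOR0/EOR0: expected negative, predicted negative
    fp: int  # AOR1/EOR0: expected negative, predicted positive
    fn: int  # AOR0/EOR1: expected positive, predicted negative
    tp: int  # AOR1/EOR1: expected positive, predicted positive


def measures(c: Confusion) -> Dict[str, float]:
    """Return the ten performance measures for one confusion matrix."""
    total = c.tn + c.fp + c.fn + c.tp
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "accuracy": (c.tn + c.tp) / total,
        "miss_rate": (c.fp + c.fn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "specificity": specificity,
        "sensitivity": sensitivity,
        "false_positive_ratio": 1 - specificity,
        "false_negative_ratio": 1 - sensitivity,
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


if __name__ == "__main__":
    # NB training counts taken from Table 4.
    nb_train = Confusion(tn=2061, fp=145, fn=220, tp=80)
    for name, value in measures(nb_train).items():
        print(f"{name:<22}{value:.4f}")
```

Run on the NB training counts of Table 4, the function yields an accuracy of about 0.8543 and a sensitivity of about 0.2667, matching the first row of Table 11.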

The training data, which consist of 2506 instances, are used to train the classifiers and ensemble models. During NB training, 2061 instances are correctly predicted as negative and 80 as positive. Comparing the expected and achieved results in Table 4 shows that NB training achieved 85.43% accuracy and a 14.57% miss rate. During NB testing, 865 instances are correctly predicted as negative and 46 as positive; comparing the expected and output results (Table 4) gives 84.90% accuracy and a miss rate of 15.10%.

Table 4: NB results

| Data (samples) | Expected result | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (2506) | EOR0 = 2206 (Negative-0) | 2061 | 145 |
| Training (2506) | EOR1 = 300 (Positive-1) | 220 | 80 |
| Testing (1073) | EOR0 = 945 (Negative-0) | 865 | 80 |
| Testing (1073) | EOR1 = 128 (Positive-1) | 82 | 46 |

In SVM training, 2135 instances are correctly predicted as negative and 40 as positive. The results in Table 5 show that SVM training achieved 86.79% accuracy with a miss rate of 13.21%. In SVM testing, 905 instances are correctly predicted as negative and 24 as positive; comparing the expected and output results gives an accuracy of 86.58% with a miss rate of 13.42%.

Table 5: SVM results

| Data (samples) | Expected result | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (2506) | EOR0 = 2206 (Negative-0) | 2135 | 71 |
| Training (2506) | EOR1 = 300 (Positive-1) | 260 | 40 |
| Testing (1073) | EOR0 = 945 (Negative-0) | 905 | 40 |
| Testing (1073) | EOR1 = 128 (Positive-1) | 104 | 24 |

In DT training, 2122 instances are correctly predicted as negative and 100 as positive. The expected and achieved outputs presented in Table 6 show 88.67% accuracy with an 11.33% miss rate. In DT testing, 884 instances are correctly predicted as negative and 51 as positive, which gives 87.14% accuracy with a miss rate of 12.86% (Table 6).


Table 6: DT results

| Data (samples) | Expected result | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (2506) | EOR0 = 2206 (Negative-0) | 2122 | 84 |
| Training (2506) | EOR1 = 300 (Positive-1) | 200 | 100 |
| Testing (1073) | EOR0 = 945 (Negative-0) | 884 | 61 |
| Testing (1073) | EOR1 = 128 (Positive-1) | 77 | 51 |

After the development of classification models using the supervised machine learning techniques (NB, SVM, and DT), ensemble machine learning models are developed. In training with the bagging technique, 2205 instances are correctly predicted as negative and 164 as positive. The training results in Table 7 show 94.53% accuracy with a miss rate of 5.47%. Testing with bagging correctly predicted 913 instances as negative and 55 as positive; comparing the expected and achieved results gives a testing accuracy of 90.21% and a miss rate of 9.79%.

Table 7: Bagging results

| Data (samples) | Expected result | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (2506) | EOR0 = 2206 (Negative-0) | 2205 | 1 |
| Training (2506) | EOR1 = 300 (Positive-1) | 136 | 164 |
| Testing (1073) | EOR0 = 945 (Negative-0) | 913 | 32 |
| Testing (1073) | EOR1 = 128 (Positive-1) | 73 | 55 |

Training with voting correctly predicted 2196 instances as negative and 39 as positive. The results in Table 8 show 89.19% accuracy with a 10.81% miss rate. Testing with voting correctly predicted 897 instances as negative and 58 as positive, which corresponds to 89.00% accuracy and an 11.00% miss rate.

During training with the stacking ensemble, 2201 instances are correctly classified as negative and 53 as positive. Table 9 presents the expected and output results, which correspond to an accuracy of 89.94% and a miss rate of 10.06%. Testing with the stacking ensemble correctly classified 911 instances as negative and 54 as positive; comparing the expected and output results gives an accuracy of 89.93% and a miss rate of 10.07%.

Table 8: Voting results

| Data (samples) | Expected result | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (2506) | EOR0 = 2206 (Negative-0) | 2196 | 10 |
| Training (2506) | EOR1 = 300 (Positive-1) | 261 | 39 |
| Testing (1073) | EOR0 = 945 (Negative-0) | 897 | 48 |
| Testing (1073) | EOR1 = 128 (Positive-1) | 70 | 58 |

Table 9: Stacking results

| Data (samples) | Expected result | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (2506) | EOR0 = 2206 (Negative-0) | 2201 | 5 |
| Training (2506) | EOR1 = 300 (Positive-1) | 247 | 53 |
| Testing (1073) | EOR0 = 945 (Negative-0) | 911 | 34 |
| Testing (1073) | EOR1 = 128 (Positive-1) | 74 | 54 |

Finally, the test dataset is given to the proposed model, which correctly predicted 926 of the 945 negative instances and 62 of the 128 positive instances. The results are shown in Table 10, according to which the proposed system achieved 92.08% accuracy and a 7.92% miss rate.

Table 10: Fused ensemble testing

| Expected result (N = 1073 samples) | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|
| EOR0 = 945 (Negative-0) | 926 | 19 |
| EOR1 = 128 (Positive-1) | 66 | 62 |
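As a check on Table 10, the headline figures for the fused ensemble can be recomputed from its four cells, treating the diagonal as correct predictions. This is only a worked restatement using the standard confusion-matrix definitions; the rounded values agree with the fused-ensemble row of Table 11.

$$
\begin{aligned}
\text{Accuracy} &= \frac{926 + 62}{1073} = \frac{988}{1073} \approx 0.9208 \\
\text{Miss rate} &= \frac{19 + 66}{1073} = \frac{85}{1073} \approx 0.0792 \\
\text{Sensitivity} &= \frac{62}{62 + 66} = \frac{62}{128} \approx 0.4844 \\
\text{Specificity} &= \frac{926}{926 + 19} = \frac{926}{945} \approx 0.9799
\end{aligned}
$$

The sensitivity figure reflects that 66 of the 128 defective test modules are still classified as non-defective.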

Table 11 shows the detailed results of the base classifiers and ensemble classification models on training and testing data, along with the results of the proposed ISDPS on test data. The analysis shows that the proposed system outperformed both the base classifiers (NB, SVM, and DT) and the ensembles (Bagging, Voting, and Stacking) in test accuracy. The ensemble models achieve better results than the base classifiers, and the final prediction by decision-level fusion with fuzzy logic further increases the accuracy to 92.08%. The effectiveness of the proposed system can be inferred from its performance on the fused dataset in comparison to the other models.
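The ordering of the models on the test set can be re-derived from the testing halves of Tables 4 to 10. The snippet below is a small verification sketch; the dictionary name `TEST_COUNTS` and the tuple layout (TN, FP, FN, TP) are conventions chosen here, and the counts are transcribed from those tables.

```python
# Test-set confusion counts as (TN, FP, FN, TP), transcribed from Tables 4-10.
TEST_COUNTS = {
    "NB": (865, 80, 82, 46),
    "SVM": (905, 40, 104, 24),
    "DT": (884, 61, 77, 51),
    "Bagging": (913, 32, 73, 55),
    "Voting": (897, 48, 70, 58),
    "Stacking": (911, 34, 74, 54),
    "Proposed fused ensemble": (926, 19, 66, 62),
}


def accuracy(tn: int, fp: int, fn: int, tp: int) -> float:
    """Fraction of test instances on the diagonal of the confusion matrix."""
    return (tn + tp) / (tn + fp + fn + tp)


# Models sorted from highest to lowest test accuracy.
ranking = sorted(
    TEST_COUNTS,
    key=lambda name: accuracy(*TEST_COUNTS[name]),
    reverse=True,
)

if __name__ == "__main__":
    for name in ranking:
        print(f"{name:<24}{accuracy(*TEST_COUNTS[name]):.4f}")
```

Sorting by accuracy places the fused ensemble first, followed by the three ensembles and then the three base classifiers.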


Table 11: Detailed results of classifiers, ensembles, and ensemble fusion

| ML algorithm | Dataset | Accuracy | Miss rate | Sensitivity | Specificity | Positive prediction value | Negative prediction value | False positive value | False negative value | Likelihood ratio negative | Likelihood ratio positive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Naïve Bayes | Training | 0.8543 | 0.1457 | 0.2667 | 0.9343 | 0.3556 | 0.9036 | 0.0657 | 0.7333 | 0.7849 | 4.0570 |
| Naïve Bayes | Testing | 0.8490 | 0.1510 | 0.3594 | 0.9153 | 0.3651 | 0.9134 | 0.0847 | 0.6406 | 0.6999 | 4.2451 |
| Support vector machines | Training | 0.8679 | 0.1321 | 0.1333 | 0.9678 | 0.3604 | 0.8914 | 0.0322 | 0.8667 | 0.8955 | 4.1427 |
| Support vector machines | Testing | 0.8658 | 0.1342 | 0.1875 | 0.9577 | 0.3750 | 0.8969 | 0.0423 | 0.8125 | 0.8484 | 4.4297 |
| Decision tree | Training | 0.8867 | 0.1133 | 0.3333 | 0.9619 | 0.5435 | 0.9139 | 0.0381 | 0.6667 | 0.6931 | 8.7540 |
| Decision tree | Testing | 0.8714 | 0.1286 | 0.3984 | 0.9354 | 0.4554 | 0.9199 | 0.0646 | 0.6016 | 0.6431 | 6.1725 |
| Bagging | Training | 0.9453 | 0.0547 | 0.5467 | 0.9995 | 0.9939 | 0.9419 | 0.0005 | 0.4533 | 0.4535 | 1205.9467 |
| Bagging | Testing | 0.9021 | 0.0979 | 0.4297 | 0.9661 | 0.6322 | 0.9260 | 0.0339 | 0.5703 | 0.5903 | 12.6892 |
| Voting | Training | 0.8919 | 0.1081 | 0.1300 | 0.9955 | 0.7959 | 0.8938 | 0.0045 | 0.8700 | 0.8740 | 28.6780 |
| Voting | Testing | 0.8900 | 0.1100 | 0.4531 | 0.9492 | 0.5472 | 0.9276 | 0.0508 | 0.5469 | 0.5761 | 8.9209 |
| Stacking | Training | 0.8994 | 0.1006 | 0.1767 | 0.9977 | 0.9138 | 0.8991 | 0.0023 | 0.8233 | 0.8252 | 77.9453 |
| Stacking | Testing | 0.8993 | 0.1007 | 0.4218 | 0.9640 | 0.6136 | 0.9249 | 0.0360 | 0.5781 | 0.5997 | 11.7257 |
| Proposed fused ensemble | Testing | 0.9208 | 0.0792 | 0.4844 | 0.9799 | 0.7654 | 0.9335 | 0.0201 | 0.5156 | 0.5262 | 24.0913 |


The performance of the proposed ISDPS is compared with other techniques in terms of accuracy in Table 12. The comparison shows that the accuracy achieved by the proposed ISDPS is higher than that of the other published techniques. Data fusion usually decreases the accuracy of a prediction system, because training classification models on a dataset drawn from multiple sources is more challenging than training on a dataset drawn from a single source. However, the proposed FEFS technique for selecting optimum attributes and the multi-step prediction system both played crucial roles in achieving the high accuracy of the proposed ISDPS.

Table 12: Accuracy comparison of proposed system with other methods

| Algorithm | Accuracy % | Miss rate % |
|---|---|---|
| MLP-FS [15] | 85.13 | 14.87 |
| Boosting-OPT-MLP [16] | 79.08 | 20.92 |
| ANN-BR-fused [17] | 85.45 | 14.55 |
| FS-variant ensemble ML [18] | 84.97 | 15.03 |
| NB [19] | 82.65 | 17.35 |
| MLP [25] | 89.96 | 10.04 |
| Tree [25] | 84.94 | 15.06 |
| Bagging ensemble [26] | 80.20 | 19.80 |
| Boosting ensemble [26] | 81.30 | 18.70 |
| Heterogeneous classifier [27] | 89.20 | 10.80 |
| Stacked ensemble [28] | 89.10 | 10.90 |
| RBFNN-based ADBBO [29] | 88.65 | 11.35 |
| LWL-based bagging ensemble [30] | 90.10 | 9.90 |
| Proposed ensemble ML fusion approach | 92.08 | 7.92 |

5 Conclusion

Software testing is considered an expensive activity of the software development life cycle (SDLC); it aims to ensure the high quality of the end product by removing software bugs. Anticipating software faults before the testing phase helps the quality assurance team direct its attention to potentially defective modules during testing instead of scrutinizing every module. This limits the cost of testing and ultimately reduces the overall development cost without compromising software quality. The current study aimed to develop a system that predicts faulty software modules before the testing stage by utilizing data fusion, feature selection, and decision-level ensemble machine-learning fusion techniques. A novel FEFS technique is proposed to select optimum features from the input dataset. The proposed system uses NB, SVM, and DT for initial predictions, followed by the development of ensemble models using Bagging, Voting, and Stacking. The predictions from the ensemble models are then given to the decision-level fusion phase, which integrates their predictive accuracy through if-then rule-based fuzzy logic to produce the final prediction. Five cleaned datasets from NASA’s software repository are fused to implement the proposed system. After comparing the performance of the proposed ISDPS with other defect prediction techniques published in the literature, it was found that the ISDPS outperformed all other methods on the fused dataset. The proposed system achieved an accuracy of 92.08% on the fused data, indicating the effectiveness of the novel FEFS and decision-level ensemble machine-learning fusion techniques. For future work, it is suggested that hybrid filter-wrapper feature selection and deep extreme machine-learning techniques be incorporated into the proposed system. Moreover, the proposed design should also be optimized for cross-project defect prediction problems.

Acknowledgement: Thanks to our families & colleagues who supported us morally.

Funding Statement: This work was supported by the Center for Cyber-Physical Systems, Khalifa

University, under Grant 8474000137-RC1-C2PS-T5.

Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the

present study.

References

[1] F. Matloob, S. Aftab, M. Ahmad, M. A. Khan, A. Fatima et al., “Software defect prediction using supervised machine learning techniques: A systematic literature review,” Intelligent Automation & Soft Computing, vol. 29, no. 2, pp. 403–421, 2021.

[2] D. R. Ibrahim, R. Ghnemat and A. Hudaib, “Software defect prediction using feature selection and random forest algorithm,” in Int. Conf. on New Trends in Computer Science, Amman, Jordan, pp. 252–257, 2017.

[3] F. Matloob, T. M. Ghazal, N. Taleb, S. Aftab, M. Ahmad et al., “Software defect prediction using ensemble learning: A systematic literature review,” IEEE Access, vol. 10, pp. 13123–13143, 2022.

[4] A. Boucher and M. Badri, “Software metrics thresholds calculation techniques to predict fault-proneness: An empirical comparison,” Information and Software Technology, vol. 96, pp. 38–67, 2018.

[5] L. Chen, B. Fang, Z. Shang and Y. Tang, “Tackling class overlap and imbalance problems in software defect prediction,” Software Quality Journal, vol. 26, no. 1, pp. 97–125, 2018.

[6] S. Goyal and P. K. Bhatia, “Empirical software measurements with machine learning,” in Computational Intelligence Techniques and Their Applications to Software Engineering Problems, Boca Raton: CRC Press, pp. 49–64, 2020.

[7] S. Huda, K. Liu, M. Abdelrazek, A. Ibrahim, S. Alyahya et al., “An ensemble oversampling model for class imbalance problem in software defect prediction,” IEEE Access, vol. 6, pp. 24184–24195, 2018.

[8] H. K. Lee and S. B. Kim, “An overlap-sensitive margin classifier for imbalanced and overlapping data,” Expert Systems with Applications, vol. 98, pp. 72–83, 2018.

[9] D. L. Miholca, G. Czibula and I. G. Czibula, “A novel approach for software defect prediction through hybridizing gradual relational association rules with artificial neural networks,” Information Sciences, vol. 441, pp. 152–170, 2018.

[10] R. Özakıncı and A. Tarhan, “Early software defect prediction: A systematic map and review,” Journal of Systems and Software, vol. 144, pp. 216–239, 2018.

[11] S. S. Rathore and S. Kumar, “Towards an ensemble based system for predicting the number of software faults,” Expert Systems with Applications, vol. 82, pp. 357–382, 2017.

[12] S. S. Rathore and S. Kumar, “A study on software fault prediction techniques,” Artificial Intelligence Review, vol. 51, no. 2, pp. 255–327, 2019.

[13] L. H. Son, N. Pritam, M. Khari, R. Kumar, P. T. M. Phuong et al., “Empirical study of software defect prediction: A systematic mapping,” Symmetry, vol. 11, no. 2, pp. 2–28, 2019.

[14] F. Matloob, S. Aftab and A. Iqbal, “A framework for software defect prediction using feature selection and ensemble learning techniques,” International Journal of Modern Education and Computer Science, vol. 11, no. 12, pp. 14–20, 2019.

[15] A. Iqbal and S. Aftab, “A classification framework for software defect prediction using multi-filter feature selection technique and MLP,” International Journal of Modern Education and Computer Science, vol. 12, no. 1, pp. 18–25, 2020.

[16] A. Iqbal and S. Aftab, “Prediction of defect prone software modules using MLP based ensemble techniques,” International Journal of Information Technology and Computer Science, vol. 12, no. 3, pp. 26–31, 2020.

[17] M. S. Daoud, S. Aftab, M. Ahmad, M. A. Khan, A. Iqbal et al., “Machine learning empowered software defect prediction system,” Intelligent Automation & Soft Computing, vol. 31, no. 32, pp. 1287–1300, 2022.

[18] U. Ali, S. Aftab, A. Iqbal, Z. Nawaz, M. S. Bashir et al., “Software defect prediction using variant based ensemble learning and feature selection techniques,” International Journal of Modern Education & Computer Science, vol. 12, no. 5, pp. 29–40, 2020.

[19] A. Iqbal, S. Aftab, U. Ali, Z. Nawaz, L. Sana et al., “Performance analysis of machine learning techniques on software defect prediction using NASA datasets,” International Journal of Advanced Computer Science and Applications, vol. 10, no. 5, pp. 300–308, 2019.

[20] R. Mahajan, S. K. Gupta and R. K. Bedi, “Design of software fault prediction model using BR technique,” Procedia Computer Science, vol. 46, pp. 849–858, 2015.

[21] M. Shepperd, Q. Song, Z. Sun and C. Mair, “Data quality: Some comments on the NASA software defect datasets,” IEEE Transactions on Software Engineering, vol. 39, no. 9, pp. 1208–1215, 2013.

[22] Y. J. Cruz, M. Rivas, R. Quiza, A. Villalonga, R. E. Haber et al., “Ensemble of convolutional neural networks based on an evolutionary algorithm applied to an industrial welding process,” Computers in Industry, vol. 133, pp. 103530–103538, 2021.

[23] M. Shahhosseini, G. Hu and H. Pham, “Optimizing ensemble weights and hyperparameters of machine learning models for regression problems,” Machine Learning with Applications, vol. 7, pp. 100251–100260, 2022.

[24] A. U. Rahman, S. Abbas, M. Gollapalli, R. Ahmed, S. Aftab et al., “Rainfall prediction system using machine learning fusion for smart cities,” Sensors, vol. 22, no. 9, pp. 3504–3519, 2022.

[25] S. Goyal and P. K. Bhatia, “Comparison of machine learning techniques for software quality prediction,” International Journal of Knowledge and Systems Science, vol. 11, no. 2, pp. 20–40, 2020.

[26] A. O. Balogun, F. B. L. Balogun, H. A. Mojeed, V. E. Adeyemo, O. N. Akande et al., “SMOTE-based homogeneous ensemble methods for software defect prediction,” in 22nd Int. Conf. on Computational Science and its Applications, Cagliari, Italy, pp. 615–631, 2022.

[27] T. T. Khuat and M. H. Le, “Evaluation of sampling-based ensembles of classifiers on imbalanced data for software defect prediction problems,” SN Computer Science, vol. 1, no. 2, pp. 1–16, 2020.

[28] S. Goyal and P. K. Bhatia, “Heterogeneous stacked ensemble classifier for software defect prediction,” Multimedia Tools and Applications, vol. 81, pp. 37033–37055, 2022.

[29] P. Kumudha and R. Venkatesan, “Cost-sensitive radial basis function neural network classifier for software defect prediction,” The Scientific World Journal, vol. 2016, pp. 1–20, 2016.

[30] A. S. Abdou and N. R. Darwish, “Early prediction of software defect using ensemble learning: A comparative study,” International Journal of Computer Applications, vol. 179, no. 46, pp. 29–40, 2018.

Content uploaded by Muhammad Adnan Khan

Content may be subject to copyright.

Software Defect Prediction Using an Intelligent Ensemble-Based Model

Article

Full-text available

Jan 2024

Software defect prediction plays a crucial role in enhancing software quality while achieving cost savings in testing. Its primary objective is to identify and send only defective modules to the testing stage. This research introduces an intelligent ensemble-based software defect prediction model that combines diverse classifiers. The proposed model employs a two-stage prediction process to detect defective modules. In the first stage, four supervised machine learning algorithms are employed: Random Forest, Support Vector Machine, Naïve Bayes, and Artificial Neural Network. These algorithms are optimized through iterative parameter optimization to achieve the highest accuracy possible. In the second stage, the predictive accuracy of the individual classifiers is integrated into a voting ensemble to make the final predictions. This ensemble approach further improves the accuracy and reliability of the defect predictions. Seven historical defect datasets from the NASA MDP repository, namely CM1, JM1, MC2, MW1, PC1, PC3, and PC4, were utilized to implement and evaluate the proposed defect prediction system. The results demonstrate that each dataset’s proposed intelligent system achieved remarkable accuracy, outperforming twenty state-of-the-art defect prediction techniques, including base classifiers and ensemble methods.

A Transfer Learning Based Framework for Diabetic Retinopathy Detection Using Data Fusion

Conference Paper

Feb 2024

Diabetic retinopathy is a condition associated with diabetes that damages the blood vessels within the retina resulting in vision impairment or even blindness. Early detection and classification of DR enables timely intervention, which is crucial for preventing vision loss and blindness in diabetic patients. This research presents a framework for binary classification, employing transfer learning to identify diabetic retinopathy in individuals with diabetes. APTOS19 and IDRiD, two datasets containing fundus images, are merged together for training the transfer learning models to predict the presence or absence of the disease. Many preprocessing techniques have been applied to these images like resizing, Gaussian filtering, and dataset splitting. After the split, training set is augmented using zooming, rotation, flipping etc. to increase diversity. The transfer learning models used are: ResNet50 and DenseNet121. These models are fine tuned for classification. The results highlight that the DenseNet121 model achieved a superior test accuracy of 97.22% as compared to ResNet50.

A Classification Framework for Diabetic Retinopathy Detection Using Transfer Learning

Conference Paper

Feb 2024

Diabetic Retinopathy is a serious eye condition resulting from long-term diabetes mellitus that can lead to permanent blindness if not treated on time. Early detection can reduce the likelihood of serious disability. This research introduces a binary classification framework using Transfer Learning to identify diabetic retinopathy in diabetic patients. The research makes use of an image-based dataset, APTOS 2019, sourced from Kaggle. These images are used to train transfer learning models for predicting the presence or absence of this disease among patients. Preprocessing steps including resizing, Gaussian filtering and dataset split are employed prior to classification. Afterwards, data augmentation techniques like rotation, zooming, flipping, shifting and rescaling are applied to increase the training set. For classification, two Transfer Learning models, EfficientNetB3 and VGG16, are used after fine-tuning. The results indicate that EfficientNetB3 model outperformed VGG16 model due to its computationally efficient architecture, achieving a test accuracy of 97.82%.

Cross Project Software Defect Prediction Using Machine Learning: A Review

Article

Full-text available

Oct 2023

Software defect prediction is a crucial area of study focused on enhancing software quality and cutting down on software upkeep expenses. Cross Project Defect Prediction (CPDP) is a method meant to use information from different source projects to spot software issues in a specific project. CPDP comes in handy when the project being analyzed lacks enough or any data about defects for creating a dependable defect prediction model. Machine learning that is a part of artificial intelligence learns from data and then makes forecasts or choices. Machine learning (ML) is a key component of CPDP because it can learn from heterogeneous and imbalanced data sources. However, there are many challenges and open issues in applying machine learning to CPDP, such as data selection, feature extraction, model selection, evaluation metrics, and transfer learning. In this study, we provide a complete review of existing literature from 2018 to 2023 on Defect Prediction using Machine Learning, covering the main methods, applications, and limitations. We also use ML to identify current research gaps and future directions for CPDP. This paper will serve as a useful reference for researchers interested in using ML for CPDP.

Data Fusion Based Ensemble Transfer Learning Approach to Detect Diabetic Retinopathy

Conference Paper

Feb 2024

Diabetic retinopathy is an eye disease damaging the blood vessels of retina as a result of long term diabetes. In this research, an ensemble classification system is proposed by combining two transfer learning models to identify diabetic retinopathy among diabetic patients. A fusion of two datasets containing fundus images is used in this study: 1) APTOS, which contains 3662 images and 2) IDRiD, which contains 516 images. For binary classification, the dataset is divided into two classes: DR and No DR. Preprocessing steps are applied on the dataset such as resizing the images to 150x150x3, applying Gaussian filtering, balancing minority classes using SMOTE and splitting the dataset to 80:20 ratios. Data augmentation techniques like zooming, rotation etc. are also used to augment the images. Two transfer learning models, Xception and EfficientNetB3, are used for classification after fine-tuning. An ensemble of these models is built which achieved the highest test accuracy of 97.47% outperforming individual models.

ROLE OF FEATURE SELECTION IN CROSS PROJECT SOFTWARE DEFECT PREDICTION- A REVIEW

Article

Full-text available

Jan 2024

Muhammad SALMAN Saeed

Software Defect Prediction (SDP) is crucial for enhancing software quality and minimizing issues after release. The advent of machine learning, particularly in Cross-Project Defect Prediction (CPDP), has garnered significant attention for its potential to enhance defect predictions in one project by leveraging information from another. A critical factor influencing CPDP effectiveness is feature selection, the process of identifying the most relevant features from an available set. This review article thoroughly examines the role of feature selection in CPDP. Existing feature selection methods are systematically analyzed and classified within the CPDP context, encompassing both traditional and state-of-the-art approaches. The review delves into the challenges and opportunities presented by diverse project characteristics, data heterogeneity, and the curse of dimensionality. Additionally, the article underscores how feature selection impacts model performance, generalization, and adaptability across various software projects. Through synthesizing findings from multiple studies, trends, best practices, and potential research directions in the field are identified. In conclusion, this review article provides valuable insights into the significance of feature selection for enhancing the reliability and efficiency of CPDP models.

Rainfall Prediction System Using Machine Learning Fusion for Smart Cities

Article

Full-text available

May 2022
SENSORS-BASEL

Precipitation in any form-such as rain, snow, and hail-can affect day-today outdoor activities. Rainfall prediction is one of the challenging tasks in weather forecasting process. Accurate rainfall prediction is now more difficult than before due to the extreme climate variations. Machine learning techniques can predict rainfall by extracting hidden patterns from historical weather data. Selection of an appropriate classification technique for prediction is a difficult job. This research proposes a novel real-time rainfall prediction system for smart cities using a machine learning fusion technique. The proposed framework uses four widely used supervised machine learning techniques , i.e., decision tree, Naïve Bayes, K-nearest neighbors, and support vector machines. For effective prediction of rainfall, the technique of fuzzy logic is incorporated in the framework to integrate the predictive accuracies of the machine learning techniques, also known as fusion. For prediction , 12 years of historical weather data (2005 to 2017) for the city of Lahore is considered. Pre-processing tasks such as cleaning and normalization were performed on the dataset before the classification process. The results reflect that the proposed machine learning fusion-based framework outperforms other models.

Machine Learning Empowered Software Defect Prediction System

Article

Full-text available

Sep 2021

Production of high-quality software at lower cost has always been the main concern of developers. However, due to exponential increases in size and complexity, the development of qualitative software with lower costs is almost impossible. This issue can be resolved by identifying defects at the early stages of the development lifecycle. As a significant amount of resources are consumed in testing activities, if only those software modules are shortlisted for testing that is identified as defective, then the overall cost of development can be reduced with the assurance of high quality. An artificial neural network is considered as one of the extensively used machine-learning techniques for predicting defect-prone software modules. In this paper, a cloud-based framework for real-time software-defect prediction is presented. In the proposed framework, empirical analysis is performed to compare the performance of four training algorithms of the back-propagation technique on software-defect prediction: Bayesian regularization (BR), Scaled Conjugate Gradient, Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton, and Levenberg-Marquardt algorithms. The proposed framework also includes a fuzzy layer to identify the best training function based on performance. Publicly available cleaned versions of NASA datasets are used in this study. Various measures are used for performance evaluation including specificity, precision , recall, F-measure, an area under the receiver operating characteristic curve, accuracy, R 2 , and mean-square error. Two graphical user interface tools are developed in MatLab software to implement the proposed framework. The first tool is developed for comparing training functions as well as for extracting the results; the second tool is developed for the selection of the best training function using fuzzy logic. 
A BR training algorithm is selected by the fuzzy layer as it This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Heterogeneous stacked ensemble classifier for software defect prediction

Article

Full-text available

Sep 2021
MULTIMED TOOLS APPL

Software defect prediction (SDP) plays an important role to ensure that software meets quality standards; by highlighting the modules which are prone to errors and hence allows to focus the test efforts on them. Class imbalance nature of the defect dataset hinders the defect predictors to correctly classify the buggy modules. Here, we introduce a novel heterogenous ensemble classifier built with stacking methodology to overcome this problem of imbalanced datasets and hence, significant improvement in the prediction power is being proposed. Stacked ensemble is achieved with the best known classifiers from SDP literature as base classifiers (artificial neural network, nearest neighbor, tree based classifier, Bayesian classifier and support vector machines). For experimental work, five public datasets from NASA corpus are used. A comparative analysis for the proposed heterogenous stacking based ensemble method is made with the base classifiers and with the state-of-the art ensemble based SDP models over the evaluation criteria of ROC, AUC and accuracy. It is found that the proposed heterogenous stacking based ensemble classifier outperforms the base classifiers by 12% in terms of AUC score and by 8% in terms of Accuracy. It improves the performance of state-of-the-art ensemble methods by 4% in terms of AUC score and by 9% in terms of Accuracy. It can be concluded from the comparative analysis that the proposed SDP classifier is best performer among the candidate SDP classifiers statistically.

Ensemble of convolutional neural networks based on an evolutionary algorithm applied to an industrial welding process

Article

Full-text available

Aug 2021
COMPUT IND

This paper presents an approach for image classification based on an ensemble of convolutional neural networks and the application to a real case study of an industrial welding process. The ensemble consists of five convolutional neural networks, whose outputs are combined through a voting policy. In order to select appropriate network parameters (i.e., the number of convolutional layers and layers hyperparameters) and voting policy, an efficient search process was carried out by using an evolutionary algorithm. The proposed method is applied and validated in a case study focused on detecting misalignment of metal sheets to be joined through submerged arc welding process. After selecting the most convenient setup, the ensemble outperforms other seven strategies considered in a comparison in several metrics, while maintaining an adequate computational cost.

Software Defect Prediction Using Ensemble Learning: A Systematic Literature Review

Article

Jul 2021

Recent advances in the domain of software defect prediction (SDP) include the integration of multiple classification techniques to create an ensemble or hybrid approach. This technique was introduced to improve the prediction performance by overcoming the limitations of any single classification technique. This research provides a systematic literature review on the use of the ensemble learning approach for software defect prediction. The review is conducted after critically analyzing research papers published since 2012 in four well-known online libraries: ACM, IEEE, Springer Link, and Science Direct. In this study, five research questions that cover the different aspects of research progress on the use of ensemble learning for software defect prediction are addressed. To answer these questions, the 46 most relevant papers are shortlisted after a thorough systematic research process. This study provides compact information regarding the latest trends and advances in ensemble learning for software defect prediction and a baseline for future innovations and further reviews. Through our study, we discovered that the ensemble methods most frequently employed by researchers are random forest, boosting, and bagging. Less frequently employed methods include stacking, voting, and Extra Trees. Researchers proposed many promising frameworks, such as EMKCA, SMOTE-Ensemble, MKEL, SDAEsTSE, TLEL, and LRCR, using ensemble learning methods. AUC, accuracy, F-measure, recall, precision, and MCC were the measures most often used to assess the prediction performance of models. WEKA was widely adopted as a platform for machine learning. Many researchers showed through empirical analysis that feature selection and data sampling were important pre-processing steps that improve the performance of ensemble classifiers.

Software Defect Prediction Using Supervised Machine Learning Techniques: A Systematic Literature Review

Article

Jun 2021

Software defect prediction (SDP) is the process of detecting defect-prone software modules before the testing stage. The testing stage in the software development life cycle is expensive and consumes the most resources of all the stages. SDP can minimize the cost of the testing stage, which can ultimately lead to the development of higher-quality software at a lower cost. With this approach, only those modules classified as defective are tested. Over the past two decades, many researchers have proposed methods and frameworks to improve the performance of the SDP process. The main research topics are association, estimation, clustering, classification, and dataset analysis. This study provides a systematic literature review that highlights the latest research trends in the area of SDP by providing a critical review of papers published between 2016 and 2019. Initially, 1012 papers were shortlisted from three online libraries (IEEE Xplore, ACM, and ScienceDirect); following a systematic research protocol, 22 of these papers were selected for detailed critical review. This review will serve researchers by providing the most current picture of the published work on software defect classification.

Software Defect Prediction Using Variant based Ensemble Learning and Feature Selection Techniques

Article

Oct 2020

Testing is considered one of the most expensive activities in the software development process. Fixing defects during testing can increase both the cost and the completion time of a project. The cost of testing can be reduced by identifying defective modules during the development stage, before testing begins. This process is known as "Software Defect Prediction", and it has received wide attention from researchers over the last two decades. This research proposes a classification framework for the prediction of defective modules using variant-based ensemble learning and feature selection techniques. The variant selection activity identifies the best optimized versions of the classification techniques so that their ensemble can achieve high performance, whereas feature selection removes features that do not contribute to classification and that lower performance. The proposed framework is implemented on four cleaned NASA datasets from the MDP repository and evaluated using three performance measures,

SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction

Chapter

Sep 2020

Class imbalance is a prevalent problem in machine learning that affects the prediction performance of classification algorithms. Software Defect Prediction (SDP) is no exception to this latent problem. Solutions such as data sampling and ensemble methods have been proposed to address the class imbalance problem in SDP. This study proposes a combination of the Synthetic Minority Oversampling Technique (SMOTE) and homogeneous ensemble (Bagging and Boosting) methods for predicting software defects. The proposed approach was implemented using Decision Tree (DT) and Bayesian Network (BN) as base classifiers on defect datasets acquired from the NASA software corpus. The experimental results showed that the proposed approach outperformed the other experimental methods. The high accuracy of 86.8% and the area under the receiver operating characteristic curve (AUC) of 0.93 achieved by the proposed technique affirmed its ability to differentiate between the defective and non-defective labels without bias.

Optimizing ensemble weights and hyperparameters of machine learning models for regression problems

Article

Jan 2022

Aggregating multiple learners through an ensemble of models aims to make better predictions by capturing the underlying distribution of the data more accurately. Different ensembling methods, such as bagging, boosting, and stacking/blending, have been studied and adopted extensively in research and practice. While bagging and boosting focus more on reducing variance and bias, respectively, stacking approaches target both by finding the optimal way to combine base learners. In stacking with the weighted average, ensembles are created from weighted averages of multiple base learners. It is known that tuning the hyperparameters of each base learner inside the ensemble weight optimization process can produce better-performing ensembles. To this end, an optimization-based nested algorithm that tunes hyperparameters while also finding the optimal weights to combine ensembles, the Generalized Weighted Ensemble with Internally Tuned Hyperparameters (GEM-ITH), is designed. In addition, Bayesian search was used to speed up the optimization process, and a heuristic was implemented to generate diverse and well-performing base learners. The algorithm is shown to be generalizable to real data sets through analyses with ten publicly available data sets.
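To make the idea of a weighted-average ensemble concrete, the sketch below searches for non-negative weights that sum to one and minimise the mean squared error of the combined prediction on a validation set. It is not GEM-ITH: it covers only the weight search, uses a brute-force grid rather than Bayesian search, and leaves hyperparameter tuning out entirely. The function names, the grid resolution, and the toy data are assumptions made for this example.

```python
from itertools import product


def weighted_average(preds, weights):
    """Combine per-model predictions using one weight per model."""
    return [sum(w * p[i] for w, p in zip(weights, preds))
            for i in range(len(preds[0]))]


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def search_weights(preds, y_true, steps=10):
    """Grid-search weights in multiples of 1/steps that sum to 1."""
    best_w, best_err = None, float("inf")
    for combo in product(range(steps + 1), repeat=len(preds)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        w = [c / steps for c in combo]
        err = mse(y_true, weighted_average(preds, w))
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Toy validation targets and three hypothetical base learners.
y_val = [1.0, 2.0, 3.0, 4.0]
preds = [
    [1.2, 2.2, 3.2, 4.2],  # overestimates every target by 0.2
    [0.8, 1.8, 2.8, 3.8],  # underestimates every target by 0.2
    [2.0, 2.0, 2.0, 2.0],  # constant predictor
]
weights, err = search_weights(preds, y_val)
print(weights, err)
```

In this toy case the two biased learners cancel each other, so the search puts equal weight on them and none on the constant predictor. The grid grows quickly with the number of learners, which is one reason a smarter search is used in practice.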

Empirical Software Measurements with Machine Learning

Chapter

Jan 2021

Measurement of the attributes of software processes, products, projects and people associated with software development is necessary so that the industry can deliver a quality product, that is, high-quality software within the limits of time and cost. It is evident that accurate software measurements using empirical techniques are essential. As per the Chaos Report (Chaos Report 2015), only 23% of total projects get the status of “successful project completion.” The reason for this poor successful completion rate is the inaccurate measurement of attributes of software quality and quantity (Demarco 1982). We need to evaluate, assess, predict, monitor and control the various aspects of software development. For successful project completion, quantitative methods need to be followed. This chapter discusses the empirical approach to software measurement using machine learning (ML) techniques. Much research work has already been done in this field; ML has found software measurement a very fertile ground. Both dimensions, software quality and software quantity, can be measured empirically using ML techniques. Software quantity measurement corresponds to effort estimation, cost estimation, schedule prediction and several other software measurement tasks, which can be modeled as regression tasks. Software quality measurement corresponds to defect prediction, quality prediction, prediction of faulty modules and other such problems, which can be formulated as classification tasks in ML. In this way, software quantity and quality measurements together can be formulated as supervised ML problems. This is the starting point used in this research field for measuring software empirically with ML techniques. The field has resonated with software researchers since the 1980s.
This chapter demonstrates the usage of ML techniques for both software quality and quantity measurements. Starting with a basic introduction to the current trends of the field and moving through problem definition, we reach the experimental set-up and then draw inferences from the experiments. This chapter aims to provide the reader with practical and applicable knowledge of ML and deep learning for empirical software measurements.
