This work is licensed under a Creative Commons Attribution 4.0 International License,
which permits unrestricted use, distribution, and reproduction in any medium, provided
the original work is properly cited.
Tech Science Press
DOI: 10.32604/cmc.2023.037933
Article
Data and Ensemble Machine Learning Fusion Based Intelligent Software
Defect Prediction System
Sagheer Abbas1, Shabib Aftab1,2, Muhammad Adnan Khan3,4, Taher M. Ghazal5,6,
Hussam Al Hamadi7 and Chan Yeob Yeun8,*
1School of Computer Science, National College of Business Administration & Economics, Lahore, 54000, Pakistan
2Department of Computer Science, Virtual University of Pakistan, Lahore, 54000, Pakistan
3Department of Software, Faculty of Artificial Intelligence and Software, Gachon University, Seongnam,
13120, Korea
4Riphah School of Computing & Innovation, Faculty of Computing, Riphah International University, Lahore Campus,
Lahore, 54000, Pakistan
5School of Information Technology, Skyline University College, University City Sharjah, Sharjah, UAE
6Center for Cyber Security, Faculty of Information Science and Technology, UKM, Bangi, Selangor, 43600, Malaysia
7College of Engineering and IT, University of Dubai, 14143, UAE
8EECS Department, Center for Cyber Physical Systems, Khalifa University, Abu Dhabi, 127788, UAE
*Corresponding Author: Chan Yeob Yeun. Email: chan.yeun@ku.ac.ae
Received: 22 November 2022; Accepted: 16 March 2023
CMC, 2023, vol.75, no.3
Abstract: The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development can enable quality assurance engineers to concentrate on problematic modules rather than all the modules. This approach can enhance the quality of the final product while lowering development costs. Identifying defective modules early on can allow for early corrections and ensure the timely delivery of a high-quality product that satisfies customers and instills greater confidence in the development team. This process is known as software defect prediction, and it can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that utilizes data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules to achieve high accuracy. In the first step, three supervised machine learning techniques, including Decision Tree, Support Vector Machines, and Naïve Bayes, are used to detect faulty modules. The second step involves integrating the predictive accuracy of these classification techniques through three ensemble machine-learning methods: Bagging, Voting, and Stacking. Finally, in the third step, a fuzzy logic technique is employed to integrate the predictive accuracy of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model can perform effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system exhibited superior performance to other advanced techniques for predicting software defects, achieving a remarkable accuracy rate of 92.08%.
Keywords: Ensemble machine learning fusion; software defect prediction;
fuzzy logic
1 Introduction
Many researchers in the software engineering domain have worked to minimize the
cost of the Software Development Life Cycle (SDLC) without compromising quality [1,2]. The
activity of software testing aims to ensure the high quality of the end product [3–5]. A minor defect
in software can lead to system failure and a catastrophic event in the case of a critical system [13].
The importance of identifying and removing defects is illustrated by NASA’s
$125 million Mars Climate Orbiter, which was lost because of a minor data conversion defect [1].
Software defects can be of different types, including wrong program statements, syntax errors, and
design or specification errors [1,2]. In SDLC, the testing process plays a crucial role in achieving
a high-quality end product by eliminating defects [6,7]. However, it has been shown that software
testing is the most expensive activity, as it consumes more resources than the other tasks
of SDLC [3,8–10]. Identifying and fixing the defects before testing would be less costly compared
to the cost of repairing the defects at later stages, especially after integration [11,12]. This objective
can be achieved by incorporating an effective Software Defect Prediction (SDP) system [13,14], which
can identify faulty software modules before the testing stage, allowing for focused testing efforts on
those particular modules [1]. This approach can guarantee the delivery of high-quality end products
with limited resources [3]. Many supervised machine learning-based defect prediction techniques and
frameworks have been proposed by researchers for the effective and efficient detection of defect-prone
software modules [1,2]. In the supervised machine learning technique, a classifier is trained by using a
pre-labeled dataset. The dataset which is used to train the classifier includes multiple independent
features and at least one dependent feature. The dependent feature is known as the output class,
which is classified or predicted by exploring the hidden pattern and relationship between independent
attributes and dependent attributes. That hidden pattern and relationship are learned by the supervised
classifier, which is further used for prediction on the unseen dataset (testing data) [6–8]. The SDP
dataset typically pertains to a specific software component, with independent attributes represented
by software quality metrics collected during development. The dependent feature, on the other hand,
is the predictable class which reflects whether the particular module is defective or not. Instances
in the SDP dataset reflect the modules, so classifying a specific instance as defective or non-defective
amounts to predicting the status of the module that the instance represents. This study presents
the Intelligent Software Defect Prediction System (ISDPS), which utilizes data and decision-level
ensemble machine learning fusion, along with a novel filter-based ensemble feature selection technique
to improve accuracy and reduce costs. The ISDPS follows a three-step process for software defect
prediction, beginning with the use of three supervised classification techniques—Decision Tree (DT),
Support Vector Machines (SVM), and Naïve Bayes (NB)—to build classification models for SDP.
The second step employs ensemble techniques such as Bagging, Voting, and Stacking to merge the
predictive accuracy of the classification models. The third step uses fuzzy logic to fuse the predictive
accuracy of the ensemble models. The proposed system was evaluated by combining five datasets
from NASA’s repository and achieved a high accuracy rate of 92.08%, outperforming other published
techniques.
2 Related Work
Researchers have presented various models and frameworks to detect faulty software modules
before the testing stage. Several studies have been conducted on this topic, and some are discussed here.
In [14], researchers applied a metric selection algorithm and ensemble machine learning approach to
detect defective software modules. The research was conducted on six different software defect datasets
taken from NASA’s software repository. The performance of the proposed method was evaluated using
statistical measures such as Receiver Operating Characteristic (ROC) value, Matthews’s Correlation
Coefficient (MCC), F-measure, and Accuracy. The study compared various search methods for metric
selection using ensemble machine learning. The results demonstrated that the proposed method
outperformed the various supervised classifiers it was compared against.
Researchers in [15] proposed an Artificial Neural Network (ANN) based system
for detecting faulty software modules, along with a metric selection algorithm that uses a multi-filter-based
approach. The proposed system uses oversampling to handle class imbalance and performs
classification in two dimensions, one with oversampling and one without. NASA’s cleaned software
defect datasets were used for implementation, and the system’s performance was evaluated using
various statistical metrics such as MCC, ROC, Accuracy, and F-measure. The study compared the
proposed SDP technique with other methods, and the results showed that the proposed technique
outperformed other techniques in terms of accuracy and other statistical measures. In their study
[16], researchers introduced a framework for predicting faulty software modules based on Multi-Layer
Perceptron (MLP) with bagging and boosting as ensemble machine learning techniques. The
framework employs three approaches for predicting defective modules, including tuning the MLP for
classification, creating an ensemble of tuned MLP using bagging, and developing an ensemble of tuned
MLP using boosting. The study utilized four cleaned datasets from NASA’s repository to implement
the proposed framework and evaluated its performance using statistical metrics such as Accuracy,
MCC, F-measure, and ROC. The results indicated that the proposed technique outperformed various
classifiers from published research, as determined by the Scott-Knott Effect Size Difference test.
In [17], researchers conducted a thorough comparative analysis of four commonly used
training techniques for back-propagation algorithms in ANN to predict defective software modules.
They also proposed a fuzzy logic-based layer to determine the most effective training technique. The
researchers utilized cleaned versions of NASA software defect datasets and assessed the performance
of the proposed framework using several metrics, such as Accuracy, Specificity, F-measure, Recall,
Precision, ROC, and Mean Square Error. The study compared the results with those of various
classifiers, and the proposed framework outperformed other methods. In [18], a machine learning-based
framework using a variant ensemble technique is presented for predicting faulty software
modules. The proposed framework also incorporates a metric selection method to optimize the
performance of the classification technique. The variant selection process involves identifying and
selecting the best version of the classification technique to achieve high performance. The ensemble
technique integrates the predictive accuracy of the optimized variants to further increase the accuracy
of the proposed framework. The proposed framework was tested using four cleaned software defect
datasets from NASA’s repository, and its performance was evaluated using three statistical measures:
MCC, Accuracy, and F-measure. The results of the framework were compared with various other
classifiers to assess its effectiveness. The analysis showed that the proposed framework outperformed
all other classifiers, indicating its superiority in predicting faulty software modules. The authors of
[19] performed a thorough comparative analysis of multiple supervised machine learning techniques
for software defect prediction. They used twelve cleaned software defect datasets from NASA and
evaluated the performance of the models using various statistical measures such as F-measure,
Precision, Accuracy, Recall, ROC, and MCC. The authors concluded that the results obtained from
their study could serve as a benchmark for future comparisons between different SDP techniques.
In [20], researchers designed an Artificial Neural Network (ANN) tool to detect defective software
modules. They compared different training functions of ANN and identified that the Bayesian
Regularization training function outperformed others. The objective of this study was to decrease
the cost of software development by detecting faulty modules before they reach the testing stage, thus
reducing the burden on the quality assurance team.
3 Materials and Methods
The study introduces an intelligent system that employs data fusion to detect defective software
modules. The system integrates feature selection and decision-level ensemble machine learning fusion
techniques to improve the accuracy and efficiency of identifying faulty software modules. The ISDPS
proposed in this study can be viewed from two perspectives: external and internal. The external view,
as shown in Fig. 1, outlines the workflow surrounding the defect prediction system. The development
team initiates the workflow during the development stage of the software development life cycle
(SDLC). Understanding the surrounding scenario is crucial to comprehend the importance of the
proposed system. A software metric dataset is prepared during the process of software development.
The dataset consists of various quality metrics captured automatically or manually during development.
Every dataset reflects a particular Software Component (SC) in which there are many
modules. Each module in a specific SC dataset is reflected by an instance, and the values of quality
attributes/features of that instance are populated during development. Each SC dataset consists of
various quality attributes (independent features) and one dependent feature (also known as output
class). Initially, the developed software components in SDLC are stored in the Software Metric Dataset
Repository (SMDR). Each component has its own Quality Metrics Dataset (QMD), which contains the
values of various quality attributes recorded during the development stage. The SMDR consists of two
sub-repositories: the Untested Software Components Repository (USCR) and the Tested Software
Components Repository (TSCR). Initially, the developed components are not tested and are stored in
USCR. Some selected components from USCR are tested by the Quality Assurance (QA) team, and
an additional attribute is added to the QMD, known as “Result”. This column will reflect the nominal
value of “yes” if the particular module is defective and “no” if the module is non-defective. The tested
component, along with its QMD with the “Result” attribute, is stored in the TSCR sub-repository. Once the
TSCR has been populated with tested components, the proposed ISDPS can begin operating, since
it needs a pre-labeled dataset for training.
Two or more datasets from TSCR are extracted into the training layer of the proposed defect
prediction system, where data fusion will be performed, and the prediction model is developed, which
will be stored in the cloud for later use. The testing layer of the proposed system will receive QMD as
input from USCR to perform real-time defect predictions. The “Result” attribute in QMD is populated
by the testing layer after prediction and then will be sent to the QA team. The QA team will add
the QMD to its related SC, and if the module is predicted as defective, then thorough testing of that
particular module will be performed, and the details of the identified defect will be sent to the development
team in SDLC. The development team will correct the defective module, and then it will again be
transferred to USCR for the next iteration. If the module is predicted as non-defective by the proposed
ISDPS, then it is considered ready for the integration stage of SDLC.
Figure 1: External view of proposed ISDPS
The internal view (Fig. 2) of the proposed ISDPS contains two layers: training and testing. These
layers are comprised of several stages and activities. The workflow in the training layer initiates with
the extraction of pre-labeled QMDs from SMDR. Data fusion is the first stage of the training layer
in the proposed system, in which datasets of multiple software components will be extracted, and
instance-level fusion will be performed. The main objective of the data fusion process is to develop an
effective and efficient classification model, which can be used for prediction on diverse test datasets
from multiple sources. Training a model to high accuracy on the fused dataset is
challenging but worthwhile for later use, especially when prediction has to be performed on
multiple datasets from multiple sources. However, it should be ensured that the test data
have the same nature as the training dataset. For this study, five cleaned datasets were chosen
from NASA’s repository, including MW1, PC1, PC3, PC4, and CM1. The details of the used datasets
and their attributes are available in [21]. After extraction, all five datasets are fused. The fused dataset
consists of 3579 instances and 38 attributes. Of 38 attributes, 37 attributes are independent, whereas
one attribute named “Defective” is dependent, which aims to determine whether a module is defective.
The dependent attribute, also known as the output class, can contain two values, “Yes” for defective
modules and “No” for non-defective modules.
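As an illustration of instance-level fusion, the following minimal Python sketch stacks the rows of several component datasets that share one attribute schema. The helper `fuse_datasets`, the two toy datasets, and their values are hypothetical and stand in for the five NASA datasets; the sketch also assumes that all datasets expose identical attribute names, which the real datasets may only satisfy after alignment.

```python
from typing import Dict, List

Row = Dict[str, object]

def fuse_datasets(datasets: Dict[str, List[Row]]) -> List[Row]:
    """Instance-level fusion: append the rows of every dataset into one table.

    Assumption: all datasets share the same attribute names. A mismatch raises
    an error instead of being silently padded.
    """
    fused: List[Row] = []
    schema = None
    for name, rows in datasets.items():
        for row in rows:
            if schema is None:
                schema = set(row)
            elif set(row) != schema:
                raise ValueError(f"{name}: attribute mismatch {set(row) ^ schema}")
            fused.append(dict(row))
    return fused

# Toy stand-ins for two component datasets (all values are made up).
mw1 = [{"LOC_BLANK": 3, "NUM_OPERATORS": 40, "Defective": "No"}]
pc1 = [
    {"LOC_BLANK": 7, "NUM_OPERATORS": 95, "Defective": "Yes"},
    {"LOC_BLANK": 1, "NUM_OPERATORS": 12, "Defective": "No"},
]

fused = fuse_datasets({"MW1": mw1, "PC1": pc1})
print(len(fused))  # 3
```

Raising on a schema mismatch is a design choice of this sketch: it keeps the fused table rectangular, so the later pre-processing steps never see rows with missing columns.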
Pre-processing is the second stage of the training layer, which deals with four data pre-processing
activities: 1) cleaning, 2) normalization, 3) splitting of data for training and
testing, and 4) feature selection for efficient and effective prediction. The cleaning process of the
pre-processing stage will be responsible for handling the missing values in the dataset. The missing
values will be replaced by using the technique of mean imputation. Missing values in any attribute of
the dataset can misguide the classification model, which may result in low accuracy of the proposed
framework. The second activity deals with the normalization process, which scales the values of all
independent attributes to a range of 0 to 1. It has been observed that cleaning and normalization
activities aid the classification techniques to work efficiently and effectively. The third activity will
deal with the splitting of the dataset into the groups of training and testing datasets with a ratio of
70:30 by using the split class rule. The fourth and final activity of pre-processing stage will deal with
the selection of optimum features [14,15] from training and test sets by using a novel feature selection
(FS) technique. A novel filter-based ensemble feature selection (FEFS) technique is proposed in which
feature selection is performed three times in a nested way. The proposed FEFS technique consists
of four steps. First, the complete feature set of training data is given as input to the Correlation-based
FS technique with the Genetic Search (GS) method. In the second step, Consistency-based FS
is performed on complete training data with the Best First (BF) search method. In the third step, an
intersection is performed among both of the resultant feature sets from step 1 and step 2 (Correlation
FS and Consistency FS). Finally, in step four, the feature set generated from the intersection operation
from step 3 is given as input again to the correlation-based FS technique with the GS method, and
the resultant feature set is selected from training and test datasets. The detailed steps of the proposed
FEFS technique are given below (Table 1).
Figure 2: Internal view of proposed ISDPS
Table 1: Proposed FEFS technique
Input:
  Training Dataset: Dataset A
  Test Dataset: Dataset B
  Attribute Evaluator: Correlation FS, Consistency FS
  Search Methods: GS, BF
Output:
  n = number of features
Steps:
  1 Dataset A → Correlation FS-GS → Subset 1: a1, a2, ..., an;
  2 Dataset A → Consistency FS-BF → Subset 2: b1, b2, ..., bn;
  3 Subset 1 Intersection Subset 2 → Subset 3: c1, c2, ..., cn;
  4 Subset 3 → Correlation FS-GS → Subset 4: d1, d2, ..., dn;
  5 Select Subset 4 as Feature Set from Dataset A and Dataset B.
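The nested structure of the FEFS steps can be sketched in Python as below. This is only an outline under stated assumptions: the function `fefs` takes the two attribute evaluators as parameters, and `toy_correlation_fs` and `toy_consistency_fs` are hypothetical stand-ins (a Pearson-correlation threshold and a non-constant-column check), not the Correlation FS with Genetic Search or Consistency FS with Best First evaluators used in the study. The toy columns and the 0.5 threshold are also made up.

```python
from typing import Callable, Dict, List, Sequence

Dataset = Dict[str, List[float]]                      # column name -> column values
Selector = Callable[[Dataset, Sequence[str]], List[str]]

def fefs(train: Dataset, test: Dataset, correlation_fs: Selector,
         consistency_fs: Selector, target: str = "Defective"):
    """Nested filter-based ensemble feature selection following the five steps."""
    features = [c for c in train if c != target]
    subset1 = correlation_fs(train, features)          # step 1
    subset2 = consistency_fs(train, features)          # step 2
    subset3 = [f for f in subset1 if f in subset2]     # step 3: intersection
    subset4 = correlation_fs(train, subset3)           # step 4: re-run on the intersection
    keep = subset4 + [target]

    def project(data: Dataset) -> Dataset:
        # Step 5: keep only the selected columns (the test set may lack the target).
        return {c: data[c] for c in keep if c in data}

    return subset4, project(train), project(test)

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical stand-in evaluators, used only to make the sketch runnable.
def toy_correlation_fs(data, candidates, target="Defective", threshold=0.5):
    return [f for f in candidates if abs(pearson(data[f], data[target])) >= threshold]

def toy_consistency_fs(data, candidates):
    return [f for f in candidates if len(set(data[f])) > 1]

train = {"a": [0, 1, 2, 3], "b": [1, 1, 1, 1], "c": [3, 1, 2, 0], "Defective": [0, 0, 1, 1]}
test = {"a": [2, 0], "b": [1, 1], "c": [0, 3]}
selected, train_sel, test_sel = fefs(train, test, toy_correlation_fs, toy_consistency_fs)
print(selected)  # ['a']
```

Both evaluators see only the training data, and the resulting subset is then applied unchanged to the test data, which mirrors step 5 of Table 1.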
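The first three pre-processing activities described earlier (mean imputation, scaling to the range 0 to 1, and a 70:30 split) can be sketched in plain Python as follows. The function names, the seed, and the toy matrix are assumptions of this sketch. The split here is a simple shuffled cut, which is not necessarily the split rule used in the study, so it is not expected to reproduce the exact 2506/1073 partition.

```python
import random
from typing import List, Optional

def mean_impute(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each missing value (None) with the mean of its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every column to the range [0, 1]; constant columns map to 0."""
    n_cols = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(n_cols)]
    hi = [max(r[j] for r in rows) for j in range(n_cols)]
    return [
        [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0 for j in range(n_cols)]
        for r in rows
    ]

def split_70_30(rows, labels, seed: int = 0):
    """Shuffle indices reproducibly and cut them 70:30 into train and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

raw = [[1.0, None], [3.0, 4.0], [5.0, 8.0]]
clean = mean_impute(raw)            # the None becomes (4.0 + 8.0) / 2 = 6.0
scaled = min_max_normalize(clean)
print(scaled)  # [[0.0, 0.5], [0.5, 0.0], [1.0, 1.0]]
```

The order of operations follows the text: the data are cleaned and normalized before they are split.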
Classification is the third stage which deals with the development of classification models to
identify defective and non-defective modules. Selected features of pre-processed datasets (training and
testing) are used as input to the classification stage. The study employed three supervised machine
learning techniques, namely NB, SVM, and DT, for classification. These classifiers were fine-tuned
iteratively to achieve the highest possible accuracy on the testing dataset. During the tuning process
on training data, default parameters are used in NB as the performance decreases after optimization.
In SVM, the polynomial kernel is selected, and the value of the complexity parameter (C) is set to 1.
In DT, the confidence factor has been set to 0.3. The classification stage will end by producing the
optimized prediction models of the used supervised machine learning techniques.
Ensemble Modeling is the fourth stage in the training layer, which deals with the development of
ensemble models by integrating the optimized classifiers (NB, SVM, and DT), which are
given as input to the ensemble modeling stage. Ensemble machine learning approaches can
increase prediction accuracy beyond that of individual optimized classification techniques [3,7,14,22,23].
Three ensemble techniques will be used for the development of ensemble models, including Bagging,
Voting, and Stacking. One by one, all of the optimized classification models are used as input to the
ensemble techniques for the development of ensemble models. Three ensemble models are developed
in the proposed system: Bagging with DT, Voting with NB, SVM, and DT, and Stacking with NB
and SVM along with DT as a Meta classifier. All three developed ensemble models have shown better
accuracy on test data than each of the optimized individual classifiers.
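To make the bagging idea concrete, the sketch below trains several copies of a base learner on bootstrap resamples and combines them by majority vote. It is a simplified illustration, not the configuration used in the study: the base learner is a hypothetical one-feature threshold rule (a depth-one stand-in for DT), and the function names, the number of models, the seed, and the toy data are all assumptions.

```python
import random
from collections import Counter
from typing import List

def train_stump(xs: List[float], ys: List[int]) -> float:
    """Fit a one-feature threshold rule that predicts 1 when x >= t."""
    best_t, best_acc = xs[0], -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def bagging_fit(xs: List[float], ys: List[int], n_models: int = 11, seed: int = 0) -> List[float]:
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def bagging_predict(models: List[float], x: float) -> int:
    """Majority vote over the stumps; an odd model count avoids ties."""
    votes = Counter(int(x >= t) for t in models)
    return votes.most_common(1)[0][0]

xs = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]   # toy single-feature data
ys = [0, 0, 0, 1, 1, 1]               # 1 = defective, 0 = non-defective
models = bagging_fit(xs, ys)
print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))  # 0 1
```

Voting and Stacking differ in how the base models are combined: Voting applies the majority step to heterogeneous classifiers, while Stacking trains a meta classifier on the base models' outputs.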
The fifth and final stage of the training layer deals with the fusion of ensemble machine-learning
techniques. This stage is responsible for decision-level fusion by integrating the predictive accuracy
of optimized ensemble models [24]. Fuzzy logic is used for decision-level fusion, where membership
functions are developed using if-then rules (as shown in Table 2). These rules form the basis of the
final prediction and enhance the accuracy of the defect prediction system. The fused ensemble model
is then stored in cloud storage for real-time predictions.
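Table 2: Membership functions of proposed fusion method
[Graphical representation column: membership function plots]

Bagging (bg):
  $\max\left(\min\left(1, \frac{0.5 - bg}{0.05}\right), 0\right)$
  $\max\left(\min\left(\frac{bg - 0.45}{0.05}, 1\right), 0\right)$
Voting (vt):
  $VY(vt) = \max\left(\min\left(1, \frac{0.5 - vt}{0.05}\right), 0\right)$
  $VN(vt) = \max\left(\min\left(\frac{vt - 0.45}{0.05}, 1\right), 0\right)$
Stacking (sk):
  $sky(sk) = \max\left(\min\left(1, \frac{0.5 - sk}{0.05}\right), 0\right)$
  $skn(sk) = \max\left(\min\left(\frac{sk - 0.45}{0.05}, 1\right), 0\right)$
Output ($\check{D}$):
  $\check{D}(\cdot) = \max\left(\min\left(1, \frac{0.5 - (\cdot)}{0.05}\right), 0\right)$
  $\check{D}(\cdot) = \max\left(\min\left(\frac{(\cdot) - 0.45}{0.05}, 1\right), 0\right)$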
If-Then conditions that are used to develop membership functions are given below:
IF (Bagging is yes and Voting is yes and Stacking is yes) THEN (Module is defective).
IF (Bagging is yes and Voting is yes and Stacking is no) THEN (Module is defective).
IF (Bagging is yes and Voting is no and Stacking is yes) THEN (Module is defective).
IF (Bagging is no and Voting is yes and Stacking is yes) THEN (Module is defective).
IF (Bagging is no and Voting is no and Stacking is also no) THEN (Module is not defective).
IF (Bagging is yes and Voting is no and Stacking is no) THEN (Module is not defective).
IF (Bagging is no and Voting is no and Stacking is yes) THEN (Module is not defective).
IF (Bagging is no and Voting is yes and Stacking is no) THEN (Module is not defective).
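The eight rules above can be encoded as a lookup table, which makes it easy to check that together they amount to a majority vote among the three ensemble models. The Python sketch below does this. It also evaluates the two clipped linear shapes that appear in Table 2, read here as ramps between 0.45 and 0.5. The names `RULES`, `fuse`, `majority`, `falling_ramp`, and `rising_ramp` are assumptions of this sketch, and it covers only the crisp rule base, not a full fuzzy inference engine.

```python
from itertools import product

# The eight IF-THEN rules, keyed by the (bagging, voting, stacking) decisions.
RULES = {
    ("yes", "yes", "yes"): "defective",
    ("yes", "yes", "no"): "defective",
    ("yes", "no", "yes"): "defective",
    ("no", "yes", "yes"): "defective",
    ("no", "no", "no"): "not defective",
    ("yes", "no", "no"): "not defective",
    ("no", "no", "yes"): "not defective",
    ("no", "yes", "no"): "not defective",
}

def fuse(bagging: str, voting: str, stacking: str) -> str:
    """Look up the fused decision for one module."""
    return RULES[(bagging, voting, stacking)]

def majority(*votes: str) -> str:
    """Plain majority vote, used to cross-check the rule base."""
    return "defective" if votes.count("yes") >= 2 else "not defective"

def falling_ramp(x: float) -> float:
    """max(min(1, (0.5 - x) / 0.05), 0): 1 up to 0.45, falling to 0 at 0.5."""
    return max(min(1.0, (0.5 - x) / 0.05), 0.0)

def rising_ramp(x: float) -> float:
    """max(min((x - 0.45) / 0.05, 1), 0): 0 up to 0.45, rising to 1 at 0.5."""
    return max(min((x - 0.45) / 0.05, 1.0), 0.0)

# Every combination of the three inputs agrees with the majority vote.
assert all(fuse(*v) == majority(*v) for v in product(("yes", "no"), repeat=3))
print(fuse("yes", "no", "yes"))  # defective
```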
Fig. 3 depicts the rule surface of the proposed fuzzy logic-based fusion method for final
prediction with respect to the bagging and voting ensemble techniques. In cases where both techniques
predict that the software module is not defective, the fused model will make the same prediction.
Likewise, if both techniques predict that the module is defective, the fused model will also predict that
the module is defective.
Figure 3: Rule surface of fused ensemble method with bagging and voting
Fig. 4 shows the graphical representation of the fuzzy logic-based if-then rule for the scenario in which
bagging and stacking both predict that the particular module is defective, whereas the
voting technique predicts the opposite (non-defective). In this case, the proposed technique would go
for a majority decision (defective).
Fig. 5 illustrates that if bagging and stacking both predict that the module is non-defective, the
proposed technique will also predict that the module is non-defective.
The testing layer is the implementation layer of the proposed ISDPS. In this layer, three activities
are performed. The first activity deals with the extraction of unlabeled QMD from USCR for the
prediction of defective software modules. The second activity involves extracting the fused model
from the cloud for prediction. The third activity of the testing layer deals with real-time prediction
in which unlabeled QMD is given as input to the fused model, which is then labeled after prediction.
The labeled QMD is attached to its related SC in TSCR and then sent back to the development life
cycle. If the software is predicted as defective, then the particular defect is rectified by the development
team; otherwise, the particular software component would be considered good to go for integration.
Figure 4: Result of proposed fused ensemble method with defective module (1)
Figure 5: Result of proposed fused ensemble method with non-defective module (0)
4 Results and Discussion
An empirical analysis was conducted to assess the effectiveness of the proposed ISDPS using
a fused software defect dataset. The dataset was created by combining five of NASA’s cleaned
datasets. The fused dataset contains 3579 instances, with 428 defective and 3151 non-defective. In
the pre-processing stage of the training layer, the dataset underwent cleaning and normalization
processes, followed by the splitting process, where the dataset was divided into training and test subsets
with a 70:30 ratio. The training subset comprised 2506 instances, while the test subset contained
1073 instances. A novel FEFS technique is proposed for effective and efficient prediction, which
is implemented on the complete feature set of training data to select the optimum feature set. The
proposed method chose 7 out of 37 independent features. Details of the full feature set of the fused
dataset are available in [21], whereas the feature set selected by the proposed FEFS technique is shown
in Table 3.
Table 3: Selected features using the proposed FEFS technique
No. Selected features
1 LOC_BLANK
2 LOC_CODE_AND_COMMENT
3 CYCLOMATIC_DENSITY
4 PARAMETER_COUNT
5 HALSTEAD_CONTENT
6 NUM_OPERATORS
7 PERCENT_COMMENTS
In the proposed ISDPS, prediction is performed in three steps. In the first step, three supervised machine
learning techniques (NB, SVM, and DT) are iteratively optimized for classification until
the highest possible accuracy is attained for each model. The optimized classification models created
from these classifiers are given to the second step of prediction, where three ensemble techniques
(Bagging, Voting, and Stacking) are used to integrate the predictive accuracy of the classifiers.
The classifiers are integrated by ensemble methods with all possible combinations until we get three
ensemble classification models, one from each ensemble technique, that perform better than the base
classifiers. The results of ensemble techniques are given as input to the final prediction step, which is
empowered by fuzzy logic.
The accuracy measures used for the performance analysis of the proposed ISDPS are discussed
below:
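\[ \text{Misrate} = \frac{AOR_1/EOR_0 + AOR_0/EOR_1}{EOR_0 + EOR_1} \tag{1} \]
\[ \text{Accuracy} = \frac{AOR_0/EOR_0 + AOR_1/EOR_1}{EOR_0 + EOR_1} \tag{2} \]
\[ \text{Positive Prediction Value} = \frac{AOR_1/EOR_1}{AOR_1/EOR_1 + AOR_0/EOR_1} \tag{3} \]
\[ \text{Negative Prediction Value} = \frac{AOR_0/EOR_0}{AOR_0/EOR_0 + AOR_1/EOR_0} \tag{4} \]
\[ \text{Specificity} = \frac{AOR_0/EOR_0}{AOR_0/EOR_0 + AOR_0/EOR_1} \tag{5} \]
\[ \text{Sensitivity} = \frac{AOR_1/EOR_1}{AOR_1/EOR_0 + AOR_1/EOR_1} \tag{6} \]
\[ \text{False Positive Ratio} = 1 - \text{Specificity} \tag{7} \]
\[ \text{False Negative Ratio} = 1 - \text{Sensitivity} \tag{8} \]
\[ \text{Likelihood Ratio Positive} = \frac{\text{Sensitivity}}{1 - \text{Specificity}} \tag{9} \]
\[ \text{Likelihood Ratio Negative} = \frac{1 - \text{Sensitivity}}{\text{Specificity}} \tag{10} \]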
The training data, which consists of 2506 instances, are used to train the classifiers and ensemble
models. During the NB training process, 2061 instances are correctly predicted as negative, whereas 80
instances are correctly predicted as positive. The expected and achieved results can be compared
in Table 4, which shows that the training process achieved 85.43% accuracy and a 14.57% miss rate
in NB. In the process of testing, 865 instances are correctly predicted as negative, whereas 46 instances
are correctly predicted as positive. After comparing the output result and expected results (Table 4),
84.90% accuracy is achieved, with a miss rate of 15.10% in NB testing.
Table 4: NB results
Training data (Samples = 2506)
Expected result (EOR0, EOR1)    Output AOR0 (Negative-0)    Output AOR1 (Positive-1)
EOR0 = 2206 (Negative-0)        2061                        145
EOR1 = 300 (Positive-1)         220                         80
Testing data (Samples = 1073)
Expected result (EOR0, EOR1)    Output AOR0 (Negative)      Output AOR1 (Positive)
EOR0 = 945 (Negative-0)         865                         80
EOR1 = 128 (Positive-1)         82                          46
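The accuracy and miss rate reported for NB can be checked directly from the counts in Table 4 using Eqs. (1) and (2). In the short Python sketch below, the argument names `tn`, `fp`, `fn`, and `tp` are shorthand introduced here for AOR0/EOR0, AOR1/EOR0, AOR0/EOR1, and AOR1/EOR1 respectively; they are not notation from the paper.

```python
def accuracy_and_miss_rate(tn: int, fp: int, fn: int, tp: int):
    """Eq. (2) and Eq. (1): share of correct and of incorrect predictions."""
    total = tn + fp + fn + tp          # equals EOR0 + EOR1
    return (tn + tp) / total, (fp + fn) / total

# Counts taken from Table 4 (NB results).
nb_train = {"tn": 2061, "fp": 145, "fn": 220, "tp": 80}
nb_test = {"tn": 865, "fp": 80, "fn": 82, "tp": 46}

for name, counts in (("training", nb_train), ("testing", nb_test)):
    acc, miss = accuracy_and_miss_rate(**counts)
    print(f"NB {name}: accuracy = {acc:.2%}, miss rate = {miss:.2%}")
# NB training: accuracy = 85.43%, miss rate = 14.57%
# NB testing: accuracy = 84.90%, miss rate = 15.10%
```

The same function applies unchanged to the counts in the SVM and DT tables.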
In SVM training, 2135 instances are correctly predicted as negative, whereas 40
instances are correctly predicted as positive. According to the results in Table 5, SVM training
achieved 86.79% accuracy with a miss rate of 13.21%. In testing, 905 instances are correctly
predicted as negative, whereas 24 instances are correctly predicted as positive. Comparing the
expected and output results gives an accuracy of 86.58% and a miss rate of 13.42% for SVM testing.
Table 5: SVM results

| Data | Expected result (EOR0, EOR1) | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (samples = 2506) | EOR0 = 2206 (Negative-0) | 2135 | 71 |
| Training (samples = 2506) | EOR1 = 300 (Positive-1) | 260 | 40 |
| Testing (samples = 1073) | EOR0 = 945 (Negative-0) | 905 | 40 |
| Testing (samples = 1073) | EOR1 = 128 (Positive-1) | 104 | 24 |
In DT training, 2122 instances are correctly predicted as negative, whereas 100 instances are
correctly predicted as positive. The expected and achieved outputs presented in Table 6 show
88.67% accuracy with an 11.33% miss rate. In DT testing, 884 instances are correctly predicted
as negative, whereas 51 instances are correctly predicted as positive. According to Table 6,
this corresponds to 87.14% accuracy with a miss rate of 12.86%.
Table 6: DT results

| Data | Expected result (EOR0, EOR1) | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (samples = 2506) | EOR0 = 2206 (Negative-0) | 2122 | 84 |
| Training (samples = 2506) | EOR1 = 300 (Positive-1) | 200 | 100 |
| Testing (samples = 1073) | EOR0 = 945 (Negative-0) | 884 | 61 |
| Testing (samples = 1073) | EOR1 = 128 (Positive-1) | 77 | 51 |
After the classification models are developed using supervised machine learning techniques (NB,
SVM, DT), ensemble machine learning models are developed. In training with the bagging technique,
2205 instances are correctly predicted as negative, whereas 164 instances are correctly predicted
as positive. According to the training results shown in Table 7, 94.53% accuracy is achieved with
a miss rate of 5.47%. In testing, bagging correctly predicted 913 instances as negative and
55 instances as positive. Comparing the expected results with the achieved results shows that
testing yielded an accuracy of 90.21% and a miss rate of 9.79%.
Table 7: Bagging results

| Data | Expected result (EOR0, EOR1) | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (samples = 2506) | EOR0 = 2206 (Negative-0) | 2205 | 1 |
| Training (samples = 2506) | EOR1 = 300 (Positive-1) | 136 | 164 |
| Testing (samples = 1073) | EOR0 = 945 (Negative-0) | 913 | 32 |
| Testing (samples = 1073) | EOR1 = 128 (Positive-1) | 73 | 55 |
Training with voting correctly predicted 2196 instances as negative, whereas 39 instances were
correctly predicted as positive. According to the results in Table 8, 89.19% accuracy is achieved
with a 10.81% miss rate. Testing with voting correctly predicted 897 instances as negative,
whereas 58 instances were correctly predicted as positive. The results reflect 89.00% accuracy
and an 11.00% miss rate.
During training with the stacking ensemble, 2201 instances are correctly classified as
negative, whereas 53 instances are correctly classified as positive. Table 9 presents the output
results and expected outcome, demonstrating an accuracy of 89.94% and a miss rate of 10.06%.
Testing with the stacking ensemble correctly classified 911 instances as negative, whereas 54
instances were correctly classified as positive. A comparison of the expected and output results
reveals an accuracy of 89.93% and a miss rate of 10.07%.
Table 8: Voting results

| Data | Expected result (EOR0, EOR1) | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (samples = 2506) | EOR0 = 2206 (Negative-0) | 2196 | 10 |
| Training (samples = 2506) | EOR1 = 300 (Positive-1) | 261 | 39 |
| Testing (samples = 1073) | EOR0 = 945 (Negative-0) | 897 | 48 |
| Testing (samples = 1073) | EOR1 = 128 (Positive-1) | 70 | 58 |

Table 9: Stacking results

| Data | Expected result (EOR0, EOR1) | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|---|
| Training (samples = 2506) | EOR0 = 2206 (Negative-0) | 2201 | 5 |
| Training (samples = 2506) | EOR1 = 300 (Positive-1) | 247 | 53 |
| Testing (samples = 1073) | EOR0 = 945 (Negative-0) | 911 | 34 |
| Testing (samples = 1073) | EOR1 = 128 (Positive-1) | 74 | 54 |

Finally, the test dataset is given to the proposed model, which correctly predicted 926 of the
945 negative instances and 62 of the 128 positive instances. The results are shown in Table 10,
according to which the proposed system achieved 92.08% accuracy and a 7.92% miss rate.
Table 10: Fused ensemble testing (N = 1073 samples)

| Expected result (EOR0, EOR1) | Output AOR0 (Negative-0) | Output AOR1 (Positive-1) |
|---|---|---|
| EOR0 = 945 (Negative-0) | 926 | 19 |
| EOR1 = 128 (Positive-1) | 66 | 62 |
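The proposed model combines the outputs of the three ensembles through fuzzy logic. The rule base and membership functions are not listed in this part of the paper, so the Python sketch below is only an illustration under stated assumptions, not the authors' design: each ensemble output is treated as a defect degree in [0, 1]; the membership in "defective" is that degree and the membership in "non-defective" is its complement; fuzzy AND is `min` and fuzzy OR is `max`; and the hypothetical rule base reads "defective if any two ensembles say defective". The function name `fuse` is likewise hypothetical.

```python
from itertools import combinations


def fuse(bagging: float, voting: float, stacking: float) -> int:
    """Fuse three ensemble outputs with hypothetical min-max fuzzy rules.

    Returns 1 (defective) or 0 (non-defective). Ties resolve to 0.
    """
    degrees = (bagging, voting, stacking)
    if not all(0.0 <= d <= 1.0 for d in degrees):
        raise ValueError("each ensemble output must lie in [0, 1]")

    # Assumed membership functions: "defective" = d, "non-defective" = 1 - d.
    defective = degrees
    non_defective = tuple(1.0 - d for d in degrees)

    # Assumed rules: IF ensemble i AND ensemble j say X THEN output is X,
    # for every pair (i, j). AND is min; rules for the same X are OR-ed (max).
    pairs = list(combinations(range(3), 2))
    fire_defective = max(min(defective[i], defective[j]) for i, j in pairs)
    fire_non_defective = max(
        min(non_defective[i], non_defective[j]) for i, j in pairs
    )

    return 1 if fire_defective > fire_non_defective else 0


print(fuse(1.0, 1.0, 0.0))  # two of three crisp votes for "defective"
print(fuse(0.4, 0.3, 0.9))  # graded inputs
```

With crisp 0/1 inputs this rule base reduces to majority voting; graded inputs let a confident pair of ensembles outweigh a hesitant one.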
Table 11 shows the detailed results of the base classifiers and ensemble classification models on
training and testing data, along with the results of the proposed ISDPS on test data. On the test
data, the proposed system outperformed both the base classifiers (NB, SVM, and DT) and the
ensembles (Bagging, Voting, and Stacking). The ensemble models achieved higher testing accuracy
than the base classifiers, and the final prediction by decision-level fusion with fuzzy logic
further increased the accuracy to 92.08%. The effectiveness of the proposed system can be
inferred from its performance on the fused dataset in comparison to the other models.
Table 11: Detailed results of classifiers, ensembles, and ensemble fusion

| ML algorithm | Dataset | Accuracy | Miss rate | Sensitivity | Specificity | Positive prediction value | Negative prediction value | False positive value | False negative value | Likelihood ratio negative | Likelihood ratio positive |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Naïve Bayes | Training | 0.8543 | 0.1457 | 0.2667 | 0.9343 | 0.3556 | 0.9036 | 0.0657 | 0.7333 | 0.7849 | 4.0570 |
| Naïve Bayes | Testing | 0.8490 | 0.1510 | 0.3594 | 0.9153 | 0.3651 | 0.9134 | 0.0847 | 0.6406 | 0.6999 | 4.2451 |
| Support vector machines | Training | 0.8679 | 0.1321 | 0.1333 | 0.9678 | 0.3604 | 0.8914 | 0.0322 | 0.8667 | 0.8955 | 4.1427 |
| Support vector machines | Testing | 0.8658 | 0.1342 | 0.1875 | 0.9577 | 0.375 | 0.8969 | 0.0423 | 0.8125 | 0.8484 | 4.4297 |
| Decision tree | Training | 0.8867 | 0.1133 | 0.3333 | 0.9619 | 0.5435 | 0.9139 | 0.0381 | 0.6667 | 0.6931 | 8.7540 |
| Decision tree | Testing | 0.8714 | 0.1286 | 0.3984 | 0.9354 | 0.4554 | 0.9199 | 0.0646 | 0.6016 | 0.6431 | 6.1725 |
| Bagging | Training | 0.9453 | 0.0547 | 0.5467 | 0.9995 | 0.9939 | 0.9419 | 0.0005 | 0.4533 | 0.4535 | 1205.9467 |
| Bagging | Testing | 0.9021 | 0.0979 | 0.4297 | 0.9661 | 0.6322 | 0.9260 | 0.0339 | 0.5703 | 0.5903 | 12.6892 |
| Voting | Training | 0.8919 | 0.1081 | 0.13 | 0.9955 | 0.7959 | 0.8938 | 0.0045 | 0.87 | 0.8740 | 28.678 |
| Voting | Testing | 0.8900 | 0.1100 | 0.4531 | 0.9492 | 0.5472 | 0.9276 | 0.0508 | 0.5469 | 0.5761 | 8.92090 |
| Stacking | Training | 0.8994 | 0.1006 | 0.1767 | 0.9977 | 0.9138 | 0.8991 | 0.0023 | 0.8233 | 0.8252 | 77.9453 |
| Stacking | Testing | 0.8993 | 0.1007 | 0.4218 | 0.9640 | 0.6136 | 0.9249 | 0.0360 | 0.5781 | 0.5997 | 11.7257 |
| Proposed fused ensemble | Testing | 0.9208 | 0.0792 | 0.4844 | 0.9799 | 0.7654 | 0.9335 | 0.0201 | 0.5156 | 0.5262 | 24.0913 |
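The ordering of the models on the test data can be re-derived from the confusion matrices alone. The following Python sketch transcribes the testing counts from Tables 4 to 10 and ranks the models by accuracy; the dictionary layout and the helper name `accuracy` are illustrative assumptions.

```python
# Testing confusion matrices as (TN, FP, FN, TP), transcribed from Tables 4-10.
TEST_RESULTS = {
    "NB": (865, 80, 82, 46),
    "SVM": (905, 40, 104, 24),
    "DT": (884, 61, 77, 51),
    "Bagging": (913, 32, 73, 55),
    "Voting": (897, 48, 70, 58),
    "Stacking": (911, 34, 74, 54),
    "Proposed fused ensemble": (926, 19, 66, 62),
}


def accuracy(tn: int, fp: int, fn: int, tp: int) -> float:
    """Fraction of instances on the diagonal of the confusion matrix."""
    return (tn + tp) / (tn + fp + fn + tp)


# Sort model names from highest to lowest testing accuracy.
ranking = sorted(
    TEST_RESULTS,
    key=lambda name: accuracy(*TEST_RESULTS[name]),
    reverse=True,
)

for name in ranking:
    print(f"{name:<24} {accuracy(*TEST_RESULTS[name]):.4f}")
```

The resulting order places the proposed fused ensemble first, followed by the three ensembles and then the three base classifiers, which matches the testing accuracies listed in Table 11.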
The performance of the proposed ISDPS is compared with other techniques in terms of accuracy
in Table 12. The accuracy achieved by the proposed ISDPS is higher than that of the other
published techniques. Data fusion usually decreases the accuracy of a prediction system, because
training classification models on a dataset drawn from multiple sources is more challenging
than training on a dataset drawn from a single source. However, the proposed FEFS technique
for selecting optimum attributes and the multi-step prediction system played crucial roles
in achieving the high accuracy of the proposed ISDPS.
Table 12: Accuracy comparison of proposed system with other methods

| Algorithm | Accuracy % | Miss rate % |
|---|---|---|
| MLP-FS [15] | 85.13 | 14.87 |
| Boosting-OPT-MLP [16] | 79.08 | 20.92 |
| ANN-BR-fused [17] | 85.45 | 14.55 |
| FS-variant ensemble ML [18] | 84.97 | 15.03 |
| NB [19] | 82.65 | 17.35 |
| MLP [25] | 89.96 | 10.04 |
| Tree [25] | 84.94 | 15.06 |
| Bagging ensemble [26] | 80.20 | 19.8 |
| Boosting ensemble [26] | 81.30 | 18.7 |
| Heterogeneous classifier [27] | 89.20 | 10.8 |
| Stacked ensemble [28] | 89.10 | 10.9 |
| RBFNN-based ADBBO [29] | 88.65 | 11.35 |
| LWL-based bagging ensemble [30] | 90.10 | 9.9 |
| Proposed ensemble ML fusion approach | 92.08 | 7.92 |
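To make the size of the improvement concrete, the short Python sketch below finds the strongest published baseline in Table 12 and the margin by which the proposed approach exceeds it. It assumes the reported accuracies are directly comparable, which the table itself does not establish; the variable names are illustrative.

```python
# Accuracy (%) of the published techniques, transcribed from Table 12.
PUBLISHED = {
    "MLP-FS [15]": 85.13,
    "Boosting-OPT-MLP [16]": 79.08,
    "ANN-BR-fused [17]": 85.45,
    "FS-variant ensemble ML [18]": 84.97,
    "NB [19]": 82.65,
    "MLP [25]": 89.96,
    "Tree [25]": 84.94,
    "Bagging ensemble [26]": 80.20,
    "Boosting ensemble [26]": 81.30,
    "Heterogeneous classifier [27]": 89.20,
    "Stacked ensemble [28]": 89.10,
    "RBFNN-based ADBBO [29]": 88.65,
    "LWL-based bagging ensemble [30]": 90.10,
}
PROPOSED = 92.08  # proposed ensemble ML fusion approach

# Strongest baseline and the gap to it, in percentage points.
best_name, best_acc = max(PUBLISHED.items(), key=lambda item: item[1])
margin = round(PROPOSED - best_acc, 2)

print(f"Best published baseline: {best_name} ({best_acc:.2f}%)")
print(f"Margin of proposed approach: {margin:.2f} percentage points")
```

The closest baseline is the LWL-based bagging ensemble [30], which the proposed approach exceeds by 1.98 percentage points.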
5 Conclusion
Software testing is considered an expensive activity of the software development life cycle
(SDLC), which aims to ensure the high quality of the end product by removing software bugs.
Anticipating software faults before the testing phase can help the quality assurance team direct
its attention towards potentially defective software modules during testing instead of
scrutinizing every module. This would limit the cost of testing, which would ultimately reduce
the overall development cost without compromising software quality. The current study aimed to
develop a system that can predict faulty software modules before the testing stage by utilizing
data fusion, feature selection, and decision-level ensemble machine-learning fusion techniques.
A novel FEFS technique is proposed to select optimum features from the input dataset. The
proposed system used NB, SVM, and DT for initial predictions, followed by the development of
ensemble models using Bagging, Voting, and Stacking. The predictions from the ensemble models are
then passed to the decision-level fusion phase, which uses a fuzzy logic-based technique for the
final prediction. The decision-level fusion integrated the predictive accuracy of the ensemble
models through if-then rule-based fuzzy logic. Five clean datasets from NASA’s software
repository are fused to implement the proposed system. After comparing the performance of the
proposed ISDPS with other defect prediction techniques published in the literature, it was found
that the ISDPS, evaluated on the fused dataset, achieved higher accuracy than all of the compared
methods.
The proposed system achieved an accuracy of 92.08% on the fused data, indicating the effectiveness of
the novel FEFS and decision-level ensemble machine-learning fusion techniques. For future work, it is
suggested that hybrid filter-wrapper feature selection and deep extreme machine-learning techniques
be incorporated into the proposed system. Moreover, the proposed design should also be optimized
for cross-project defect prediction problems.
Acknowledgement: Thanks to our families & colleagues who supported us morally.
Funding Statement: This work was supported by the Center for Cyber-Physical Systems, Khalifa
University, under Grant 8474000137-RC1-C2PS-T5.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the
present study.
References
[1] F. Matloob, S. Aftab, M. Ahmad, M. A. Khan, A. Fatima et al., “Software defect prediction using
supervised machine learning techniques: A systematic literature review,” Intelligent Automation & Soft
Computing, vol. 29, no. 2, pp. 403–421, 2021.
[2] D. R. Ibrahim, R. Ghnemat and A. Hudaib, “Software defect prediction using feature selection and random
forest algorithm,” in Int. Conf. on New Trends in Computer Science, Amman, Jordan, pp. 252–257, 2017.
[3] F. Matloob, T. M. Ghazal, N. Taleb, S. Aftab, M. Ahmad et al., “Software defect prediction using ensemble
learning: A systematic literature review,” IEEE Access, vol. 10, pp. 13123–13143, 2022.
[4] A. Boucher and M. Badri, “Software metrics thresholds calculation techniques to predict fault-proneness:
An empirical comparison,” Information and Software Technology, vol. 96, pp. 38–67, 2018.
[5] L. Chen, B. Fang, Z. Shang and Y. Tang, “Tackling class overlap and imbalance problems in software defect
prediction,” Software Quality Journal, vol. 26, no. 1, pp. 97–125, 2018.
[6] S. Goyal and P. K. Bhatia, “Empirical software measurements with machine learning,” in Computational
Intelligence Techniques and Their Applications to Software Engineering Problems, Boca Raton: CRC Press,
pp. 49–64, 2020.
[7] S. Huda, K. Liu, M. Abdelrazek, A. Ibrahim, S. Alyahya et al., “An ensemble oversampling model for class
imbalance problem in software defect prediction,” IEEE Access, vol. 6, pp. 24184–24195, 2018.
[8] H. K. Lee and S. B. Kim, “An overlap-sensitive margin classifier for imbalanced and overlapping data,”
Expert Systems with Applications, vol. 98, pp. 72–83, 2018.
[9] D. L. Miholca, G. Czibula and I. G. Czibula, “A novel approach for software defect prediction through
hybridizing gradual relational association rules with artificial neural networks,” Information Sciences, vol.
441, pp. 152–170, 2018.
[10] R. Özakıncı and A. Tarhan, “Early software defect prediction: A systematic map and review,” Journal of
Systems and Software, vol. 144, pp. 216–239, 2018.
[11] S. S. Rathore and S. Kumar, “Towards an ensemble based system for predicting the number of software
faults,” Expert Systems with Applications, vol. 82, pp. 357–382, 2017.
[12] S. S. Rathore and S. Kumar, “A study on software fault prediction techniques,” Artificial Intelligence
Review, vol. 51, no. 2, pp. 255–327, 2019.
[13] L. H. Son, N. Pritam, M. Khari, R. Kumar, P. T. M. Phuong et al., “Empirical study of software defect
prediction: A systematic mapping,” Symmetry, vol. 11, no. 2, pp. 2–28, 2019.
[14] F. Matloob, S. Aftab and A. Iqbal, “A framework for software defect prediction using feature selection and
ensemble learning techniques,” International Journal of Modern Education and Computer Science, vol. 11,
no. 12, pp. 14–20, 2019.
[15] A. Iqbal and S. Aftab, “A classification framework for software defect prediction using multi-filter feature
selection technique and mlp,” International Journal of Modern Education and Computer Science, vol. 12,
no. 1, pp. 18–25, 2020.
[16] A. Iqbal and S. Aftab, “Prediction of defect prone software modules using mlp based ensemble techniques,”
International Journal of Information Technology and Computer Science, vol. 12, no. 3, pp. 26–31, 2020.
[17] M. S. Daoud, S. Aftab, M. Ahmad, M. A. Khan, A. Iqbal et al., “Machine learning empowered software
defect prediction system,” Intelligent Automation & Soft Computing, vol. 31, no. 32, pp. 1287–1300, 2022.
[18] U. Ali, S. Aftab, A. Iqbal, Z. Nawaz, M. S. Bashir et al., “Software defect prediction using variant based
ensemble learning and feature selection techniques,” International Journal of Modern Education & Computer
Science, vol. 12, no. 5, pp. 29–40, 2020.
[19] A. Iqbal, S. Aftab, U. Ali, Z. Nawaz, L. Sana et al., “Performance analysis of machine learning techniques
on software defect prediction using nasa datasets,” International Journal of Advanced Computer Science and
Applications, vol. 10, no. 5, pp. 300–308, 2019.
[20] R. Mahajan, S. K. Gupta and R. K. Bedi, “Design of software fault prediction model using br technique,”
Procedia Computer Science, vol. 46, pp. 849–858, 2015.
[21] M. Shepperd, Q. Song, Z. Sun and C. Mair, “Data quality: Some comments on the nasa software defect
datasets,” IEEE Transactions on Software Engineering, vol. 39, no. 9, pp. 1208–1215, 2013.
[22] Y. J. Cruz, M. Rivas, R. Quiza, A. Villalonga, R. E. Haber et al., “Ensemble of convolutional neural
networks based on an evolutionary algorithm applied to an industrial welding process,” Computers in
Industry, vol. 133, pp. 103530–103538, 2021.
[23] M. Shahhosseini, G. Hu and H. Pham, “Optimizing ensemble weights and hyperparameters of machine
learning models for regression problems,” Machine Learning with Applications, vol. 7, pp. 100251–100260,
2022.
[24] A. U. Rahman, S. Abbas, M. Gollapalli, R. Ahmed, S. Aftab et al., “Rainfall prediction system using
machine learning fusion for smart cities,” Sensors, vol. 22, no. 9, pp. 3504–3519, 2022.
[25] S. Goyal and P. K. Bhatia, “Comparison of machine learning techniques for software quality prediction,”
International Journal of Knowledge and Systems Science, vol. 11, no. 2, pp. 20–40, 2020.
[26] A. O. Balogun, F. B. L. Balogun, H. A. Mojeed, V. E. Adeyemo, O. N. Akande et al., “Smote-based
homogeneous ensemble methods for software defect prediction,” in 22nd Int. Conf. on Computational
Science and its Applications, Cagliari, Italy, pp. 615–631, 2022.
[27] T. T. Khuat and M. H. Le, “Evaluation of sampling-based ensembles of classifiers on imbalanced data for
software defect prediction problems,” SN Computer Science, vol. 1, no. 2, pp. 1–16, 2020.
[28] S. Goyal and P. K. Bhatia, “Heterogeneous stacked ensemble classifier for software defect prediction,”
Multimedia Tools and Applications, vol. 81, pp. 37033–37055, 2022.
[29] P. Kumudha and R. Venkatesan, “Cost-sensitive radial basis function neural network classifier for software
defect prediction,” The Scientific World Journal, vol. 2016, pp. 1–20, 2016.
[30] A. S. Abdou and N. R. Darwish, “Early prediction of software defect using ensemble learning: A
comparative study,” International Journal of Computer Applications, vol. 179, no. 46, pp. 29–40, 2018.
... The authors in [36] proposed an intelligent system based on feature selection and ensemble machine learning techniques to predict defective modules in the software. A novel metric selection technique was introduced to select the most relevant features, and a three-step nested approach was employed for accurate prediction. ...
... Moreover, these standalone classifiers might not capture the diverse patterns present in complex software datasets, leading to suboptimal predictive performance [3], [28]. Although some studies have explored ensemble techniques, most have predominantly focused on homogeneous classifiers within their ensembles [23], [36]. In contrast, the proposed framework introduces a paradigm shift by integrating the predictive accuracy of heterogeneous classifiers-Random Forest (RF), Support Vector Machine 7 This article has been accepted for publication in IEEE Access. ...
Article
Full-text available
Software defect prediction plays a crucial role in enhancing software quality while achieving cost savings in testing. Its primary objective is to identify and send only defective modules to the testing stage. This research introduces an intelligent ensemble-based software defect prediction model that combines diverse classifiers. The proposed model employs a two-stage prediction process to detect defective modules. In the first stage, four supervised machine learning algorithms are employed: Random Forest, Support Vector Machine, Naïve Bayes, and Artificial Neural Network. These algorithms are optimized through iterative parameter optimization to achieve the highest accuracy possible. In the second stage, the predictive accuracy of the individual classifiers is integrated into a voting ensemble to make the final predictions. This ensemble approach further improves the accuracy and reliability of the defect predictions. Seven historical defect datasets from the NASA MDP repository, namely CM1, JM1, MC2, MW1, PC1, PC3, and PC4, were utilized to implement and evaluate the proposed defect prediction system. The results demonstrate that each dataset’s proposed intelligent system achieved remarkable accuracy, outperforming twenty state-of-the-art defect prediction techniques, including base classifiers and ensemble methods.
... The second dataset, IDRiD (Indian Diabetic Retinopathy Image Dataset), includes 516 images. These datasets are merged[28] [30], resulting in a total of 4176TABLE II. CONFUSION MATRIX OF TRAINING SET USING DENSENET121 ...
Conference Paper
Diabetic retinopathy is a condition associated with diabetes that damages the blood vessels within the retina resulting in vision impairment or even blindness. Early detection and classification of DR enables timely intervention, which is crucial for preventing vision loss and blindness in diabetic patients. This research presents a framework for binary classification, employing transfer learning to identify diabetic retinopathy in individuals with diabetes. APTOS19 and IDRiD, two datasets containing fundus images, are merged together for training the transfer learning models to predict the presence or absence of the disease. Many preprocessing techniques have been applied to these images like resizing, Gaussian filtering, and dataset splitting. After the split, training set is augmented using zooming, rotation, flipping etc. to increase diversity. The transfer learning models used are: ResNet50 and DenseNet121. These models are fine tuned for classification. The results highlight that the DenseNet121 model achieved a superior test accuracy of 97.22% as compared to ResNet50.
... DR occurs as a result of high sugar levels damaging the retinal blood vessels, leading to insufficient oxygen supply, blood leakage, swelling, or complete blockage of the vessels [2][3][4][5]. Diabetic Retinopathy can be detected in the retina through abnormalities like exudates, micro-aneurysms, abnormal blood vessels, and hemorrhages [6]. If DR is detected at its early stage, it can be easily managed [7]. ...
Conference Paper
Diabetic Retinopathy is a serious eye condition resulting from long-term diabetes mellitus that can lead to permanent blindness if not treated on time. Early detection can reduce the likelihood of serious disability. This research introduces a binary classification framework using Transfer Learning to identify diabetic retinopathy in diabetic patients. The research makes use of an image-based dataset, APTOS 2019, sourced from Kaggle. These images are used to train transfer learning models for predicting the presence or absence of this disease among patients. Preprocessing steps including resizing, Gaussian filtering and dataset split are employed prior to classification. Afterwards, data augmentation techniques like rotation, zooming, flipping, shifting and rescaling are applied to increase the training set. For classification, two Transfer Learning models, EfficientNetB3 and VGG16, are used after fine-tuning. The results indicate that EfficientNetB3 model outperformed VGG16 model due to its computationally efficient architecture, achieving a test accuracy of 97.82%.
... In the paper [40], they present a technique for SDP that integrates feature selection, data fusion, and ensemble machine learning methods. This framework utilizes a range of ML models that consists of SVM, DT, and NB, in combination with ensemble machine learning methods like voting, bagging, and stacking, to predict faulty software modules. ...
Article
Full-text available
Software defect prediction is a crucial area of study focused on enhancing software quality and cutting down on software upkeep expenses. Cross Project Defect Prediction (CPDP) is a method meant to use information from different source projects to spot software issues in a specific project. CPDP comes in handy when the project being analyzed lacks enough or any data about defects for creating a dependable defect prediction model. Machine learning that is a part of artificial intelligence learns from data and then makes forecasts or choices. Machine learning (ML) is a key component of CPDP because it can learn from heterogeneous and imbalanced data sources. However, there are many challenges and open issues in applying machine learning to CPDP, such as data selection, feature extraction, model selection, evaluation metrics, and transfer learning. In this study, we provide a complete review of existing literature from 2018 to 2023 on Defect Prediction using Machine Learning, covering the main methods, applications, and limitations. We also use ML to identify current research gaps and future directions for CPDP. This paper will serve as a useful reference for researchers interested in using ML for CPDP.
Conference Paper
Diabetic retinopathy is an eye disease damaging the blood vessels of retina as a result of long term diabetes. In this research, an ensemble classification system is proposed by combining two transfer learning models to identify diabetic retinopathy among diabetic patients. A fusion of two datasets containing fundus images is used in this study: 1) APTOS, which contains 3662 images and 2) IDRiD, which contains 516 images. For binary classification, the dataset is divided into two classes: DR and No DR. Preprocessing steps are applied on the dataset such as resizing the images to 150x150x3, applying Gaussian filtering, balancing minority classes using SMOTE and splitting the dataset to 80:20 ratios. Data augmentation techniques like zooming, rotation etc. are also used to augment the images. Two transfer learning models, Xception and EfficientNetB3, are used for classification after fine-tuning. An ensemble of these models is built which achieved the highest test accuracy of 97.47% outperforming individual models.
Article
Full-text available
Software Defect Prediction (SDP) is crucial for enhancing software quality and minimizing issues after release. The advent of machine learning, particularly in Cross-Project Defect Prediction (CPDP), has garnered significant attention for its potential to enhance defect predictions in one project by leveraging information from another. A critical factor influencing CPDP effectiveness is feature selection, the process of identifying the most relevant features from an available set. This review article thoroughly examines the role of feature selection in CPDP. Existing feature selection methods are systematically analyzed and classified within the CPDP context, encompassing both traditional and state-of-the-art approaches. The review delves into the challenges and opportunities presented by diverse project characteristics, data heterogeneity, and the curse of dimensionality. Additionally, the article underscores how feature selection impacts model performance, generalization, and adaptability across various software projects. Through synthesizing findings from multiple studies, trends, best practices, and potential research directions in the field are identified. In conclusion, this review article provides valuable insights into the significance of feature selection for enhancing the reliability and efficiency of CPDP models.
Article
Full-text available
Precipitation in any form-such as rain, snow, and hail-can affect day-today outdoor activities. Rainfall prediction is one of the challenging tasks in weather forecasting process. Accurate rainfall prediction is now more difficult than before due to the extreme climate variations. Machine learning techniques can predict rainfall by extracting hidden patterns from historical weather data. Selection of an appropriate classification technique for prediction is a difficult job. This research proposes a novel real-time rainfall prediction system for smart cities using a machine learning fusion technique. The proposed framework uses four widely used supervised machine learning techniques , i.e., decision tree, Naïve Bayes, K-nearest neighbors, and support vector machines. For effective prediction of rainfall, the technique of fuzzy logic is incorporated in the framework to integrate the predictive accuracies of the machine learning techniques, also known as fusion. For prediction , 12 years of historical weather data (2005 to 2017) for the city of Lahore is considered. Pre-processing tasks such as cleaning and normalization were performed on the dataset before the classification process. The results reflect that the proposed machine learning fusion-based framework outperforms other models.
Article
Full-text available
Production of high-quality software at lower cost has always been the main concern of developers. However, due to exponential increases in size and complexity, the development of qualitative software with lower costs is almost impossible. This issue can be resolved by identifying defects at the early stages of the development lifecycle. As a significant amount of resources are consumed in testing activities, if only those software modules are shortlisted for testing that is identified as defective, then the overall cost of development can be reduced with the assurance of high quality. An artificial neural network is considered as one of the extensively used machine-learning techniques for predicting defect-prone software modules. In this paper, a cloud-based framework for real-time software-defect prediction is presented. In the proposed framework, empirical analysis is performed to compare the performance of four training algorithms of the back-propagation technique on software-defect prediction: Bayesian regularization (BR), Scaled Conjugate Gradient, Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton, and Levenberg-Marquardt algorithms. The proposed framework also includes a fuzzy layer to identify the best training function based on performance. Publicly available cleaned versions of NASA datasets are used in this study. Various measures are used for performance evaluation including specificity, precision , recall, F-measure, an area under the receiver operating characteristic curve, accuracy, R 2 , and mean-square error. Two graphical user interface tools are developed in MatLab software to implement the proposed framework. The first tool is developed for comparing training functions as well as for extracting the results; the second tool is developed for the selection of the best training function using fuzzy logic. 
A BR training algorithm is selected by the fuzzy layer as it This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Article
Full-text available
Software defect prediction (SDP) plays an important role to ensure that software meets quality standards; by highlighting the modules which are prone to errors and hence allows to focus the test efforts on them. Class imbalance nature of the defect dataset hinders the defect predictors to correctly classify the buggy modules. Here, we introduce a novel heterogenous ensemble classifier built with stacking methodology to overcome this problem of imbalanced datasets and hence, significant improvement in the prediction power is being proposed. Stacked ensemble is achieved with the best known classifiers from SDP literature as base classifiers (artificial neural network, nearest neighbor, tree based classifier, Bayesian classifier and support vector machines). For experimental work, five public datasets from NASA corpus are used. A comparative analysis for the proposed heterogenous stacking based ensemble method is made with the base classifiers and with the state-of-the art ensemble based SDP models over the evaluation criteria of ROC, AUC and accuracy. It is found that the proposed heterogenous stacking based ensemble classifier outperforms the base classifiers by 12% in terms of AUC score and by 8% in terms of Accuracy. It improves the performance of state-of-the-art ensemble methods by 4% in terms of AUC score and by 9% in terms of Accuracy. It can be concluded from the comparative analysis that the proposed SDP classifier is best performer among the candidate SDP classifiers statistically.
Article
Full-text available
This paper presents an approach for image classification based on an ensemble of convolutional neural networks and the application to a real case study of an industrial welding process. The ensemble consists of five convolutional neural networks, whose outputs are combined through a voting policy. In order to select appropriate network parameters (i.e., the number of convolutional layers and layers hyperparameters) and voting policy, an efficient search process was carried out by using an evolutionary algorithm. The proposed method is applied and validated in a case study focused on detecting misalignment of metal sheets to be joined through submerged arc welding process. After selecting the most convenient setup, the ensemble outperforms other seven strategies considered in a comparison in several metrics, while maintaining an adequate computational cost.
Article
Full-text available
Recent advances in the domain of software defect prediction (SDP) include the integration of multiple classification techniques to create an ensemble or hybrid approach. This technique was introduced to improve the prediction performance by overcoming the limitations of any single classification technique. This research provides a systematic literature review on the use of the ensemble learning approach for software defect prediction. The review is conducted after critically analyzing research papers published since 2012 in four well-known online libraries: ACM, IEEE, Springer Link, and Science Direct. In this study, five research questions that cover the different aspects of research progress on the use of ensemble learning for software defect prediction are addressed. To extract the answers to identified questions, 46 most relevant papers are shortlisted after a thorough systematic research process. This study will provide compact information regarding the latest trends and advances in ensemble learning for software defect prediction and provide a baseline for future innovations and further reviews. Through our study, we discovered that frequently employed ensemble methods by researchers are the random forest, boosting, and bagging. Less frequently employed methods include stacking, voting and Extra Trees. Researchers proposed many promising frameworks, such as EMKCA, SMOTE-Ensemble, MKEL, SDAEsTSE, TLEL, and LRCR, using ensemble learning methods. The AUC, accuracy, F-measure, Recall, Precision, and MCC were mostly utilized to measure the prediction performance of models. WEKA was widely adopted as a platform for machine learning. Many researchers showed through empirical analysis that feature selection and data sampling were important pre-processing steps that improve the performance of ensemble classifiers.
Article
Software defect prediction (SDP) is the process of detecting defect-prone software modules before the testing stage. The testing stage in the software development life cycle is expensive and consumes the most resources of all the stages. SDP can minimize the cost of the testing stage, which can ultimately lead to the development of higher-quality software at a lower cost. With this approach, only those modules classified as defective are tested. Over the past two decades, many researchers have proposed methods and frameworks to improve the performance of the SDP process. The main research topics are association, estimation, clustering, classification, and dataset analysis. This study provides a systematic literature review that highlights the latest research trends in the area of SDP by critically reviewing papers published between 2016 and 2019. Initially, 1012 papers were shortlisted from three online libraries (IEEE Xplore, ACM, and ScienceDirect); following a systematic research protocol, 22 of these papers were selected for detailed critical review. This review will serve researchers by providing the most current picture of the published work on software defect classification.
Article
Testing is considered one of the expensive activities in the software development process. Fixing defects during testing can increase both the cost and the completion time of a project. The cost of testing can be reduced by identifying defective modules during development, before the testing stage. This process is known as "Software Defect Prediction" and has received wide attention from researchers over the last two decades. This research proposes a classification framework for predicting defective modules using variant-based ensemble learning and feature selection techniques. Variant selection identifies the best-optimized versions of the classification techniques so that their ensemble can achieve high performance, whereas feature selection removes features that do not contribute to classification and that lower performance. The proposed framework is implemented on four cleaned NASA datasets from the MDP repository and evaluated by using three performance measures,
Chapter
Class imbalance is a prevalent problem in machine learning that affects the prediction performance of classification algorithms. Software Defect Prediction (SDP) is no exception to this latent problem. Solutions such as data sampling and ensemble methods have been proposed to address the class imbalance problem in SDP. This study proposes a combination of the Synthetic Minority Oversampling Technique (SMOTE) and homogeneous ensemble (Bagging and Boosting) methods for predicting software defects. The proposed approach was implemented using Decision Tree (DT) and Bayesian Network (BN) as base classifiers on defect datasets acquired from the NASA software corpus. The experimental results showed that the proposed approach outperformed the other experimental methods. The high accuracy of 86.8% and the area under the receiver operating characteristic curve (AUC) value of 0.93 achieved by the proposed technique affirmed its ability to differentiate between the defective and non-defective labels without bias.
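The oversampling step named above can be illustrated in isolation. This is a simplified sketch of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the study's implementation; the function name `smote`, its parameters, and the toy data are assumptions, and the bagging and boosting stages are not shown.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by interpolation.

    `minority` is a list of at least two feature tuples from the
    minority (defective) class. Each synthetic point lies on the line
    segment between a randomly chosen sample and one of its k nearest
    minority neighbours (squared Euclidean distance).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical two-feature minority samples.
defective = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote(defective, n_new=6))
```

In a full pipeline the synthetic samples would be added to the training split only, so that evaluation data stays untouched.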
Article
Aggregating multiple learners through an ensemble of models aims to make better predictions by capturing the underlying distribution of the data more accurately. Different ensembling methods, such as bagging, boosting, and stacking/blending, have been studied and adopted extensively in research and practice. While bagging and boosting focus more on reducing variance and bias, respectively, stacking approaches target both by finding the optimal way to combine base learners. In stacking with the weighted average, ensembles are created from weighted averages of multiple base learners. It is known that tuning the hyperparameters of each base learner inside the ensemble weight optimization process can produce better-performing ensembles. To this end, an optimization-based nested algorithm is designed that tunes hyperparameters while also finding the optimal weights to combine ensembles: the Generalized Weighted Ensemble with Internally Tuned Hyperparameters (GEM-ITH). In addition, Bayesian search was used to speed up the optimization process, and a heuristic was implemented to generate diverse and well-performing base learners. The algorithm is shown to be generalizable to real data sets through analyses with ten publicly available data sets.
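Stacking with a weighted average can be sketched as a search for non-negative weights that sum to one and minimize validation error. The example below swaps in a plain grid search for the optimization and Bayesian search described in the abstract, and it does not tune base-learner hyperparameters, so it is not GEM-ITH; the names `combine` and `best_weights` and the toy predictions are assumptions.

```python
from itertools import product

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def combine(base_preds, weights):
    """Weighted average of base-learner predictions, one value per sample."""
    return [sum(w * p for w, p in zip(weights, sample))
            for sample in zip(*base_preds)]

def best_weights(base_preds, y_true, steps=10):
    """Grid-search weights (multiples of 1/steps, summing to 1) by MSE.

    `base_preds` holds one list of validation predictions per base learner.
    Returns (weights, error) for the first combination with the lowest error.
    """
    best = None
    for combo in product(range(steps + 1), repeat=len(base_preds)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to exactly 1
        weights = [c / steps for c in combo]
        err = mse(y_true, combine(base_preds, weights))
        if best is None or err < best[0]:
            best = (err, weights)
    return best[1], best[0]

# Two hypothetical base learners: one under-predicts, one over-predicts.
preds = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
target = [2.0, 3.0, 4.0]
print(best_weights(preds, target))
```

The grid grows quickly with the number of base learners, which is one reason a continuous optimizer or Bayesian search is preferable beyond a handful of models.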
Chapter
Measurement of the attributes of the software processes, products, projects, and people associated with software development is necessary so that the industry can deliver a quality product, that is, high-quality software within the limits of time and cost. It is evident that accurate software measurements using empirical techniques are essential. As per the Chaos Report (Chaos Report 2015), only 23% of total projects get the status of “successful project completion.” The reason for this poor successful completion rate is the inaccurate measurement of attributes of software quality and quantity (Demarco 1982). We need to evaluate, assess, predict, monitor, and control the various aspects of software development, and successful project completion requires quantitative methods. This chapter discusses the empirical approach to software measurement using machine learning (ML) techniques. A great deal of research has already been done in this field; ML has found software measurement a very fertile ground. Both dimensions, software quality and software quantity, can be measured empirically using ML techniques. Software quantity measurement corresponds to effort estimation, cost estimation, schedule prediction, and several other software measurement tasks, which can be modeled as regression tasks. Software quality measurement corresponds to defect prediction, quality prediction, prediction of faulty modules, and other such problems, which can be formulated as classification tasks in ML. In this way, software quantity and quality measurements together can be formulated as supervised ML problems. This is the starting point used in this research field for measuring software empirically with ML techniques. The field has resonated with software researchers since the 1980s.
This chapter demonstrates the use of ML techniques for both software quality and quantity measurements. Starting with a basic introduction to current trends in the field and moving through the problem definition, we reach the experimental set-up and then draw inferences from the experiments. This chapter aims to provide the reader with practical and applicable knowledge of ML and deep learning for empirical software measurements.