Combining Integrated Sampling Technique with
Feature Selection for Software Defect Prediction
Sukmawati Anggraeni Putri
STMIK Nusa Mandiri, Information System Program
Jakarta, Indonesia
sukmawati@nusamandiri.ac.id
Frieyadie
AMIK BSI Jakarta, Management Informatic Program
Jakarta, Indonesia
frieyadie@bsi.ac.id
Abstract—Good-quality software is an important supporting factor in almost every line of work in society. However, defective or damaged software components reduce work performance and increase the cost of development and maintenance. Accurate prediction of defect-prone software modules is therefore part of the effort to reduce these rising costs. Previous studies identify two problems that can degrade the prediction performance of classifiers: an imbalanced class distribution and irrelevant attributes in the dataset. To handle both issues, this research integrates a sampling technique with a feature selection method. Based on previous research, two sampling methods are considered: random under-sampling and SMOTE for random over-sampling. The feature selection methods considered are chi-square, information gain, and Relief. The experiments show that integrating the SMOTE technique with the Relief method on a Naïve Bayes classifier yields a better prediction value than any other method, namely 82%.
Keywords—class imbalance, feature selection, software defect prediction
I. INTRODUCTION
As the use of software to support activities and work increases, the quality of the software must certainly be considered. Defective or damaged software components reduce customer satisfaction and increase the cost of development and maintenance [1].
Accurate prediction of defect-prone software modules, as an effort to reduce the rising cost of software development and maintenance, has been studied by previous researchers [2]. This line of research focuses on 1) estimating the number of defects in the software, 2) finding associations among software defects, and 3) classifying software components into defective and non-defective modules [3].
Among previous software defect prediction studies, the Naïve Bayes classifier [4] produced good performance, with an average probability of detection of 71%. Naïve Bayes is a simple classifier [5] whose learning process is faster than that of other machine learning algorithms [4], and it has a good reputation for prediction accuracy [6]. However, this method is not optimal on imbalanced datasets [7].
The predictive performance of this method also degrades when the dataset contains irrelevant attributes [8]. The NASA MDP datasets [9], which have been widely used by previous researchers in software defect prediction, are imbalanced and contain attributes that are not all usable. Three approaches can be used to deal with imbalanced datasets: the data level (sampling techniques), the algorithm level, and ensemble methods [10].
In general, sampling techniques are divided into two types: over-sampling methods such as Random Over-Sampling [11], and under-sampling methods such as Random Under-Sampling [12] and the Resample method [13].
The problem of irrelevant attributes can be addressed with attribute selection methods such as Information Gain, Chi-Square, and Relief [14].
In this study, we propose integrating a sampling technique with a feature selection method to handle class imbalance and irrelevant attributes in Naïve Bayes classification, in order to produce better accuracy in software defect prediction. The study proceeds in several steps. First, a sampling technique is applied to handle the class imbalance. Then, an attribute selection approach is applied before classifying modules for software defect prediction. Finally, validation and evaluation techniques determine whether or not the proposed method performs better than existing methods.
II. RELATED WORK
Software defect prediction is one of the research areas addressed by previous researchers. From these studies, the state of the art of software defect prediction research that addresses class imbalance is known.
Chawla [15] proposed the use of the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance with a Naïve Bayesian classifier,
implemented on eight different datasets from the UCI repository. The results showed that, for all the data, using the SMOTE technique in the balancing process has greater potential to improve the performance of the Naïve Bayesian and C4.5 classifiers used in the classification process.
Riquelme [13] states that datasets in software engineering are very unbalanced. He therefore balanced them using two techniques, SMOTE and Weka's random Resample, with the J48 and Naïve Bayesian classifiers applied to five datasets from the PROMISE repository. The results show that the SMOTE approach is able to increase the average AUC value by 11.6%. Based on these results, balancing techniques can better classify minority classes.
Putri and Wahono [16] state that the NASA MDP datasets have unbalanced classes and irrelevant attributes. Using SMOTE for class balancing and information gain for feature selection, they improved prediction results beyond Riquelme's research, which only rebalanced the dataset classes.
Furthermore, Gao used the Relief feature selection algorithm, originally introduced by Kira. Gao's research shows that the Relief method performs as well as Information Gain [17].
Therefore, this study applies a sampling technique, the Synthetic Minority Over-sampling Technique (SMOTE), to reduce the influence of class imbalance and improve the ability to predict the minority class; the Relief algorithm to select the relevant attributes; and the Naïve Bayes algorithm in the classification process.
III. METHOD
3.1. Sampling Technique
The sampling approach is one way to solve the problem of class imbalance in a dataset. The commonly used sampling approaches are over-sampling and under-sampling techniques [10].
a. Over-Sampling Technique
Over-sampling causes excessive duplication of the positive class, which can lead to over-fitting. Moreover, over-sampling increases the size of the training dataset, causing excessive computational cost [15].
Nevertheless, Chawla [15] introduced the Synthetic Minority Over-sampling Technique (SMOTE), which produces artificially interpolated data when over-sampling the minority class. The algorithm works by finding the k nearest neighbors of each minority sample; then, for a selected neighbor, it randomly picks a point on the line connecting the neighbor and the sample itself. Finally, the data at that point is added as a new minority example. By adding new minority samples to the training data, over-fitting is expected to be reduced [15].
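To make the interpolation step concrete, the following minimal sketch generates synthetic minority samples in the way SMOTE describes. It is not the paper's WEKA implementation; the function name and parameters are ours:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote_sample(X_min, n_new, k=5, seed=0):
        # X_min: minority-class feature matrix; n_new: synthetic samples wanted
        rng = np.random.default_rng(seed)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
        # column 0 of the neighbor list is each point itself, so drop it
        neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
        out = np.empty((n_new, X_min.shape[1]))
        for i in range(n_new):
            j = rng.integers(len(X_min))        # pick a random minority sample
            nb = X_min[rng.choice(neigh[j])]    # pick one of its k neighbors
            # random point on the line segment between sample and neighbor
            out[i] = X_min[j] + rng.random() * (nb - X_min[j])
        return out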
b. Under-Sampling Technique
Under-sampling approaches have been reported to outperform over-sampling approaches in the previous literature. However, under-sampling reduces the majority class and may therefore lose useful information, resulting in less accurate predictions [11].
Sampling is done randomly so that the number of majority samples equals the number of minority samples; that is, the under-sampling approach draws its sample from the majority class [18].
We implemented our proposed Random Under-Sampling and SMOTE in the WEKA tool.
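Outside WEKA, the same two balancing steps can be reproduced with the imbalanced-learn library. This is an illustrative sketch on synthetic stand-in data with an imbalance ratio similar to the NASA MDP datasets, not the paper's exact setup:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler

    # stand-in data: ~12% positives, roughly the defect rate of CM1
    X, y = make_classification(n_samples=500, n_features=20,
                               weights=[0.88], random_state=0)
    X_over, y_over = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
    X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print(Counter(y), Counter(y_over), Counter(y_under))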
3.2. Feature Selection
In software defect datasets, attributes represent software metrics taken from the source code of the software and used in the learning process. However, attributes that are not relevant need to be removed to improve the accuracy of software defect prediction.
Two kinds of algorithms are used for attribute selection: wrappers and filters [17]. A wrapper algorithm uses feedback from a learning algorithm, while a filter algorithm analyzes the training data with methods that do not require a learning algorithm to determine the most relevant attributes [17]. This study uses only filter algorithms for attribute selection: chi-square (CS), information gain (IG), and the Relief algorithm (RLF) [17].
a. Chi Square (CS)
CS evaluates attribute values by computing a statistic related to the class. The CS statistic (also written $\chi^2$) is a nonparametric statistical technique that uses nominal (categorical) data to test frequencies:

$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$   (1)

where $\chi^2$ is the test statistic, which asymptotically approaches the $\chi^2$ distribution, $O_i$ is the observed frequency, $E_i$ is the expected frequency, and $n$ is the number of possible outcomes of each event.
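As an illustration of Eq. (1), the statistic can be computed from a contingency table of a binarized metric against the defect label; the counts below are hypothetical, not taken from the paper:

    import numpy as np
    from scipy.stats import chi2_contingency

    # hypothetical counts: rows = metric above/below a threshold,
    # columns = defective / non-defective modules
    O = np.array([[30,  70],
                  [11, 231]])
    stat, p, dof, E = chi2_contingency(O)  # E holds the expected frequencies Ei
    print(stat, p)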
b. Information Gain (IG)
IG assesses the importance of an attribute by measuring the information gained with respect to the class. In general, IG estimates the change in entropy of the information before and after the attribute is taken into account:

IG(Class, Attribute) = H(Class) − H(Class | Attribute)   (2)

where H denotes entropy. More specifically, suppose $A$ is the set of all attributes and Class is the dependent attribute over all training examples, $value(a, y)$ with $y \in Class$ defines the value of a specific example $y$ for attribute $a \in A$, $V_a$ is the set of values of attribute $a$, namely $V_a = \{value(a, y) \mid a \in A,\ y \in Class\}$, and $|s|$ is the number of elements in set $s$. The gain $G$ for an attribute $a \in A$ is then defined as:

$G(a) = H(Class) - \sum_{v \in V_a} \frac{|\{y \in Class \mid value(a, y) = v\}|}{|Class|}\; H(\{y \in Class \mid value(a, y) = v\})$   (3)
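Eqs. (2) and (3) translate directly into code for a discrete attribute; the helper names below are ours:

    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def info_gain(attr, labels):
        # H(Class) minus the value-weighted entropy H(Class | Attribute)
        gain = entropy(labels)
        for v in np.unique(attr):
            mask = attr == v
            gain -= mask.mean() * entropy(labels[mask])
        return gain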
c. Relief (RLF)
For a given sample R, Relief finds the nearest neighbor of the same class, called the 'nearest hit H', and the nearest neighbor of a different class, called the 'nearest miss M'. It then updates the quality estimate W[A] of every attribute A depending on the values of R, M, and H. The process is repeated m times, where m is determined by the user. The function diff(Attribute, Instance1, Instance2) is defined according to the attribute type; for discrete attributes it is:

$diff(A, I_1, I_2) = \begin{cases} 0 & \text{if } value(A, I_1) = value(A, I_2) \\ 1 & \text{otherwise} \end{cases}$   (4)

The underlying hypothesis is that relevant attributes are able to distinguish between instances of different classes while showing no difference between instances of the same class.
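A minimal Relief sketch for binary classes and numeric attributes scaled to [0, 1], where diff reduces to the absolute difference; the value of m, the helper names, and the Manhattan distance are our assumptions:

    import numpy as np

    def relief_weights(X, y, m=100, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(m):
            i = rng.integers(n)                 # random sample R
            hits = np.where(y == y[i])[0]
            hits = hits[hits != i]
            misses = np.where(y != y[i])[0]
            # nearest hit H and nearest miss M by Manhattan distance
            H = hits[np.argmin(np.abs(X[hits] - X[i]).sum(axis=1))]
            M = misses[np.argmin(np.abs(X[misses] - X[i]).sum(axis=1))]
            # reward attributes that separate classes, penalize ones that don't
            w += (np.abs(X[i] - X[M]) - np.abs(X[i] - X[H])) / m
        return w  # larger weight = more relevant attribute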
3.3. Naive Bayesian (NB) Classifier
Naïve Bayes assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption, called class conditional independence, is made to simplify the computation involved, and in this sense the method is considered naive. In contrast, Bayesian belief networks allow the representation of dependencies among subsets of attributes [19]. Classification follows Bayes' rule:

$P(C_j \mid X) = \frac{P(X \mid C_j)\, P(C_j)}{P(X)}, \qquad P(X \mid C_j) = \prod_{k=1}^{n} P(x_k \mid C_j)$   (5)

The probabilities $P(x_1 \mid C_j), P(x_2 \mid C_j), \ldots, P(x_n \mid C_j)$ can easily be estimated from the training set, where $x_k$ denotes the value of attribute $A_k$ for sample X:
a. If $A_k$ is categorical, then $P(x_k \mid C_j)$ is the number of tuples of class $C_j$ in D having value $x_k$ for attribute $A_k$, divided by $|C_{j,D}|$, the number of tuples of class $C_j$ in D.
b. If $A_k$ is continuous-valued, the values are usually assumed to follow a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$:

$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$   (6)

so that

$P(x_k \mid C_j) = g(x_k, \mu_{C_j}, \sigma_{C_j})$   (7)

We need to calculate $\mu_{C_j}$ and $\sigma_{C_j}$, the mean and standard deviation of the values of attribute $A_k$ for the training samples of class $C_j$.
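Eqs. (5)-(7) translate almost line for line into code; the sketch below scores a single sample x in log space, an implementation choice of ours to avoid numerical underflow:

    import numpy as np

    def nb_predict(X_train, y_train, x):
        best_c, best_score = None, -np.inf
        for c in np.unique(y_train):
            Xc = X_train[y_train == c]
            mu, sd = Xc.mean(axis=0), Xc.std(axis=0) + 1e-9  # avoid sd = 0
            # log P(Cj) + sum_k log g(x_k, mu, sd), i.e. eqs. (5)-(7)
            score = np.log(len(Xc) / len(X_train)) \
                    - 0.5 * np.sum(np.log(2 * np.pi * sd**2)
                                   + (x - mu)**2 / sd**2)
            if score > best_score:
                best_c, best_score = c, score
        return best_c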
3.4. Validation Technique
This study uses 10-fold cross validation, which produces a confusion matrix [20], described in Table 1. In the confusion matrix, TP is a positive instance correctly classified (true positive); FP is a negative instance incorrectly classified as positive (false positive); FN is a positive instance incorrectly classified as negative (false negative); and TN is a negative instance correctly classified (true negative).

TABLE 1.
CONFUSION MATRIX

                       Actual True   Actual False
  Classified True          TP             FP
  Classified False         FN             TN
The values of the confusion matrix produce the ROC (Receiver Operating Characteristic) curve, which is used to evaluate the performance of the classifier algorithm. The Area Under the ROC Curve (AUC) then serves as the evaluation reference, providing a single-value summary of the classifier's performance [20]. AUC is a single-value measure derived from signal detection theory, and its values range from 0 to 1. The ROC curve characterizes the trade-off between the true positive rate (TPR) and the false positive rate (FPR); a classifier that yields a larger area under the curve is better than a classifier with a smaller area under the curve [21].
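In scikit-learn terms, the 10-fold cross-validated AUC used here can be obtained as follows; this is a sketch in which X and y are a synthetic stand-in for one NASA MDP dataset:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=500, n_features=20,
                               weights=[0.88], random_state=0)  # stand-in data
    auc = cross_val_score(GaussianNB(), X, y,
                          cv=StratifiedKFold(n_splits=10, shuffle=True,
                                             random_state=0),
                          scoring="roc_auc")
    print(auc.mean())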
3.5. Evaluation Technique
Statistical evaluation consists of parametric and non-parametric tests. To test the significance of differences in classifier algorithm performance, we use a non-parametric test, the Friedman test [22].
The Friedman test is the non-parametric equivalent of the parametric ANOVA test. It ranks the algorithms for each dataset separately: the best-performing algorithm receives rank 1, the second best rank 2, and so on. The Friedman test, followed by an appropriate post hoc test, is suitable for comparing more than one classifier over multiple datasets [22].
The Friedman test statistic is:

$\chi_F^2 = \frac{12}{N k (k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1)$   (8)

where $\chi_F^2$ is the Friedman chi-square value based on two-way ranks, N is the number of samples (datasets), k is the number of sample groups (classifiers), $R_j$ is the sum of ranks of group j, and 12, 3, and 1 are constants.
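For two classifiers compared over the four datasets used here, Eq. (8) explains the p value of 0.046 reported later in Table 6: if one model ranks first on all N = 4 datasets, the rank sums are 4 and 8, giving a statistic of 4 and p of about 0.046. A quick check:

    from scipy.stats import chi2

    N, k = 4, 2          # 4 datasets, 2 classifiers
    R = [4, 8]           # rank sums when one model wins on every dataset
    stat = 12 / (N * k * (k + 1)) * sum(r * r for r in R) - 3 * N * (k + 1)
    p = chi2.sf(stat, df=k - 1)   # stat = 4.0, p ~ 0.0455
    print(stat, p)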
IV. EXPERIMENTAL RESULTS
4.1. Dataset
This study uses software metric datasets from the NASA (National Aeronautics and Space Administration) MDP repository. These are public datasets used by previous researchers in the field of software defect prediction. The NASA MDP datasets can be obtained via the official Wikispaces website (http://nasa-softwaredefectdatasets.wikispaces.com/). The datasets used in this study are CM1, MW1, PC1, and PC4, described in Table 2.
TABLE 2.
NASA MDP DATASETS

  Dataset   System                            Language   LOC   Code attributes   Modules   Defective modules
  CM1       Instrument of a spacecraft        C          17K   37                342       41
  MW1       Database                          C          8K    37                266       28
  PC1       Flight software for satellites    C          26K   37                759       61
            orbiting the Earth
  PC4       Flight software for satellites    C          30K   37                1399      178
            orbiting the Earth
As shown in Table 2, each dataset consists of a number of software modules, along with the number of defective modules and the number of characteristic code attributes. After preprocessing, each NASA MDP dataset has up to 38 code attributes plus one label attribute (defective?). The attributes consist of Halstead, McCabe, Line of Code (LOC), and miscellaneous attribute types [23].
The attributes obtained from the NASA MDP software metrics are described in Table 3, as follows:
TABLE 3.
SPECIFICATIONS AND ATTRIBUTES OF THE NASA MDP DATASETS

  Category                    Attributes (all present in CM1, MW1, PC1, and PC4)
  LOC counts                  LOC_total, LOC_blank, LOC_code_and_comment,
                              LOC_comment, LOC_executable, Number_of_lines
  Halstead attributes         Content, Difficulty, Effort, Error_est, Length,
                              Level, Prog_time, Volume, Num_operands,
                              Num_operators, Num_unique_operands,
                              Num_unique_operators
  McCabe attributes           Cyclomatic_complexity, Cyclomatic_density,
                              Design_complexity, Essential_complexity
  Miscellaneous attributes    Branch_count, Call_pairs, Condition_count,
                              Decision_count, Decision_density, Design_density,
                              Edge_count, Essential_density, Parameter_count,
                              Maintenance_severity, Modified_condition_count,
                              Multiple_condition_count,
                              Normalized_cyclomatic_compl, Percent_comments,
                              Node_count

  Global_data_complexity and Global_data_density are not available in any of the four datasets.
4.2. Implementation and Experiment Results
This study applies the Naive Bayesian classifier algorithm to four NASA MDP datasets (CM1, MW1, PC1, and PC4). The classifier is combined with each integration of a sampling technique and an attribute selection method: NB with SMOTE and CS, NB with SMOTE and IG, NB with SMOTE and RLF, NB with RUS and CS, NB with RUS and IG, and NB with RUS and RLF.
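A sketch of one such combination (SMOTE followed by a filter-style feature selector and NB) using imbalanced-learn's Pipeline, so that resampling happens only inside the training folds. Since Relief is not part of scikit-learn, the sketch substitutes mutual information (the IG analog) as the selector, and the data are a synthetic stand-in, not the actual NASA MDP files:

    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=500, n_features=37,
                               weights=[0.88], random_state=0)  # stand-in data
    pipe = Pipeline([
        ("balance", SMOTE(random_state=0)),                  # sampling step
        ("select", SelectKBest(mutual_info_classif, k=15)),  # feature selection
        ("nb", GaussianNB()),                                # classifier
    ])
    auc = cross_val_score(pipe, X, y, scoring="roc_auc",
                          cv=StratifiedKFold(n_splits=10, shuffle=True,
                                             random_state=0))
    print(auc.mean())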
TABLE 4.
AUC VALUES

  Classification            CM1     MW1     PC1     PC4
  NB                        0.694   0.727   0.768   0.825
  NB with SMOTE and CS      0.766   0.759   0.734   0.856
  NB with RUS and CS        0.752   0.722   0.790   0.859
  NB with SMOTE and IG      0.751   0.767   0.817   0.856
  NB with RUS and IG        0.753   0.722   0.790   0.859
  NB with SMOTE and RLF     0.761   0.779   0.821   0.860
  NB with RUS and RLF       0.755   0.747   0.793   0.878
Table 4 shows that the NB with SMOTE and RLF model achieves the best AUC values on two datasets (MW1 and PC1). On the CM1 dataset, the best AUC is achieved by the NB with SMOTE and CS model, and on the PC4 dataset by the NB with RUS and RLF model.
4.3. Comparison with Previous Models
To verify that the proposed model improves accuracy through the integration of a sampling technique with a feature selection algorithm, we compare it against the models proposed by Menzies [4], Riquelme [13], and Putri [24].
TABLE 5.
AUC COMPARISON WITH PREVIOUS MODELS

  Model                                          CM1     MW1     PC1     PC4
  Menzies (2011), NB                             0.694   0.727   0.768   0.825
  Riquelme (2008), NB with SMOTE                 0.739   0.751   0.793   0.858
  Putri, Wahono (2015), NB with SMOTE and IG     0.751   0.767   0.817   0.856
  Proposed model, NB with SMOTE and RLF          0.761   0.779   0.821   0.860
The experimental results in Table 5 show that the proposed model produces higher AUC values than the other models on all four datasets. Figure 1 presents a comparison chart of the AUC values of the four models on the four NASA MDP datasets.
Figure 1. Comparison of AUC values between previous models
To test whether the proposed model differs from each of the others, we compare them using a non-parametric statistical test for classifier algorithms, the Friedman test. The AUC values of the NB, NB with SMOTE, NB with SMOTE and IG, and NB with SMOTE and RLF models are compared using the Friedman test in Table 6.
TABLE 6.
P VALUES OF THE AUC COMPARISON (FRIEDMAN TEST)

  Model                                 NB             NB with         NB with SMOTE    NB with SMOTE
                                                       SMOTE           and IG           and RLF
  NB (Menzies, 2011)                    1              0.046 (Sig)     0.046 (Sig)      0.046 (Sig)
  NB with SMOTE (Riquelme, 2008)        0.046 (Sig)    1               0.317 (No Sig)   0.046 (Sig)
  NB with SMOTE and IG (Putri, 2015)    0.046 (Sig)    0.317 (No Sig)  1                0.046 (Sig)
  NB with SMOTE and RLF (Proposed)      0.046 (Sig)    0.046 (Sig)     0.046 (Sig)      1
As shown in Table 6, the proposed NB with SMOTE and RLF model has a p value of 0.046 against every other model, so p < α (0.05). The NB with SMOTE and RLF model therefore differs significantly from the pure NB model, and likewise from NB with SMOTE (p = 0.046) and from NB with SMOTE and IG (p = 0.046).
From these results, the SMOTE and RLF model applied to the Naive Bayesian classifier performs better than the models proposed by previous researchers.
V. CONCLUSION
The experimental results show that integrating the SMOTE sampling method with the RLF attribute selection method in Naive Bayes classification yields better AUC values than the other models. The SMOTE and RLF model is superior on two of the four datasets used, with AUC values of 0.779 (78%) on the MW1 dataset and 0.821 (82%) on the PC1 dataset.
When compared with the models proposed by previous researchers (Naive Bayesian; SMOTE on Naive Bayesian; and SMOTE and IG on Naive Bayesian), SMOTE and RLF on Naive Bayesian performs better on all datasets used in those studies. This is confirmed by the Friedman test comparison, where the p value of 0.046 means p < α (0.05).
Building on these results, the use of sampling techniques and attribute selection algorithms in software defect prediction can be developed further in future research, including:
1. Using wrapper techniques among the attribute selection methods.
2. Combining sampling techniques with ensemble algorithms to improve classifier performance.
3. Using other classifiers, such as Logistic Regression, Neural Networks, and SVM.
ACKNOWLEDGMENT
We would like to express our gratitude to the RSW (Romi Satria Wahono) Intelligent Research Group for warm discussions about this research, and to PPPM STMIK Nusa Mandiri Jakarta and PPPM AMIK BSI Jakarta, which supported us in doing this research.
REFERENCES
[1] A. B. de Carvalho, A. Pozo, and S. R. Vergilio, "A symbolic fault-prediction model based on multiobjective particle swarm optimization," J. Syst. Softw., vol. 83, no. 5, pp. 868-882, May 2010.
[2] C. Catal, "Software fault prediction: A literature review and current trends," Expert Syst. Appl., vol. 38, no. 4, pp. 4626-4636, Apr. 2011.
[3] Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, "A General Software Defect-Proneness Prediction Framework," IEEE Trans. Softw. Eng., vol. 37, no. 3, pp. 356-370, May 2011.
[4] T. Menzies, J. Greenwald, and A. Frank, "Data Mining Static Code Attributes to Learn Defect Predictors," IEEE Trans. Softw. Eng., vol. 33, no. 1, pp. 2-13, Jan. 2007.
[5] P. Domingos, "On the Optimality of the Simple Bayesian Classifier under Zero-One Loss," Mach. Learn., vol. 29, no. 2-3, pp. 103-130, 1997.
[6] B. Turhan and A. Bener, "Analysis of Naive Bayes' assumptions on software fault data: An empirical study," Data Knowl. Eng., vol. 68, no. 2, pp. 278-290, Feb. 2009.
[7] C. Andersson, "A replicated empirical study of a selection method for software reliability growth models," Empir. Softw. Eng., vol. 12, no. 2, pp. 161-182, Oct. 2006.
[8] T. M. Khoshgoftaar and K. Gao, "Feature Selection with Imbalanced Data for Software Defect Prediction," 2009 Int. Conf. Mach. Learn. Appl., pp. 235-240, Dec. 2009.
[9] M. Shepperd, Q. Song, Z. Sun, and C. Mair, "Data Quality: Some Comments on the NASA Software Defect Data Sets," IEEE Trans. Softw. Eng., vol. 39, no. 9, pp. 1-13, 2013.
[10] B. W. Yap, K. A. Rani, H. A. A. Rahman, S. Fong, Z. Khairudin, and N. N. Abdullah, "An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets," Proc. First Int. Conf. Adv. Data Inf. Eng., vol. 285, pp. 13-23, 2014.
[11] Y. Liu, X. Yu, J. X. Huang, and A. An, "Combining integrated sampling with SVM ensembles for learning from imbalanced datasets," Inf. Process. Manag., vol. 47, no. 4, pp. 617-631, Jul. 2011.
[12] K. Gao and T. M. Khoshgoftaar, "Software Defect Prediction for High-Dimensional and Class-Imbalanced Data," Proc. 23rd Int. Conf. Softw. Eng. Knowl. Eng., no. 2, 2011.
[13] J. C. Riquelme, R. Ruiz, and J. Moreno, "Finding Defective Modules from Highly Unbalanced Datasets," Engineering, vol. 2, no. 1, pp. 67-74, 2008.
[14] K. Gao, T. M. Khoshgoftaar, H. Wang, and N. Seliya, "Choosing software metrics for defect prediction: an investigation on feature selection techniques," Softw. Pract. Exp., vol. 41, no. 5, pp. 579-606, 2011.
[15] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," J. Artif. Intell. Res., vol. 16, pp. 321-357, 2002.
[16] S. A. Putri and R. S. Wahono, "Integrasi SMOTE dan Information Gain pada Naive Bayes untuk Prediksi Cacat Software," J. Softw. Eng., vol. 1, no. 2, pp. 86-91, 2015.
[17] K. Gao and T. M. Khoshgoftaar, "Software Defect Prediction for High-Dimensional and Class-Imbalanced Data," Proc. 23rd Int. Conf. Softw. Eng. Knowl. Eng., no. 2, 2011.
[18] N. Japkowicz, "The Class Imbalance Problem: Significance and Strategies," Proc. 2000 Int. Conf. Artif. Intell., Special Track on Inductive Learning, Las Vegas, 2000.
[19] M. Jain and V. Richariya, "An Improved Techniques Based on Naive Bayesian for Attack Detection," Int. J. Emerg. Technol. Adv. Eng., vol. 2, no. 1, pp. 324-331, 2012.
[20] C. X. Ling, "Using AUC and Accuracy in Evaluating Learning Algorithms," pp. 1-31, 2003.
[21] C. X. Ling and H. Zhang, "AUC: a statistically consistent and more discriminating measure than accuracy," Proc. 18th Int. Jt. Conf. Artif. Intell., 2003.
[22] J. Demsar, "Statistical Comparisons of Classifiers over Multiple Data Sets," J. Mach. Learn. Res., vol. 7, pp. 1-30, 2006.
[23] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings," IEEE Trans. Softw. Eng., vol. 34, no. 4, pp. 485-496, 2008.
[24] S. A. Putri and R. S. Wahono, "Integrasi SMOTE dan Information Gain pada Naive Bayes untuk Prediksi Cacat Software," J. Softw. Eng., vol. 1, no. 2, pp. 86-91, 2015.