Combining Integrated Sampling Technique with
Feature Selection for Software Defect Prediction
Sukmawati Anggraeni Putri
STMIK Nusa Mandiri, Information System Program
Jakarta, Indonesia
sukmawati@nusamandiri.ac.id
Frieyadie
AMIK BSI Jakarta, Management Informatic Program
Jakarta, Indonesia
frieyadie@bsi.ac.id
Abstract—Good-quality software is an important supporting factor in almost every line of work in society. However, defective or damaged software components reduce work performance and increase the cost of development and maintenance. Accurate prediction of defect-prone software modules is therefore part of the effort to reduce these rising costs. Previous studies identify two problems that can degrade the prediction performance of classifiers: an imbalanced class distribution and irrelevant attributes in the dataset. To handle both issues, this research integrates a sampling technique with a feature selection method. Based on previous research, two sampling methods are considered: random under-sampling and SMOTE for random over-sampling. The feature selection methods considered are chi-square, information gain, and Relief. The experiments show that integrating the SMOTE technique with the Relief method on a Naïve Bayes classifier yields a better prediction value than any other method, namely 82%.
Keywords—class imbalance, feature selection, software defect prediction
I. INTRODUCTION
As the use of software to support activities and work increases, the quality of the software must certainly be considered. Defective or damaged software components reduce customer satisfaction and increase the cost of development and maintenance [1].
Accurate prediction of defect-prone software modules, as an effort to reduce the rising cost of software development and maintenance, has been studied by previous researchers [2]. This line of research focuses on 1) estimating the number of defects in the software, 2) finding associations among software defects, and 3) classifying software components into defective and non-defective modules [3].
Among previous software defect prediction studies, the Naïve Bayes classifier [4] produced good performance, with an average probability of detection of 71%. Naïve Bayes is a simple classifier [5] whose learning process is faster than that of other machine learning algorithms [4], and it has a good reputation for prediction accuracy [6]. However, this method is not optimal on imbalanced datasets [7].
The predictive performance of this method also degrades when the dataset contains irrelevant attributes [8]. The NASA MDP datasets [9], which have been widely used by previous researchers in software defect prediction, are imbalanced and contain attributes that are not all usable. Three approaches can be used to deal with imbalanced datasets: the data level (sampling techniques), the algorithm level, and ensemble methods [10].
In general, sampling techniques are divided into two types: over-sampling methods such as Random Over-Sampling [11], and under-sampling methods such as Random Under-Sampling [12] and the Resample method [13].
The problem of irrelevant attributes can be addressed with attribute selection methods such as Information Gain, Chi-Square, and Relief [14].
In this study, we propose integrating a sampling technique with a feature selection method to handle class imbalance and irrelevant attributes in Naïve Bayes classification, in order to produce better accuracy in software defect prediction. The study proceeds in several steps. First, a sampling technique is applied to handle the class imbalance. Then, an attribute selection approach is applied before classifying modules for software defect prediction. Finally, validation and evaluation techniques determine whether or not the proposed method performs better than existing methods.
II. RELATED WORK
Software defect prediction is one of the research areas addressed by previous researchers. From these studies, the state of the art of software defect prediction research that addresses class imbalance is known.
Chawla [15] proposed the use of the Synthetic Minority Over-sampling Technique (SMOTE) to handle class imbalance with a Naïve Bayesian classifier,
implemented on eight different datasets from the UCI repository. The results showed that, for all the data, using the SMOTE technique in the balancing process has greater potential to improve the performance of the Naïve Bayesian and C4.5 classifiers used in the classification process.
Riquelme [13] states that datasets in software engineering are very unbalanced. He therefore balanced them using two techniques, SMOTE and Weka's random Resample, with the J48 and Naïve Bayesian classifiers applied to five datasets from the PROMISE repository. The results show that the SMOTE approach is able to increase the average AUC value by 11.6%. Based on these results, balancing techniques can better classify minority classes.
Putri and Wahono [16] state that the NASA MDP datasets have unbalanced classes and irrelevant attributes. Using SMOTE for class balancing and information gain for feature selection, they improved prediction results beyond Riquelme's research, which only rebalanced the dataset classes.
Furthermore, Gao used the Relief feature selection algorithm, originally introduced by Kira. Gao's research shows that the Relief method performs as well as Information Gain [17].
Therefore, this study applies a sampling technique, the Synthetic Minority Over-sampling Technique (SMOTE), to reduce the influence of class imbalance and improve the ability to predict the minority class; the Relief algorithm to select the relevant attributes; and the Naïve Bayes algorithm in the classification process.
III. METHOD
3.1. Sampling Technique
The sampling approach is one way to solve the problem of class imbalance in a dataset. The commonly used sampling approaches are over-sampling and under-sampling techniques [10].
a. Over-Sampling Technique
Over-sampling causes excessive duplication of the positive class, which can lead to over-fitting. Moreover, over-sampling increases the size of the training dataset, causing excessive computational cost [15].
Nevertheless, Chawla [15] introduced the Synthetic Minority Over-sampling Technique (SMOTE), which produces artificially interpolated data when over-sampling the minority class. The algorithm works by finding the k nearest neighbors of each minority sample; then, for a selected neighbor, it randomly picks a point on the line connecting the neighbor and the sample itself. Finally, the data at that point is added as a new minority example. By adding new minority samples to the training data, over-fitting is expected to be reduced [15].
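To make the interpolation step concrete, the following minimal sketch generates synthetic minority samples in the way SMOTE describes. It is not the paper's WEKA implementation; the function name and parameters are ours:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def smote_sample(X_min, n_new, k=5, seed=0):
        # X_min: minority-class feature matrix; n_new: synthetic samples wanted
        rng = np.random.default_rng(seed)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
        # column 0 of the neighbor list is each point itself, so drop it
        neigh = nn.kneighbors(X_min, return_distance=False)[:, 1:]
        out = np.empty((n_new, X_min.shape[1]))
        for i in range(n_new):
            j = rng.integers(len(X_min))        # pick a random minority sample
            nb = X_min[rng.choice(neigh[j])]    # pick one of its k neighbors
            # random point on the line segment between sample and neighbor
            out[i] = X_min[j] + rng.random() * (nb - X_min[j])
        return out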
b. Under-Sampling Technique
Under-sampling approaches have been reported to outperform over-sampling approaches in the previous literature. However, under-sampling reduces the majority class and may therefore lose useful information, resulting in less accurate predictions [11].
Sampling is done randomly so that the number of majority samples equals the number of minority samples; that is, the under-sampling approach draws its sample from the majority class [18].
We implemented our proposed Random Under-Sampling and SMOTE in the WEKA tool.
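Outside WEKA, the same two balancing steps can be reproduced with the imbalanced-learn library. This is an illustrative sketch on synthetic stand-in data with an imbalance ratio similar to the NASA MDP datasets, not the paper's exact setup:

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import RandomUnderSampler

    # stand-in data: ~12% positives, roughly the defect rate of CM1
    X, y = make_classification(n_samples=500, n_features=20,
                               weights=[0.88], random_state=0)
    X_over, y_over = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)
    X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
    print(Counter(y), Counter(y_over), Counter(y_under))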
3.2. Feature Selection
In software defect datasets, attributes represent software metrics taken from the source code of the software and used in the learning process. However, attributes that are not relevant need to be removed to improve the accuracy of software defect prediction.
Two kinds of algorithms are used for attribute selection: wrappers and filters [17]. A wrapper algorithm uses feedback from a learning algorithm, while a filter algorithm analyzes the training data with methods that do not require a learning algorithm to determine the most relevant attributes [17]. This study uses only filter algorithms for attribute selection: chi-square (CS), information gain (IG), and the Relief algorithm (RLF) [17].
a. Chi Square (CS)
CS evaluates attribute values by computing a statistic related to the class. The CS statistic (also written $\chi^2$) is a nonparametric statistical technique that uses nominal (categorical) data to test frequencies:

$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$   (1)

where $\chi^2$ is the test statistic, which asymptotically approaches the $\chi^2$ distribution, $O_i$ is the observed frequency, $E_i$ is the expected frequency, and $n$ is the number of possible outcomes of each event.
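As an illustration of Eq. (1), the statistic can be computed from a contingency table of a binarized metric against the defect label; the counts below are hypothetical, not taken from the paper:

    import numpy as np
    from scipy.stats import chi2_contingency

    # hypothetical counts: rows = metric above/below a threshold,
    # columns = defective / non-defective modules
    O = np.array([[30,  70],
                  [11, 231]])
    stat, p, dof, E = chi2_contingency(O)  # E holds the expected frequencies Ei
    print(stat, p)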
b. Information Gain (IG)
IG assesses the importance of an attribute by measuring the information gained with respect to the class. In general, IG estimates the change in entropy of the information before and after the attribute is taken into account:

IG(Class, Attribute) = H(Class) − H(Class | Attribute)   (2)

where H denotes entropy. More specifically, suppose $A$ is the set of all attributes and Class is the dependent attribute over all training examples, $value(a, y)$ with $y \in Class$ defines the value of a specific example $y$ for attribute $a \in A$, $V_a$ is the set of values of attribute $a$, namely $V_a = \{value(a, y) \mid a \in A,\ y \in Class\}$, and $|s|$ is the number of elements in set $s$. The gain $G$ for an attribute $a \in A$ is then defined as:

$G(a) = H(Class) - \sum_{v \in V_a} \frac{|\{y \in Class \mid value(a, y) = v\}|}{|Class|}\; H(\{y \in Class \mid value(a, y) = v\})$   (3)
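Eqs. (2) and (3) translate directly into code for a discrete attribute; the helper names below are ours:

    import numpy as np

    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def info_gain(attr, labels):
        # H(Class) minus the value-weighted entropy H(Class | Attribute)
        gain = entropy(labels)
        for v in np.unique(attr):
            mask = attr == v
            gain -= mask.mean() * entropy(labels[mask])
        return gain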
c. Relief (RLF)
For a given sample R, Relief finds the nearest neighbor of the same class, called the 'nearest hit H', and the nearest neighbor of a different class, called the 'nearest miss M'. It then updates the quality estimate W[A] of every attribute A depending on the values of R, M, and H. The process is repeated m times, where m is determined by the user. The function diff(Attribute, Instance1, Instance2) is defined according to the attribute type; for discrete attributes it is:

$diff(A, I_1, I_2) = \begin{cases} 0 & \text{if } value(A, I_1) = value(A, I_2) \\ 1 & \text{otherwise} \end{cases}$   (4)

The underlying hypothesis is that relevant attributes are able to distinguish between instances of different classes while showing no difference between instances of the same class.
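A minimal Relief sketch for binary classes and numeric attributes scaled to [0, 1], where diff reduces to the absolute difference; the value of m, the helper names, and the Manhattan distance are our assumptions:

    import numpy as np

    def relief_weights(X, y, m=100, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(m):
            i = rng.integers(n)                 # random sample R
            hits = np.where(y == y[i])[0]
            hits = hits[hits != i]
            misses = np.where(y != y[i])[0]
            # nearest hit H and nearest miss M by Manhattan distance
            H = hits[np.argmin(np.abs(X[hits] - X[i]).sum(axis=1))]
            M = misses[np.argmin(np.abs(X[misses] - X[i]).sum(axis=1))]
            # reward attributes that separate classes, penalize ones that don't
            w += (np.abs(X[i] - X[M]) - np.abs(X[i] - X[H])) / m
        return w  # larger weight = more relevant attribute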
3.3. Naive Bayesian (NB) Classifier
Naïve Bayes assumes that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption, called class conditional independence, is made to simplify the computation involved, and in this sense the method is considered naive. In contrast, Bayesian belief networks allow the representation of dependencies among subsets of attributes [19]. Classification follows Bayes' rule:

$P(C_j \mid X) = \frac{P(X \mid C_j)\, P(C_j)}{P(X)}, \qquad P(X \mid C_j) = \prod_{k=1}^{n} P(x_k \mid C_j)$   (5)

The probabilities $P(x_1 \mid C_j), P(x_2 \mid C_j), \ldots, P(x_n \mid C_j)$ can easily be estimated from the training set, where $x_k$ denotes the value of attribute $A_k$ for sample X:
a. If $A_k$ is categorical, then $P(x_k \mid C_j)$ is the number of tuples of class $C_j$ in D having value $x_k$ for attribute $A_k$, divided by $|C_{j,D}|$, the number of tuples of class $C_j$ in D.
b. If $A_k$ is continuous-valued, the values are usually assumed to follow a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$:

$g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$   (6)

so that

$P(x_k \mid C_j) = g(x_k, \mu_{C_j}, \sigma_{C_j})$   (7)

We need to calculate $\mu_{C_j}$ and $\sigma_{C_j}$, the mean and standard deviation of the values of attribute $A_k$ for the training samples of class $C_j$.
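Eqs. (5)-(7) translate almost line for line into code; the sketch below scores a single sample x in log space, an implementation choice of ours to avoid numerical underflow:

    import numpy as np

    def nb_predict(X_train, y_train, x):
        best_c, best_score = None, -np.inf
        for c in np.unique(y_train):
            Xc = X_train[y_train == c]
            mu, sd = Xc.mean(axis=0), Xc.std(axis=0) + 1e-9  # avoid sd = 0
            # log P(Cj) + sum_k log g(x_k, mu, sd), i.e. eqs. (5)-(7)
            score = np.log(len(Xc) / len(X_train)) \
                    - 0.5 * np.sum(np.log(2 * np.pi * sd**2)
                                   + (x - mu)**2 / sd**2)
            if score > best_score:
                best_c, best_score = c, score
        return best_c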
3.4. Validation Technique
This study uses 10-fold cross validation, which produces a confusion matrix [20], described in Table 1. In the confusion matrix, TP is a positive instance correctly classified (true positive); FP is a negative instance incorrectly classified as positive (false positive); FN is a positive instance incorrectly classified as negative (false negative); and TN is a negative instance correctly classified (true negative).

TABLE 1.
CONFUSION MATRIX

                       Actual True   Actual False
  Classified True          TP             FP
  Classified False         FN             TN
The values of the confusion matrix produce the ROC (Receiver Operating Characteristic) curve, which is used to evaluate the performance of the classifier algorithm. The Area Under the ROC Curve (AUC) then serves as the evaluation reference, providing a single-value summary of the classifier's performance [20]. AUC is a single-value measure derived from signal detection theory, and its values range from 0 to 1. The ROC curve characterizes the trade-off between the true positive rate (TPR) and the false positive rate (FPR); a classifier that yields a larger area under the curve is better than a classifier with a smaller area under the curve [21].
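In scikit-learn terms, the 10-fold cross-validated AUC used here can be obtained as follows; this is a sketch in which X and y are a synthetic stand-in for one NASA MDP dataset:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=500, n_features=20,
                               weights=[0.88], random_state=0)  # stand-in data
    auc = cross_val_score(GaussianNB(), X, y,
                          cv=StratifiedKFold(n_splits=10, shuffle=True,
                                             random_state=0),
                          scoring="roc_auc")
    print(auc.mean())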
3.5. Evaluation Technique
Statistical evaluation consists of parametric and non-parametric tests. To test the significance of differences in classifier algorithm performance, we use a non-parametric test, the Friedman test [22].
The Friedman test is the non-parametric equivalent of the parametric ANOVA test. It ranks the algorithms for each dataset separately: the best-performing algorithm receives rank 1, the second best rank 2, and so on. The Friedman test, followed by an appropriate post hoc test, is suitable for comparing more than one classifier over multiple datasets [22].
The Friedman test statistic is:

$\chi_F^2 = \frac{12}{N k (k+1)} \sum_{j=1}^{k} R_j^2 - 3N(k+1)$   (8)

where $\chi_F^2$ is the Friedman chi-square value based on two-way ranks, N is the number of samples (datasets), k is the number of sample groups (classifiers), $R_j$ is the sum of ranks of group j, and 12, 3, and 1 are constants.
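For two classifiers compared over the four datasets used here, Eq. (8) explains the p value of 0.046 reported later in Table 6: if one model ranks first on all N = 4 datasets, the rank sums are 4 and 8, giving a statistic of 4 and p of about 0.046. A quick check:

    from scipy.stats import chi2

    N, k = 4, 2          # 4 datasets, 2 classifiers
    R = [4, 8]           # rank sums when one model wins on every dataset
    stat = 12 / (N * k * (k + 1)) * sum(r * r for r in R) - 3 * N * (k + 1)
    p = chi2.sf(stat, df=k - 1)   # stat = 4.0, p ~ 0.0455
    print(stat, p)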
IV. EXPERIMENTAL RESULTS
4.1. Dataset
This study uses software metric datasets from the NASA (National Aeronautics and Space Administration) MDP repository. These are public datasets used by previous researchers in the field of software defect prediction. The NASA MDP datasets can be obtained via the official Wikispaces website (http://nasa-softwaredefectdatasets.wikispaces.com/). The datasets used in this study are CM1, MW1, PC1, and PC4, described in Table 2.
TABLE 2.
NASA MDP DATASETS

  Dataset   System                            Language   LOC   Code attributes   Modules   Defective modules
  CM1       Instrument of a spacecraft        C          17K   37                342       41
  MW1       Database                          C          8K    37                266       28
  PC1       Flight software for satellites    C          26K   37                759       61
            orbiting the Earth
  PC4       Flight software for satellites    C          30K   37                1399      178
            orbiting the Earth
As shown in Table 2, each dataset consists of a number of software modules, along with the number of defective modules and the number of characteristic code attributes. After preprocessing, each NASA MDP dataset has up to 38 code attributes plus one label attribute (defective?). The attributes consist of Halstead, McCabe, Line of Code (LOC), and miscellaneous attribute types [23].
The attributes obtained from the NASA MDP software metrics are described in Table 3, as follows:
TABLE 3.
SPECIFICATIONS AND ATTRIBUTES OF THE NASA MDP DATASETS

  Category                    Attributes (all present in CM1, MW1, PC1, and PC4)
  LOC counts                  LOC_total, LOC_blank, LOC_code_and_comment,
                              LOC_comment, LOC_executable, Number_of_lines
  Halstead attributes         Content, Difficulty, Effort, Error_est, Length,
                              Level, Prog_time, Volume, Num_operands,
                              Num_operators, Num_unique_operands,
                              Num_unique_operators
  McCabe attributes           Cyclomatic_complexity, Cyclomatic_density,
                              Design_complexity, Essential_complexity
  Miscellaneous attributes    Branch_count, Call_pairs, Condition_count,
                              Decision_count, Decision_density, Design_density,
                              Edge_count, Essential_density, Parameter_count,
                              Maintenance_severity, Modified_condition_count,
                              Multiple_condition_count,
                              Normalized_cyclomatic_compl, Percent_comments,
                              Node_count

  Global_data_complexity and Global_data_density are not available in any of the four datasets.
4.2. Implementation and Experiment Results
This study applies the Naive Bayesian classifier algorithm to four NASA MDP datasets (CM1, MW1, PC1, and PC4). The classifier is combined with each integration of a sampling technique and an attribute selection method: NB with SMOTE and CS, NB with SMOTE and IG, NB with SMOTE and RLF, NB with RUS and CS, NB with RUS and IG, and NB with RUS and RLF.
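A sketch of one such combination (SMOTE followed by a filter-style feature selector and NB) using imbalanced-learn's Pipeline, so that resampling happens only inside the training folds. Since Relief is not part of scikit-learn, the sketch substitutes mutual information (the IG analog) as the selector, and the data are a synthetic stand-in, not the actual NASA MDP files:

    from imblearn.pipeline import Pipeline
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB

    X, y = make_classification(n_samples=500, n_features=37,
                               weights=[0.88], random_state=0)  # stand-in data
    pipe = Pipeline([
        ("balance", SMOTE(random_state=0)),                  # sampling step
        ("select", SelectKBest(mutual_info_classif, k=15)),  # feature selection
        ("nb", GaussianNB()),                                # classifier
    ])
    auc = cross_val_score(pipe, X, y, scoring="roc_auc",
                          cv=StratifiedKFold(n_splits=10, shuffle=True,
                                             random_state=0))
    print(auc.mean())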
TABLE 4.
AUC VALUES

  Classification            CM1     MW1     PC1     PC4
  NB                        0.694   0.727   0.768   0.825
  NB with SMOTE and CS      0.766   0.759   0.734   0.856
  NB with RUS and CS        0.752   0.722   0.790   0.859
  NB with SMOTE and IG      0.751   0.767   0.817   0.856
  NB with RUS and IG        0.753   0.722   0.790   0.859
  NB with SMOTE and RLF     0.761   0.779   0.821   0.860
  NB with RUS and RLF       0.755   0.747   0.793   0.878
Table 4 shows that the NB with SMOTE and RLF model achieves the best AUC values on two datasets (MW1 and PC1). On the CM1 dataset, the best AUC is achieved by the NB with SMOTE and CS model, and on the PC4 dataset by the NB with RUS and RLF model.
4.3. Comparison with Previous Models
To verify that the proposed model improves accuracy through the integration of a sampling technique with a feature selection algorithm, we compare it against the models proposed by Menzies [4], Riquelme [13], and Putri [24].
TABLE 5.
AUC COMPARISON WITH PREVIOUS MODELS

  Model                                          CM1     MW1     PC1     PC4
  Menzies (2011), NB                             0.694   0.727   0.768   0.825
  Riquelme (2008), NB with SMOTE                 0.739   0.751   0.793   0.858
  Putri, Wahono (2015), NB with SMOTE and IG     0.751   0.767   0.817   0.856
  Proposed model, NB with SMOTE and RLF          0.761   0.779   0.821   0.860
The experimental results in Table 5 show that the proposed model produces higher AUC values than the other models on all four datasets. Figure 1 presents a comparison chart of the AUC values of the four models on the four NASA MDP datasets.
Figure 1. Comparison of AUC values between previous models
To test whether the proposed model differs from each of the others, we compare them using a non-parametric statistical test for classifier algorithms, the Friedman test. The AUC values of the NB, NB with SMOTE, NB with SMOTE and IG, and NB with SMOTE and RLF models are compared using the Friedman test in Table 6.
TABLE 6.
P VALUES OF THE AUC COMPARISON (FRIEDMAN TEST)

  Model                                 NB             NB with         NB with SMOTE    NB with SMOTE
                                                       SMOTE           and IG           and RLF
  NB (Menzies, 2011)                    1              0.046 (Sig)     0.046 (Sig)      0.046 (Sig)
  NB with SMOTE (Riquelme, 2008)        0.046 (Sig)    1               0.317 (No Sig)   0.046 (Sig)
  NB with SMOTE and IG (Putri, 2015)    0.046 (Sig)    0.317 (No Sig)  1                0.046 (Sig)
  NB with SMOTE and RLF (Proposed)      0.046 (Sig)    0.046 (Sig)     0.046 (Sig)      1
As shown in Table 6, the proposed NB with SMOTE and RLF model has a p value of 0.046 against every other model, so p < α (0.05). The NB with SMOTE and RLF model therefore differs significantly from the pure NB model, and likewise from NB with SMOTE (p = 0.046) and from NB with SMOTE and IG (p = 0.046).
From these results, the SMOTE and RLF model applied to the Naive Bayesian classifier performs better than the models proposed by previous researchers.
V. CONCLUSION
The experimental results show that integrating the SMOTE sampling method with the RLF attribute selection method in Naive Bayes classification yields better AUC values than the other models. The SMOTE and RLF model is superior on two of the four datasets used, with AUC values of 0.779 (78%) on the MW1 dataset and 0.821 (82%) on the PC1 dataset.
When compared with the models proposed by previous researchers (Naive Bayesian; SMOTE on Naive Bayesian; and SMOTE and IG on Naive Bayesian), SMOTE and RLF on Naive Bayesian performs better on all datasets used in those studies. This is confirmed by the Friedman test comparison, where the p value of 0.046 means p < α (0.05).
Building on these results, the use of sampling techniques and attribute selection algorithms in software defect prediction can be developed further in future research, including:
1. Using wrapper techniques among the attribute selection methods.
2. Combining sampling techniques with ensemble algorithms to improve classifier performance.
3. Using other classifiers, such as Logistic Regression, Neural Networks, and SVM.
ACKNOWLEDGMENT
We would like to express our gratitude to the RSW (Romi Satria Wahono) Intelligent Research Group for warm discussions about this research, and to PPPM STMIK Nusa Mandiri Jakarta and PPPM AMIK BSI Jakarta, which supported us in doing this research.
REFERENCES
[1] A. B. de Carvalho, A. Pozo, and S. R. Vergilio, "A symbolic fault-prediction model based on multiobjective particle swarm optimization," J. Syst. Softw., vol. 83, no. 5, pp. 868-882, May 2010.
[2] C. Catal, "Software fault prediction: A literature review and current trends," Expert Syst. Appl., vol. 38, no. 4, pp. 4626-4636, Apr. 2011.
[3] Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, "A General Software Defect-Proneness Prediction Framework," IEEE Trans. Softw. Eng., vol. 37, no. 3, pp. 356-370, May 2011.
[4] T. Menzies, J. Greenwald, and A. Frank, "Data Mining Static Code Attributes to Learn Defect Predictors," IEEE Trans. Softw. Eng., vol. 33, no. 1, pp. 2-13, Jan. 2007.
[5] P. Domingos, "On the Optimality of the Simple Bayesian Classifier under Zero-One Loss," Mach. Learn., vol. 29, no. 2-3, pp. 103-130, 1997.
[6] B. Turhan and A. Bener, "Analysis of Naive Bayes' assumptions on software fault data: An empirical study," Data Knowl. Eng., vol. 68, no. 2, pp. 278-290, Feb. 2009.
[7] C. Andersson, "A replicated empirical study of a selection method for software reliability growth models," Empir. Softw. Eng., vol. 12, no. 2, pp. 161-182, Oct. 2006.
[8] T. M. Khoshgoftaar and K. Gao, "Feature Selection with Imbalanced Data for Software Defect Prediction," 2009 Int. Conf. Mach. Learn. Appl., pp. 235-240, Dec. 2009.
[9] M. Shepperd, Q. Song, Z. Sun, and C. Mair, "Data Quality: Some Comments on the NASA Software Defect Data Sets," IEEE Trans. Softw. Eng., vol. 39, no. 9, pp. 1-13, 2013.
[10] B. W. Yap, K. A. Rani, H. A. A. Rahman, S. Fong, Z. Khairudin, and N. N. Abdullah, "An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets," Proc. First Int. Conf. Adv. Data Inf. Eng., vol. 285, pp. 13-23, 2014.
[11] Y. Liu, X. Yu, J. X. Huang, and A. An, "Combining integrated sampling with SVM ensembles for learning from imbalanced datasets," Inf. Process. Manag., vol. 47, no. 4, pp. 617-631, Jul. 2011.
[12] K. Gao and T. M. Khoshgoftaar, "Software Defect Prediction for High-Dimensional and Class-Imbalanced Data," Proc. 23rd Int. Conf. Softw. Eng. Knowl. Eng., no. 2, 2011.
[13] J. C. Riquelme, R. Ruiz, and J. Moreno, "Finding Defective Modules from Highly Unbalanced Datasets," Engineering, vol. 2, no. 1, pp. 67-74, 2008.
[14] K. Gao, T. M. Khoshgoftaar, H. Wang, and N. Seliya, "Choosing software metrics for defect prediction: an investigation on feature selection techniques," Softw. Pract. Exp., vol. 41, no. 5, pp. 579-606, 2011.
[15] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, "SMOTE: Synthetic Minority Over-sampling Technique," J. Artif. Intell. Res., vol. 16, pp. 321-357, 2002.
[16] S. A. Putri and R. S. Wahono, "Integrasi SMOTE dan Information Gain pada Naive Bayes untuk Prediksi Cacat Software," J. Softw. Eng., vol. 1, no. 2, pp. 86-91, 2015.
[17] K. Gao and T. M. Khoshgoftaar, "Software Defect Prediction for High-Dimensional and Class-Imbalanced Data," Proc. 23rd Int. Conf. Softw. Eng. Knowl. Eng., no. 2, 2011.
[18] N. Japkowicz, "The Class Imbalance Problem: Significance and Strategies," Proc. 2000 Int. Conf. Artif. Intell., Special Track on Inductive Learning, Las Vegas, 2000.
[19] M. Jain and V. Richariya, "An Improved Techniques Based on Naive Bayesian for Attack Detection," Int. J. Emerg. Technol. Adv. Eng., vol. 2, no. 1, pp. 324-331, 2012.
[20] C. X. Ling, "Using AUC and Accuracy in Evaluating Learning Algorithms," pp. 1-31, 2003.
[21] C. X. Ling and H. Zhang, "AUC: a statistically consistent and more discriminating measure than accuracy," Proc. 18th Int. Jt. Conf. Artif. Intell., 2003.
[22] J. Demsar, "Statistical Comparisons of Classifiers over Multiple Data Sets," J. Mach. Learn. Res., vol. 7, pp. 1-30, 2006.
[23] S. Lessmann, B. Baesens, C. Mues, and S. Pietsch, "Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings," IEEE Trans. Softw. Eng., vol. 34, no. 4, pp. 485-496, 2008.
[24] S. A. Putri and R. S. Wahono, "Integrasi SMOTE dan Information Gain pada Naive Bayes untuk Prediksi Cacat Software," J. Softw. Eng., vol. 1, no. 2, pp. 86-91, 2015.