CNN-Based Priority Prediction of Bug Reports
R.M.D.S Rathnayake
Department of Computing and
Information Systems,
Sabaragamuwa University of Sri Lanka
Belihuloya, Sri Lanka
dilkisr@gmail.com
B.T.G.S Kumara
Department of Computing and
Information Systems,
Sabaragamuwa University of Sri Lanka
Belihuloya, Sri Lanka
btgsk2000@gmail.com
E.M.U.W.J.B Ekanayake
Department of Computer Science and
Informatics,
Uva Wellassa University of Sri Lanka
Sri Lanka
jayalath@uwu.ac.lk
Abstract— Priority prediction is an essential part of software
maintenance. Thousands of bugs are reported daily in Bug
Tracking Systems (BTS) such as Bugzilla, JIRA, and GitHub.
Priority assignment for the reported bugs is conducted
manually, so the task takes considerable time and is prone to
mistakes. It is therefore imperative to have a way to predict a
bug report's priority automatically. This study proposes a
model based on the Convolutional Neural Network (CNN) to
predict the bug report's priority. First, we preprocess the
textual content of bug reports using natural language
processing (NLP) approaches. Then, we extract features from
the textual content (short description) using the Bag-of-Words
feature extraction method. Finally, we train a CNN-based
classifier to predict priority from these features. We compare
our results with the Support Vector Machine (SVM) and
Temporal Convolutional Network (TCN) to find the better
model for priority prediction. The results show that the
proposed CNN-based approach performs better than the other
approaches: it achieves 71% accuracy, whereas SVM and TCN
achieve 63% and 48%, respectively. The proposed model's
performance was evaluated using a Bugzilla dataset that
included over 25,000 bug reports.
Keywords— Bug reports, Priority prediction, CNN, TCN,
SVM
I. INTRODUCTION
Both open-source and closed-source software projects have
many bugs, and these bugs can decrease the quality and
performance of software systems. In practice, it is impossible
to create defect-free software. Bugs are common in
software, and many projects are delivered with defects
[1]. In order to enhance the next version of the system,
developers let users report bugs in a bug tracking
system (BTS). Developers can use a BTS to manage bug
reports and triage them [2]. By reporting bugs, users support
developers in resolving defects. This is a common procedure
in software maintenance [3], which is one of the essential
phases of software development. The resolution of reported
issues is important, expensive, and critical because of the
explosive growth of complex systems [4].
A bug report includes details that help to reproduce the
bug. A standard bug report includes several predefined
fields: bugID, creation time, summary, description, bug
status, resolution, priority, and severity. The severity of a
defect describes its effect on, and scope within, the system,
whereas a bug's priority defines the order in which
developers should resolve it. In Bugzilla, a bug report's
priority is assigned on a range of P1 to P5, with P1 being the
highest priority and P5 the lowest. The bug report's severity
may be trivial, minor, normal, major, critical, or blocker.
After a bug is reported, a triager examines it and manually
assigns its priority and severity. This process is known as
bug triaging, and it is a manual process that requires
considerable time [4].
In some cases, priority and severity are left blank due to
a lack of experience and technical understanding. Bug
triaging is time-consuming and needs domain knowledge.
As a result, determining priority and severity should be
automated.
Researchers have developed different methods to automate
the severity prediction of bug reports [5–12]. Most methods
use standard machine learning algorithms such as Naïve
Bayes (NB), decision trees, support vector machines (SVM),
and J48. Deep learning-based methods for severity
prediction have also been developed [2], [14], [15]. Most of
them used a CNN-based approach and compared its
performance with that of several machine learning
algorithms.
Various automated techniques to predict the bug report's
priority have been presented [16-18]. However, their
predictions are not sufficiently accurate, and their
performance requires significant improvement. A variety of
neural network-based techniques to predict the bug report's
priority have also been presented [1], [18-21].
However, the performance of these approaches still needs
to be significantly improved. W. Y. Ramay et al. [2] studied
the emotions of reporters to predict severity. According to
Umer et al. [3], reporters express negative emotions more
often for severe bugs than for non-severe bugs. In other
words, reporters are expressive when writing bug reports.
Such sentiments (emotions) could aid in predicting the
priority and severity of bug reports.
Many researchers have proposed approaches to prioritize
bug reports, and most of them report good results. Recent
studies also consider emotion analysis when predicting the
priority of bug reports.
The rest of the paper is structured as follows. Section II
reviews the literature. Section III details our proposed
approach. Section IV describes the evaluation process and
the results. Section V concludes the paper and outlines
future work.
II. LITERATURE REVIEW
Researchers have proposed both traditional and novel
approaches to predict the priority and severity of bug
reports. Most of the traditional approaches are based on
machine learning algorithms. In recent years, researchers
have focused on deep neural network-based approaches.
2021 International Conference on Decision Aid Sciences and Application (DASA)
978-1-6654-1634-4/21/$31.00 ©2021 IEEE
Pushpalatha and Mrunalini [9] proposed a bagging
ensemble approach to predict the bug report's severity. The
proposed method is also comparable to the C4.5 classifier.
Chaturvedi and Singh [10] used several machine learning
(ML) approaches to determine a defect's severity from the
bug report's textual description. They observed that the
performance of the ML approaches becomes steady once the
number of terms reaches 125.
Zhang et al. [11] proposed a technique based on concept
profiles (CPs) that predicts the severity of a given bug report
by analyzing historical bug reports and building CPs from
them. With higher Recall, Precision, and F-measure, the
proposed approach predicts the severity of a given bug
better than Naïve Bayes (NB), NB Multinomial, and
k-nearest neighbor (KNN).
A study by A. Kaur et al. [12] compared multiple ML
algorithms at two different levels for predicting the severity
of bugs in software systems. According to the findings,
predicting severity at the component level yields better
outcomes than predicting severity at the system level.
CNN and Random Forest with Boosting have been
proposed to improve the performance of binary bug severity
classification over the latest techniques [13]. It is a new
deep learning model for classifying severity across multiple
classes.
In [3], [22], and [23], the authors introduced ML-based
models for priority prediction of bug reports. The approach
in [3] combines NLP techniques with machine learning
algorithms. The authors proposed the "DRONE" approach,
which focuses on emotional value to predict the priority of
bug reports.
P. A. Choudhary et al. [21] developed priority prediction
models using neural networks and text classification
techniques. They found that textual, temporal, author-
related, severity, product, and component features impact a
bug's priority.
The researchers in [2] used a deep learning-based
automatic technique. First, they apply natural language
processing (NLP) techniques to preprocess the text of bug
reports. Second, they compute and assign an emotion score
for each bug report. Third, they generate a vector for every
preprocessed bug report. Fourth, they pass the vector and
emotion score of each bug report to a deep learning-based
classifier for severity prediction. The cross-product results
show that the proposed method predicts the bug report's
severity better than the state-of-the-art approaches.
In [1], Bani-Salameh et al. used an RNN-LSTM approach
to predict the bug report's priority and compared the
results with SVM and KNN. They conclude that the LSTM
accurately predicts and assigns the priority of bugs.
Neural network-based approaches are used in [1] and
[18-20] to predict the bug report's priority. P. A. Choudhary
et al. [18] proposed a Multilayer Perceptron (MLP) based
approach to priority prediction and showed that the MLP
performs better than the Naïve Bayes classifier. In [18],
several fields of the bug reports were used for classification,
whereas W. Zhang et al. [19] used the description field to
extract features, and M. Kumari et al. [1] identified four
fields that can affect the priority level of bug reports and
used these fields in their study. RNN-LSTM [1] and
CNN-based approaches [20] were also used in priority
prediction.
In conclusion, researchers have developed several ML
algorithms for predicting the priority and severity of bug
reports. Only a few studies predict the priority of bug
reports using deep learning-based classification, and most of
them combine emotion analysis with the deep learning-based
classifier.
The approach proposed in this work differs from the
existing approaches. Most researchers combine semantic
analysis with a deep neural classifier, but we do not
consider semantic analysis in our study. We applied a deep
neural network-based approach, NLP techniques, and
feature extraction to predict the bug report's priority.
III. PROPOSED APPROACH
A. Overview
An overview of the bug report priority prediction using
deep neural networks is shown in Fig. 1. Our approach
predicts the bug report's priority as follows:
First, we collect a dataset for our research by extracting
bug reports from several open-source projects.
Second, we use NLP techniques to preprocess the dataset.
Third, we extract features from the short description of
each bug report using the Bag-of-Words feature extraction
method.
Finally, we develop a deep learning-based (i.e.,
CNN-based) classifier for priority prediction.
Fig. 1. An overview of the proposed approach
B. Data Acquisition
Bugs in software projects are reported to bug tracking
systems, which allow engineers to keep track of reported
bugs in real-time. As previously stated, we used a dataset
taken from Bugzilla for this work. The datasets consist of
bug reports extracted from four open-source projects
(Eclipse, Netbeans, Mozilla, and Open Office) and contain
more than 20,000 bug reports. Table I illustrates the total
number of bug reports downloaded from each of the open-
source software projects.
There are 11 columns in this dataset: bug id, description,
classification, product, platform, component, operating
system, bug status, resolution, priority, and severity. We
used a single feature from the dataset, the short textual
description of the bug report, because we consider the
description to be the most suitable field for predicting the
priority of a reported bug.
Our study filters the dataset according to the total
number of bugs related to the priority level.
TABLE I. TOTAL NUMBER OF BUG REPORTS
Project        Total Number of Bug Reports
Eclipse        8,478
Mozilla        4,165
Netbeans       4,305
Open Office    4,553
Total          21,501
After the filtering process, we retain the bug reports with
priority P1, P2, P4, or P5 and eliminate those with priority
P3. P3 is the default value for the priority of a bug report,
so developers do not give much consideration when a bug is
assigned P3. The majority of the bug reports are labeled P3;
therefore, removing them removes most of the bug reports
from the dataset. Fig. 2 shows the total number of bug
reports for each selected priority level.
Fig. 2. Total number of bug reports for the selected priority level
Many bug reports had an empty priority field or were
assigned P3. For research purposes, we removed those bug
reports from our dataset.
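The filtering rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the record layout and the field names `bug_id`, `short_desc`, and `priority` are hypothetical and are not taken from the authors' dataset or tooling.

```python
# Hypothetical records; field names are assumptions made for this sketch.
reports = [
    {"bug_id": 1, "short_desc": "Crash on startup", "priority": "P1"},
    {"bug_id": 2, "short_desc": "Typo in dialog", "priority": "P3"},
    {"bug_id": 3, "short_desc": "Slow search", "priority": ""},
    {"bug_id": 4, "short_desc": "Wrong icon color", "priority": "P5"},
]

# P3 is the default priority, so it is excluded along with empty values.
KEPT_PRIORITIES = {"P1", "P2", "P4", "P5"}


def filter_reports(rows):
    """Keep only reports that carry an explicit, non-default priority."""
    return [
        row for row in rows
        if (row.get("priority") or "").strip() in KEPT_PRIORITIES
    ]


filtered = filter_reports(reports)
print([row["bug_id"] for row in filtered])
```

Only the reports labeled P1, P2, P4, or P5 survive; reports with an empty or default (P3) priority are dropped.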
The extracted dataset consists of fields such as bugID,
short description, classification, product, component,
resolution, operating system, severity, bug status, and
priority. We select the short description, which is a brief
text, as the main feature. The priority prediction technique
in this study is therefore based primarily on the priority and
the short description of the bug report.
C. Preprocessing
This part goes over each step of the NLP pipeline used to
preprocess the text (short description) of bug reports. The
majority of bug reports include unnecessary and irrelevant
content. Therefore, we preprocess them to enhance the
performance of our approach. Most researchers use NLP
techniques with the steps below to preprocess bug reports.
To clean the textual information in bug reports, we used the
following preprocessing steps in this research:
Tokenization: This is the process of breaking text into
sentences, words, and clauses. Preprocessing requires this
step because bug reports are frequently written as a
combination of words and meaningless symbols such as
punctuation marks and extra spaces. Tokenization eliminates
these symbols and breaks the text down into tokens.
Stop-word removal: Words such as "the," "in," "am," "our,"
"is," "I," "he," and "that" are frequently used stop-words
that carry little meaning individually. In the context of bug
reports, such terms do not carry important information and
may reduce the classifier's efficiency. These terms are
therefore removed from the tokens produced in the first
preprocessing step.
Stemming: Stemming is the process of reducing words to
their stems. Each word in a bug report's description can
take several different forms. All selected words are stemmed
and converted to their root words. Stemming is highly
important in the fields of text mining and information
retrieval. For instance, the words "give," "gives," "gave,"
and "given" can all be substituted with a single word,
"give." Stemming can be done using a variety of algorithms.
We employ Porter's stemming algorithm [3], a stemming
approach widely used by researchers.
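The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the stop-word list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper used here only as a stand-in for Porter's algorithm, which applies many more rules.

```python
import re

# Tiny illustrative stop-word list (an assumption); real lists are much longer.
STOP_WORDS = {"the", "in", "am", "our", "is", "i", "he", "that", "a", "an"}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs, dropping symbols and digits."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def naive_stem(token):
    """Strip one common suffix; a simplified stand-in for Porter's stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop-words, then stem, in that order."""
    return [naive_stem(token) for token in remove_stop_words(tokenize(text))]


print(preprocess("The editor crashed when opening the large file!"))
```

In practice a library stemmer would replace `naive_stem`; the point of the sketch is the order of the steps and the shape of the data passed between them.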
Finally, all preprocessed terms are referred to as
"features," and these terms are used to develop the
priority prediction model.
The features extracted from the description of a bug
report can be used to characterize the report in a
classification problem. We use bug reports gathered from
Bugzilla to construct a model that predicts the priority of a
new bug. We review the bug reports and look for feature
words in the description of each bug. Creating a prediction
model requires converting the text of a bug report into a
Bag-of-Words feature vector. Each description in the
dataset is thus converted into a vector of features
representing the terms in the dataset.
TABLE II. PERFORMANCE ON PRIORITY LEVEL
Approach  P1                           P2                           P4                           P5
          Precision  Recall  F1-score  Precision  Recall  F1-score  Precision  Recall  F1-score  Precision  Recall  F1-score
CNN       58%        58%     58%       74%        70%     72%       72%        81%     76%       79%        74%     76%
TCN       34%        48%     40%       45%        46%     45%       46%        31%     37%       70%        65%     68%
SVM       51%        63%     57%       67%        59%     63%       64%        70%     67%       77%        59%     67%
D. Feature Extraction
After the initial text is cleaned and normalized, we need
to transform it into features for modeling. Bag-of-Words
(BoW) is a method of extracting features from text for
modeling purposes. It is a text representation that describes
the occurrence of words within the text. In this approach,
we consider each word count as a feature.
To obtain the Bag-of-Words, we first perform the
preprocessing steps and generate the set of all available
words (the vocabulary) before modeling. The text in bug
reports is messy and unstructured, whereas training our
model requires structured, well-defined, fixed-length inputs.
Using the Bag-of-Words method, the unstructured text can
be converted into a fixed-length vector.
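To make the fixed-length conversion concrete, the following sketch builds a vocabulary from already preprocessed token lists and turns each list into a count vector. The function names and the toy token lists are assumptions for illustration; the paper does not specify how its vocabulary was built or ordered.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct token to a column index (sorted for determinism)."""
    vocab = sorted({token for doc in documents for token in doc})
    return {token: index for index, token in enumerate(vocab)}


def bag_of_words(tokens, vocabulary):
    """Return a count vector of length len(vocabulary).

    Tokens that are not in the vocabulary are ignored.
    """
    vector = [0] * len(vocabulary)
    for token, count in Counter(tokens).items():
        if token in vocabulary:
            vector[vocabulary[token]] = count
    return vector


# Toy preprocessed descriptions (hypothetical).
docs = [
    ["editor", "crash", "open", "file"],
    ["crash", "crash", "save", "file"],
]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```

Every description maps to a vector of the same length regardless of how many words it contains, which is the property the classifier input requires.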
E. CNN-Based Classifier
In this research, we developed a one-dimensional (1-D)
CNN model to predict the bug report's priority. After
preprocessing the data, we take the features from the short
description of the bug reports and load them into the 1-D
CNN model. First, we define the model and provide the
required three-dimensional input (samples, time steps, and
features). For this model, we used 32 parallel feature maps
and a kernel size of 5. The kernel size refers to the number
of input time steps taken into account while reading the
input sequence into the feature maps, whereas the number
of feature maps refers to the number of times the input is
processed or interpreted. The model is fit for a fixed number
of epochs (400) with a batch size of 32 samples.
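To illustrate what a kernel size of 5 and 32 feature maps mean, the sketch below applies a 1-D convolution to a single toy input sequence in plain Python. It is not the authors' model: the random kernel weights, the ReLU activation, the absence of padding, the stride of 1, and the single feature per time step are all assumptions made for this example.

```python
import random


def conv1d_valid(sequence, kernels):
    """Slide each kernel over a 1-D sequence (no padding, stride 1), then ReLU.

    Returns one feature map per kernel; each map has
    len(sequence) - kernel_size + 1 entries.
    """
    feature_maps = []
    for kernel in kernels:
        size = len(kernel)
        fmap = []
        for start in range(len(sequence) - size + 1):
            window = sequence[start:start + size]
            activation = sum(w * x for w, x in zip(kernel, window))
            fmap.append(max(0.0, activation))  # ReLU (assumed activation)
        feature_maps.append(fmap)
    return feature_maps


random.seed(0)
NUM_FILTERS, KERNEL_SIZE = 32, 5  # the values stated in the text
bow_vector = [1, 0, 2, 0, 0, 1, 3, 0, 1, 0]  # toy Bag-of-Words input
kernels = [
    [random.uniform(-1, 1) for _ in range(KERNEL_SIZE)]
    for _ in range(NUM_FILTERS)
]
maps = conv1d_valid(bow_vector, kernels)
print(len(maps), len(maps[0]))
```

Each of the 32 kernels reads 5 consecutive time steps at a time, so a 10-step input yields 32 feature maps of 6 values each. In a trained network the kernel weights are learned rather than random.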
IV. EVALUATION
A. Metrics
After the classifier has been adequately trained and the
priority prediction model has been developed, performance
measurements should be applied to a test dataset. In this
research, we determine the priority-specific Accuracy,
Precision, Recall, and F1-score to measure the performance
of CNN, TCN, and SVM on the given bug reports.
In this work, we consider four priority classes (levels):
P1, P2, P4, and P5. We therefore conduct microanalysis and
macroanalysis over the set C of all priority levels. We
calculate macro precision (Precision_mac), macro recall
(Recall_mac), macro F1-score (F1-score_mac), micro
precision (Precision_mic), micro recall (Recall_mic), and
micro F1-score (F1-score_mic).
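The paper does not spell out how the macro and micro scores are computed, so the sketch below uses the common definitions as an assumption: macro scores average the per-class values, while micro scores pool the true positives (TP), false positives (FP), and false negatives (FN) over all classes before computing the metric. The per-class counts are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1-score from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_micro(counts):
    """counts maps a class label to its (TP, FP, FN) tuple."""
    per_class = [prf(*c) for c in counts.values()]
    n = len(per_class)
    # Macro: unweighted mean of the per-class precision, recall, and F1.
    macro = tuple(sum(scores[i] for scores in per_class) / n for i in range(3))
    # Micro: pool the counts over all classes, then compute the metrics once.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = prf(tp, fp, fn)
    return macro, micro


# Hypothetical (TP, FP, FN) counts per priority level.
counts = {"P1": (8, 2, 3), "P2": (6, 4, 2), "P4": (9, 1, 2), "P5": (5, 3, 3)}
macro, micro = macro_micro(counts)
print("macro:", [round(v, 3) for v in macro])
print("micro:", [round(v, 3) for v in micro])
```

Macro averaging weights every priority level equally, whereas micro averaging weights every bug report equally, so the two can diverge when the classes are imbalanced.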
B. Results
This research compares the CNN, TCN, and SVM
approaches for bug report priority prediction. We use
microanalysis to determine how much CNN improves
performance across the classes. We also make a
priority-level comparison to evaluate CNN's performance at
each priority level.
1) Comparison on Priority Level:
Table II shows the evaluation results at every priority
level for the approaches applied. The first column of Table II
lists the three approaches, whereas columns 2-4, 5-7, 8-10,
and 11-13 list the precision, recall, and F1-score for priority
levels P1, P2, P4, and P5, respectively. The table content
represents the performance results of CNN, TCN, and SVM.
In terms of F1-score, CNN performs better than TCN
and SVM at every priority level.
The performance improvement of CNN over SVM in
F1-score ranges from 1.72% = (58%-57%)/58% to
12.50% = (72%-63%)/72%.
2) Comparison of Macroanalysis and Microanalysis:
Evaluation results of Macroanalysis and Microanalysis
are shown in Table III. The three approaches used in this
study are presented in the first column. The results of
Macroanalysis and Microanalysis are presented in columns
2-4 and 5-7, respectively. The following observations can be
derived from Table III:
In both macro and microanalysis, CNN
outperforms TCN and SVM; however, SVM
outperforms TCN in both macro and microanalysis.
For both macro and microanalysis, the performance
improvement of CNN over SVM in F1-score is
12.70% = (71%-63%)/63%.
Moreover, the performance improvement of CNN
over TCN in F1-score is 51.06% = (71%-47%)/47%
for macroanalysis and 47.92% = (71%-48%)/48% for
microanalysis.
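The relative improvements quoted in these observations follow from a single formula, the gain divided by the baseline score. The short sketch below reproduces that arithmetic from the F1-scores in Table III; the function name is introduced here only for illustration.

```python
def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline`, relative to the baseline."""
    return (new - baseline) / baseline * 100


# F1-scores (in percent) taken from Table III.
print(f"CNN over SVM (macro and micro): {relative_improvement(71, 63):.2f}%")
print(f"CNN over TCN (macro): {relative_improvement(71, 47):.2f}%")
print(f"CNN over TCN (micro): {relative_improvement(71, 48):.2f}%")
```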
TABLE III. EVALUATION RESULTS OF MACRO & MICROANALYSIS
Approach  Macro                        Micro
          Precision  Recall  F1-score  Precision  Recall  F1-score
CNN       71%        71%     71%       71%        71%     71%
TCN       49%        47%     47%       49%        48%     48%
SVM       65%        63%     63%       65%        63%     63%
Table IV shows the accuracy of the three classifiers. The
first column lists the three approaches used in this work,
and the second column presents the accuracy of each
approach. We can observe that CNN outperforms TCN and
SVM by achieving 71% accuracy.
The final results show that our proposed deep learning
(CNN) approach gives the highest performance across all
performance metrics (Accuracy = 71%, Precision = 71%,
Recall = 71%, F1-score = 71%).
TABLE IV. ACCURACY OF DIFFERENT CLASSIFIERS
Approach  Accuracy
CNN       71%
TCN       48%
SVM       63%
V. CONCLUSION AND FUTURE WORKS
Assigning a bug report's priority is a manual process.
Therefore, there is a high probability that the assigned
priority is incorrect. As a result, automating the process of
prioritizing bug reports is critical. For this study, the bug
reports from four open-source projects were gathered using
a Java program. The extracted dataset comprises more than
20,000 bug reports, and filtering reduces it further because
bug reports with the default priority value are removed.
First, the short descriptions of the bug reports are
preprocessed using the NLP steps discussed above. Second,
features are extracted using the Bag-of-Words feature
extraction method. Third, we developed models for this
study using CNN, TCN, and SVM. The performance of the
developed models is measured at the class level using
Accuracy, Precision, Recall, and F1-score. The CNN-based
approach outperformed the other approaches by obtaining
the highest accuracy of 71%. Therefore, the proposed
approach performs well in bug report priority prediction.
We hope to predict the severity and priority of bug
reports in the future using different feature extraction
methods and deep learning-based approaches. We hope to
use a different dataset than the one we used for this work.
REFERENCES
[1] H. Bani-Salameh, M. Sallam, and B. Al shboul, "A deep-learning-
based bug priority prediction using RNN-LSTM neural," e-Inform.
Softw. Eng. J., vol. 15, no. 1, 2021.
[2] W. Y. Ramay, Q. Umer, X. C. Yin, C. Zhu, and I. Illahi, "Deep neural
network-based severity prediction of bug reports," IEEE Access, vol. 7,
pp. 46846–46857, 2019.
[3] Q. Umer, H. Liu, and Y. Sultan, "Emotion based automated priority
prediction for bug reports," IEEE Access, vol. 6, pp. 35743–35752,
2018.
[4] Q. Umer, H. Liu, and I. Illahi, "CNN-based automatic prioritization of
bug reports," IEEE trans. reliab., vol. 69, no. 4, pp. 1341–1354, 2020.
[5] M. Sharma, M. Kumari, R. K. Singh, and V. B. Singh, "Multiattribute
based machine learning models for severity prediction in cross project
context," in Computational Science and Its Applications—ICCSA.
Cham, Switzerland: Springer, 2014, pp. 227–241.
[6] T. Menzies and A. Marcus, "Automated severity assessment of software
defect reports," in Proc. IEEE Int. Conf. Softw. Maintenance, Sep./Oct.
2008, pp. 346–355.
[7] A. Lamkanfi, S. Demeyer, E. Giger, and B. Goethals, "Predicting the
severity of a reported bug," in Proc. 7th IEEE Working Conf. Mining
Softw. Repositories (MSR), May 2010, pp. 1–10.
[8] Y. Tian, D. Lo, and C. Sun, "Information retrieval based nearest
neighbor classification for fine-grained bug severity prediction," in
Proc. 19th Work. Conf. Reverse Eng., Washington, DC, USA, Oct.
2012, pp. 215–224. doi: 10.1109/WCRE.2012.31.
[9] M. N. Pushpalatha and M. Mrunalini, "Predicting the severity of bug
reports using classification algorithms," in the 2016 International
Conference on Circuits, Controls, Communications and Computing
(I4C), 2016, pp. 1–4.
[10] K. K. Chaturvedi and V. B. Singh, "Determining Bug severity using
machine learning techniques," in 2012 CSI Sixth International
Conference on Software Engineering (CONSEG), 2012, pp. 1–6.
[11] T. Zhang, G. Yang, B. Lee, and A. T. S. Chan, "Predicting the severity
of bug report by mining bug repository with concept profile," in
Proceedings of the 30th Annual ACM Symposium on Applied
Computing - SAC '15, 2015.
[12] A. Kaur and S. G. Jindal, "Text analytics-based severity prediction of
software bugs for apache projects," Int. j. Syst. assur. eng. manag.,
vol. 10, no. 4, pp. 765–782, 2019.
[13] A. Kukkar, R. Mohana, A. Nayyar, J. Kim, B.-G. Kang, and N.
Chilamkurti, "A novel deep-learning-based Bug Severity
classification technique using convolutional neural networks and
random forest with boosting," Sensors (Basel), vol. 19, no. 13, p.
2964, 2019.
[14] A. Chauhan and R. Kumar, "Bug severity classification using
semantic feature with convolution neural network," in Advances in
Intelligent Systems and Computing, Singapore: Springer Singapore,
2020, pp. 327–335.
[15] Y. Tian, D. Lo, and C. Sun, "Drone: Predicting priority of reported
bugs by multi-factor analysis," in Proc. IEEE Int. Conf. Softw.
Maintenance (ICSM), Sep. 2013, pp. 200–209, doi: 10.1109/ICSM.2013.31.
[16] J. Kanwal and O. Maqbool, "Bug prioritization to facilitate bug report
triage," J. Comput. Sci. Technol., vol. 27, no. 2, pp. 397–412, Mar.
2012, doi: 10.1007/s11390-012-1230-3.
[17] L. Yu, W.-T. Tsai, W. Zhao, and F. Wu, "Predicting defect priority
based on neural networks," in Advanced Data Mining and
Applications, L. Cao, J. Zhong, and Y. Feng, Eds. Berlin, Germany:
Springer, 2010, pp. 356–367.
[18] P. A. Choudhary, "Neural Network based bug priority prediction
model using text classification techniques," Int. j. adv. res. comput.
sci., vol. 8, no. 5, pp. 1315–1319, 2017.
[19] W. Zhang and C. Challis, "Automatic bug priority prediction using
DNN based regression," in Advances in Natural Computation, Fuzzy
Systems and Knowledge Discovery, Cham: Springer International
Publishing, 2020, pp. 333–340.
[20] M. Kumari and V. B. Singh, "An improved classifier based on
entropy and deep learning for bug priority prediction," in Advances in
Intelligent Systems and Computing, Cham: Springer International
Publishing, 2020, pp. 571–580.
[21] P. A. Choudhary, "Neural Network-based bug priority prediction
model using text classification techniques," Int. j. adv. res. Comput.
Sci., vol. 8, no. 5, pp. 1315–1319, 2017.
[22] Pushpalatha, Mrunalini, and S. R. Bista, "Predicting the priority of
bug reports using classification algorithms," Indian j. comput. sci.
eng., vol. 11, no. 6, pp. 811–818, 2020.
[23] R. Malhotra, A. Dabas, H. A, and M. Pant, "A study on machine
learning applied to software bug priority prediction," in 2021 11th
International Conference on Cloud Computing, Data Science &
Engineering (Confluence), 2021.