
Software Risk Prediction at Requirement and Design Phase : An Ensemble Machine Learning Approach

Authors:
Yibeltal Assefa
Software Engineering
Kombolcha Institute of Technology
Wollo University
yibebdu21@gmail.com
Esubalew Alemneh
ICT4D Research Center
Bahir Dar Institute of Technology
Bahir Dar University
esubalew.alemneh@bdu.edu.et
Shegaw Nibret
ICT4D Research Center
Bahir Dar Institute of Technology
Bahir Dar University
ze.nibret12@gmail.com
Abebaw Worku
Software Engineering
College of Natural and Computational Science
Mekdela University
abebawworku10@gmail.com
Abstract—Software development is a highly structured process
that involves the creation and maintenance of a particular system,
ranging from simple applications to complex enterprise software.
Despite following a well-defined process, unforeseen events can
occur at any stage of the SDLC that may impact the software
development process, leading to losses or failures in software
development. Software projects inherently involve risks, and no
software development project is immune to these risks. Identify-
ing and predicting such risks accurately is a challenge in software
project development. To address this challenge, this study aims
to develop a software risk prediction model using homogeneous
ensemble machine learning algorithms. These algorithms were
selected due to their proven effectiveness in handling complex
datasets and their ability to achieve high prediction accuracy.
We used an experimental research methodology to develop a software risk prediction model. The methodology involved collecting requirement- and design-phase risk datasets from publicly available repositories such as Zenodo and the Harvard education dataset. These datasets were then used to train and validate the machine learning algorithms. Our study achieved prediction accuracies of 98.67%, 97.3%, 96.0%, and 96.0% for Gradient Boosting, Random Forest, AdaBoost, and Bagging, respectively, each using decision trees as the homogeneous base learner. Using these four homogeneous ensemble machine learning algorithms, we developed software risk predictive models. Ultimately, Gradient Boosting was selected to construct our risk predictive model due to its superior performance and ability to handle complex data.
By employing this model, software development organizations
can improve their ability to identify and mitigate risks, thereby
improving the quality and reliability of their software products.
Index Terms—ensemble machine learning algorithms, require-
ments phase, design phase, software risk prediction.
I. INTRODUCTION
The process of software development is a systematic method
to develop software. It involves the development and main-
tenance of software [17]. There is always the possibility of
unexpected events occurring during the Software Development
Life Cycle (SDLC) that may result in loss or failure in software
projects [1]. Incompleteness and omissions of requirements
caused many software hazards as software risks could be
generated due to incomplete and unclear requirements [2]. The
risks stem from various risk factors that arise in an assortment of activities across the SDLC [3].
A risk is an uncertain event occurring in the SDLC process that can lead to potential losses of software in most organizations. It increases most software
projects’ failure rate [5]. Therefore, risk assessment early
in the life cycle of software development is very important
[4]. Software risks arise as a consequence of various factors
such as inadequate resources, limited skills, and insufficient
information. Risk management focuses on the identification of
risks and appropriate treatment of them. Projects have individ-
ual risks or overall risks. Risk management methods specify
search procedures for information gathering, organization and
interpretation to simplify complex decisions under conditions
of bounded rationality [6].
Therefore, developing automation systems using machine
learning is highly essential to support risk management man-
agers. Machine learning studies how to automatically learn to
make accurate predictions, classification and clustering based
on past observations. There are different types of Machine
Learning (ML) techniques [7]. There is no software devel-
opment project which is free from risks. There are some
software products that fail due to the negligence of addressing
risks at the early stage of development. Previous research has
explored the domain of software product risk prediction but
the accuracy of previous findings was low, leaving potential for improvement. One approach to enhance the accuracy of risk prediction results is to use artificial intelligence algorithms. Ensemble learning is one such approach: it combines the strengths of multiple machine learning algorithms to improve overall predictive performance. Therefore, there is still room for improvement in the risk prediction of software products using advanced ensemble machine learning techniques.
979-8-3503-2848-6/23/$31.00 ©2023 IEEE
Khalid [9] conducted a study on predicting risk through
artificial intelligence using machine learning algorithms, fo-
cusing on nonfinancial firms in Pakistan. The study utilized
various techniques, including random forest, decision tree,
naive Bayes, and KNN, to assess and predict risks based on
a dataset from the finance sector. However, this research did
not consider the risks associated with software development.
The study aimed to explore the possibility of predicting risks
by applying machine learning algorithms, but its scope was
limited to the finance sector only. The study could not provide
insights into software development risk prediction, which is
essential in the current software-driven business environment.
Therefore, our research aims to address this gap by examining
the performance of various machine learning algorithms to
predict software development risks.
The scholarly article authored by Otoom [3] focuses on
developing an ensemble model for predicting risks in software
requirements. The article proposes an ensemble classifier that
combines AdaBoostM1 and J48 algorithms, called the ABMJ
model. Although this model is shown to be effective in predict-
ing software risks associated with requirements, it overlooks
the potential risks that may arise during the design stage of
software development. Hence, there exists a gap in the research
on software risk prediction that aims to address the project risk
that can emerge from the design stage. This research aims
to fill this gap by proposing a more comprehensive approach
that considers the risks associated with both requirement and
design stages of software development.
Filippetto [10] created a model for predicting risks in
software project management based on similarity analysis of
context histories. However, the study’s case study approach
had limitations, as it only examined a small number of cases
and did not employ automated prediction techniques. The
model relied on manual analysis of historical cases, with the
evaluation mechanism heavily reliant on expert evaluation.
This approach was limited by its dependence on individual
expertise and the lack of generalizability of the results to other
cases. Also, the study did not explain the evaluation process
used, making it difficult to assess the validity and reliability
of the results. Therefore, a more systematic and automated
approach to risk prediction in software project management is
necessary.
Another study, referenced as [11], focuses on predicting
the risk percentage in software projects by training machine
learning classifiers. The study utilizes various machine learn-
ing algorithms, including SVM, k-NN, RF, and ANN-MLP,
and trains them using the collected data. However, the dataset
used in the study is obtained through a questionnaire, which
may not be entirely reliable since it depends on the user’s
expertise. Moreover, the accuracy of the results obtained using
this dataset is low and requires improvement
In addition to this, previous researches [5], [8], [9], [10] are limited in specifying the stages of risk occurrence during dataset collection. These studies primarily focus on risks that arise during the requirement stage, neglecting the risks associated with the design stage. Therefore, it is crucial to expand the scope of the dataset to include risks occurring at both the requirement and design stages.
II. RELATED WORK
This section aims to provide an extensive review of relevant literature on software risk prediction. One specific objective of this review is to identify gaps in the context of predicting software project risk at the early stages of development. To achieve this goal, we have reviewed previous research and literature on software risk prediction. We have explored a wide range of sources, including academic journals, conference proceedings, and books, to gather the most relevant and up-to-date information on the topic.
Several research studies have been conducted in the field of risk prediction for software products. However, certain gaps exist in the literature that require attention. The following scholarly articles are the most relevant and recent publications that have been reviewed in detail.
Akumba's study [5] aimed to predict software risk during the requirement stage. However, their research did not consider other critical software development stages such as the design stage. Consequently, the risk assessment they provided was incomplete, leaving potential risks that could harm the software project unaddressed. In their study, Akumba employed the Naïve Bayes machine learning algorithm but did not explore the use of other predictive algorithms that may yield better results. Furthermore, the authors did not explain the rationale behind their choice of the Naïve Bayes algorithm. This approach has a weakness when a class label and a specific attribute value never occur together, leading to a frequency-based probability estimate of zero. When all the probabilities are multiplied, this problem could significantly impact the predictive model. Our research aims to address these gaps by experimenting with other predictive algorithms to improve the accuracy of the predictive model. We will use datasets from both the requirement and design stages of software development to identify risks at an early stage.
The research paper by [8] delved into risk prediction applied to global software development, utilizing various machine learning methods including logistic regression, decision tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes. The findings showed an accuracy of 89%, which, although impressive, could still be further improved. It is noteworthy that the paper focused on global software development, a topic that poses unique challenges and risks requiring careful consideration. Despite applying various machine learning algorithms, the paper did not explore ensemble methods such as bagging, boosting, or stacking that could enhance the accuracy of the resulting risk prediction models.
Dillibabu's paper [12] presents an innovative approach for software risk prediction, which combines fuzzy-based TOPSIS, ANFIS MCDM, and fuzzy decision-making trial and evaluation laboratory methods. The study utilizes the NASA 93 dataset, which includes values from 93 software projects. The results of the research show that the proposed approach has high accuracy in predicting software risk factors. However, the paper notes that there is still room for accuracy improvement. Therefore, the use of machine learning approaches could potentially enhance accuracy and improve the decision-making process for predicting software risk factors. Overall, this integrated approach provides a promising framework for software risk prediction and can be further developed in future studies.
Iftikhar [13] conducted a study focused on Risk Prediction
in Global Software Development using Artificial Neural Net-
work (ANN). The research employed various ANN techniques,
namely Levenberg–Marquardt, Bayesian Regularization, and
Scaled Conjugate Gradient, to predict risks. The accuracy of
the model was measured using Mean Squared Error (MSE),
with a resulting value of 2.157 MSE. However, this value
indicates a low level of accuracy, highlighting the need for
improvement. To enhance the model’s performance, it is
recommended to expand the sample data set by including data
from multiple companies. Additionally, employing random
data collection methods would aid in generalizing the model’s
predictions.
In general, while conducting a review of the related liter-
ature, it is apparent that there exists a discernible gap that
must be addressed. Specifically, this pertains to the data set
and the techniques that are utilized to improve the accuracy
of predictions, thereby mitigating the risks associated with
various projects and software products. The need to develop
a more precise approach to prediction is of paramount impor-
tance, and it is one that cannot be overstated. Failure to do so
may result in a wide range of potential risks, which may have
adverse implications for the project’s success and the resulting
software products. As such, it is imperative that researchers
and practitioners alike focus their efforts on identifying and
implementing more advanced techniques that can help to fill
this critical gap in the existing literature.
III. RESEARCH METHODOLOGY
A. Research Design
The study utilizes an experimental research design, em-
ploying a scientific approach to investigate the relationship
between variables. Two sets of variables are utilized, with
the first serving as a constant to measure the variances in
the second variable [13]. In our particular case, the dependent
variable is the software risk level. The reason behind choosing
the experimental research design is to explore potential cause-
and-effect relationships and gain a clear understanding of how
certain variables impact others. This approach allows us to
assess the effects of various factors on the software risk level
and determine causal connections between them.
B. Proposed architecture
The architecture of a system provides a high-level view
of its key components, their relationships, and the way they
interact with one another [15]. The proposed architecture of our work is shown in Figure 1 below.
Fig. 1. Proposed architecture of our predictive model.
1) Software risk dataset: We collected our dataset from software repositories such as Zenodo and the Harvard education dataset. From Zenodo we obtained requirement risk datasets containing risk data belonging to the requirement stage (functional, non-functional, and domain requirements) and the design stage (database design and user interface design). This dataset has 400 instances and 14 attributes; we also added 100 instances collected from the Harvard education dataset, which helps train our model further.
2) Data preprocessing: The data preprocessing phase in research projects is often overlooked and underestimated by researchers [16]. It is a critical step that involves examining the quality of the data before developing a model. We performed operations such as data cleaning, data integration, data transformation, handling data imbalance, and feature engineering to preprocess our dataset.
3) Data cleaning: One common challenge in data cleaning is handling data that contains missing values, as these missing values can impact the quality and accuracy of the data analysis. To address this, various methods can be employed during the data cleaning process. One approach is to avoid including irrelevant data or attributes in the analysis, as these can introduce noise and inconsistencies into the results. In our case, since our dataset is small and has only a few missing
values, we manually fill in those missing values. It is worth
noting that our dataset is now free from any missing values.
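As a sketch of this manual gap-filling step in code form: the snippet below fills the few numeric gaps with a column median and categorical gaps with the most frequent value. The column names are invented for illustration and are not the study's actual schema.

```python
import pandas as pd

# Toy frame standing in for the risk dataset; column names are assumptions.
df = pd.DataFrame({
    "risk_probability": [0.7, None, 0.4, 0.9],
    "risk_category": ["requirement", "design", None, "design"],
})

# Numeric gaps get the column median, categorical gaps the most frequent value.
df["risk_probability"] = df["risk_probability"].fillna(df["risk_probability"].median())
df["risk_category"] = df["risk_category"].fillna(df["risk_category"].mode()[0])
```

With only a handful of gaps, as in our dataset, such imputation leaves the overall distribution essentially unchanged.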
4) Outliers Handling: We detected outliers using a boxplot and addressed them using Interquartile Range (IQR) scores. The general guideline is that any data point falling outside the range (Q1 − 1.5 × IQR) to (Q3 + 1.5 × IQR) is considered an outlier. Outliers appear as values that lie far from the main box, as depicted in Figure 2 below.
Fig. 2. The dataset with outliers, represented in a box plot.
To eliminate the outliers, we employed the IQR method. This technique involves computing the first quartile (Q1), the third quartile (Q3), and the IQR for each numeric column using the quantile() function. The cleaned dataset is shown in Figure 3 below.
Fig. 3. The dataset without outliers, represented in a box plot.
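The IQR filtering described above can be sketched as follows; this is a minimal pandas version assuming a numeric DataFrame, not the authors' exact code.

```python
import pandas as pd

def remove_outliers_iqr(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows whose value in any numeric column falls outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for that column."""
    cleaned = df.copy()
    for col in cleaned.select_dtypes(include="number").columns:
        q1 = cleaned[col].quantile(0.25)   # first quartile
        q3 = cleaned[col].quantile(0.75)   # third quartile
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        cleaned = cleaned[cleaned[col].between(lower, upper)]
    return cleaned
```

For example, a column of mostly small values with one extreme entry loses only the row containing that extreme value.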
5) Data transformation: In this study, we employ data transformation techniques to mathematically alter the values of variables. As part of data transformation, we apply data normalization and standardization to tackle modeling challenges and enhance the effectiveness of our analyses. These techniques involve modifying the
scale and distribution of data to ensure compatibility and
comparability across different attributes.
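As a minimal sketch (assuming NumPy arrays with one attribute per column), the two transformations mentioned above can be written as:

```python
import numpy as np

def min_max_normalize(x: np.ndarray) -> np.ndarray:
    # Rescale each column to the [0, 1] range.
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

def standardize(x: np.ndarray) -> np.ndarray:
    # Shift each column to zero mean and unit variance (z-score).
    return (x - x.mean(axis=0)) / x.std(axis=0)
```

Normalization bounds every attribute to a common range, while standardization centers each attribute, making attributes with very different scales comparable.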
IV. EXPERIMENTS
This section entails various essential tasks, including
data description, development of predictive models, evaluation
of predictive models, and validating the research question
responses.
A. Data Description
This research study was conducted on a dataset comprising
400 instances to make predictions about software risk. After
preprocessing the data, 14 features were considered for this
study.
The software risk levels were categorized into five distinct
classes. Out of the total 400 instances, there were 175 instances
classified as risk level 2, 95 instances classified as risk level 3,
60 instances classified as risk level 4, 50 instances classified
as risk level 1, and 15 instances classified as risk level 5. It is
important to note that these classifications were made prior to
addressing the issue of imbalanced data. The comprehensive
results of the study can be observed in Figure 4 below.
Fig. 4. The distribution of risk in our dataset.
From this figure, we can see that risks belonging to the second severity level are the most numerous in our dataset, followed by the third, fourth, first, and fifth severity levels.
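Since the class counts above are imbalanced (175/95/60/50/15), one simple way to rebalance before training is random oversampling. The sketch below uses scikit-learn's `resample` on a hypothetical `risk_level` column; it illustrates the general technique, not the authors' exact rebalancing procedure.

```python
import pandas as pd
from sklearn.utils import resample

def oversample(df: pd.DataFrame, label: str = "risk_level") -> pd.DataFrame:
    """Resample every class (with replacement) up to the majority class count."""
    majority = df[label].value_counts().max()
    parts = [
        resample(group, replace=True, n_samples=majority, random_state=42)
        for _, group in df.groupby(label)
    ]
    return pd.concat(parts, ignore_index=True)
```

After this step every risk level contributes the same number of instances, so the classifiers are not biased toward the dominant level-2 class.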
B. Predictive Model Development for Software Risk
Upon completing the necessary data pre-processing steps,
we proceeded to employ various suitable homogeneous en-
semble machine learning techniques to construct a predictive
model for software project risk. These techniques included
Adaboost with a decision tree, Gradient Boosting with a
decision tree, Bagging with a decision tree, and Bagging with
random forest.
In our study, we employed the following homogeneous
ensemble machine learning methods:
1) Adaboost: This method combines multiple weak learners, specifically decision trees, to iteratively improve the model's predictive accuracy by focusing on misclassified instances.
2) Gradient Boosting: Gradient Boosting also utilizes de-
cision trees as weak learners. However, it differs in the way
it assigns weights to the misclassified instances, aiming to
minimize the overall loss function.
3) Bagging with Decision Trees: Bagging, short for Bootstrap Aggregation, involves creating multiple subsets of the
training data through bootstrap sampling. Each subset is used
to train a separate decision tree, and the final prediction is
obtained through aggregation.
4) Random Forest with Decision Tree: Random Forest
is an extension of Bagging that further enhances performance
by introducing randomness in the feature selection process for
each decision tree. This randomness helps to reduce overfitting
and improve generalization.
C. Dataset splitting
To evaluate the performance of our homogeneous ensemble
machine learning algorithms on unseen data, we utilized the
train-test split technique. In this research work we used 25 percent of the total dataset (a 0.25 test split), which yielded 100 instances for testing our model. By averaging the performance metrics across multiple iterations with different random splits, we obtain a more reliable assessment of our models' performance.
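The 0.25 split described above corresponds to the following call; the synthetic data merely stands in for the 400-instance, 14-feature risk dataset and is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 400 instances, 14 features, 5 risk levels.
X, y = make_classification(
    n_samples=400, n_features=14, n_informative=8, n_classes=5, random_state=42
)

# Hold out 25% of the data for testing, preserving the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)
```

Repeating the split with different `random_state` values and averaging the resulting metrics gives the more reliable assessment mentioned above.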
D. Experimental Setup
To create a prediction model for software risk, a few experiments were conducted to identify the best classification
model and extract sample-relevant rules. Four experiments
were conducted using ensemble machine learning methods
along with 14 selected features deemed risk factors. The
goal was to utilize the most appropriate ensemble machine
learning methods and relevant features to construct an effective
prediction model for software risk.
1) Random Forest with Decision Tree: The aim of our
experiment was to create a predictive model for software
project risk using the Random Forest algorithm and decision
tree. Our results indicate that Random Forest outperformed
other algorithms in terms of accuracy, achieving an impressive
97.3%.
2) Gradient boosting with Decision Tree: In this study,
our objective was to build a predictive model for assessing
the risk of software projects. The confusion matrix generated
by gradient boosting is indicated in the following figure.
Specifically, when evaluating the accuracy metric, gradient
boosting achieved an impressive accuracy rate of 98.67%.
3) AdaBoosting with Decision Tree: In this research study,
our aim was to develop a predictive model to assess the
risk associated with software projects using the AdaBoost-
ing machine learning algorithm. The figure below indicates
our confusion matrix representation. We achieved a 96.0%
accuracy rate using AdaBoosting, indicating that our model
can effectively predict the likelihood of software project risks.
Generally, the results are summarized in Table I below.
TABLE I
EXPERIMENTAL RESULT OF ALL ALGORITHMS

Evaluation metric | Random Forest | AdaBoosting | Gradient Boosting | Bagging
Accuracy          | 97.3%         | 96.0%       | 98.67%            | 96.0%
Precision         | 97.8%         | 96.6%       | 99.1%             | 96.4%
Recall            | 97.3%         | 96.0%       | 98.67%            | 96.0%
F1 Score          | 97.34%        | 96.06%      | 98.8%             | 96.02%
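The metrics in the table can be reproduced from a model's predictions with scikit-learn; the toy labels below are stand-ins for real model output, and weighted averaging is an assumption about how the multi-class scores were aggregated.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy ground truth and predictions over the five risk levels.
y_true = [1, 2, 2, 3, 4, 5, 2, 3]
y_pred = [1, 2, 2, 3, 4, 5, 2, 2]

metrics = {
    "Accuracy": accuracy_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred, average="weighted", zero_division=0),
    "Recall": recall_score(y_true, y_pred, average="weighted"),
    "F1 Score": f1_score(y_true, y_pred, average="weighted"),
}
```

With weighted averaging, recall coincides with accuracy, which matches the identical Accuracy and Recall rows in the table.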
E. Bagging Algorithm with Decision Tree
The aim of this study was to create a predictive model
for evaluating the risk associated with software projects using
the Bagging ensemble algorithm. The results showed that the
Bagging algorithm performed well in terms of accuracy.
F. Discussion of the Result
The predictive model for software project risk was constructed using 400 instances, as mentioned earlier. This study aimed to identify the factors that hold significant influence over the risk levels in software projects. Using a heat map, we identified that the priority and probability of risk consistently emerged as the predominant features among the targeted attributes. We conducted four experiments using homogeneous ensemble machine learning methods: Gradient Boosting, AdaBoost, Bagging, and Random Forest. These experiments were designed to determine the most effective approach for predicting software project risks. The results revealed the following accuracies: 98.67% for Gradient Boosting, 97.3% for Random Forest, and 96.0% for both Bagging and AdaBoost. Based on these findings, it can be concluded that Gradient Boosting is the most appropriate homogeneous ensemble machine learning algorithm for developing a predictive model capable of accurately predicting software risk levels from a software risk dataset. These results provide valuable insights into the selection of ensemble machine learning methods for predicting software project risks. The
overall result is shown in Figure 5 below.
Fig. 5. Result representation using a bar graph for different algorithms.
Generally, the findings of these experiments revealed that the gradient
boosting with decision tree homogeneous ensemble machine
learning algorithm achieved an exceptional accuracy rate of
98.67%. This level of accuracy in software risk prediction
surpasses the results reported in previous related work. By
selecting the gradient boosting algorithm as the foundation
for the predictive model, the study ensures a high level of
accuracy in predicting software project risks.
V. C ONCLUSION
The process of software development is a systematic method
to develop software. It involves the development and main-
tenance of software. There is always the possibility of unex-
pected events occurring during the Software Development Life
Cycle (SDLC) that may result in loss or failure in software
development. Because of the nature of risks and software projects, there are no tools to identify risks and predict risk severity at an early stage of development. The purpose of
this study is to support risk management and managers in the
automation system required by developing predictive models
using ensembled machine learning. We used homogenous
ensembled machine learning algorithms to predict software
project risk level at requirement and design stage.
We used four different homogeneous ensemble machine learning algorithms to develop our software risk predictive model: Random Forest, AdaBoost, Gradient Boosting, and Bagging, which achieved accuracies of 97.3%, 96.0%, 98.67%, and 96.0%, respectively. In addition, we used a software risk dataset from software repositories such as Zenodo to train our algorithms. The dataset contains 14 attributes and 400 instances. Based on the accuracy results, the Gradient Boosting algorithm was selected for developing the final predictive model.
This study used an experimental research methodology to compare different algorithms and select the best-performing one for building our predictive model. Accordingly, we performed four experiments with homogeneous ensemble machine learning algorithms and finally selected the Gradient Boosting algorithm, which scored 98.67% predictive performance.
REFERENCES
[1] Mahmud, Mahmudul Hoque. ”Software Risk Prediction: Systematic
Literature Review on Machine Learning Techniques.” Applied Sciences
12.22 (2022): 11694.
[2] Bhukya, Shankar Nayak, and Suresh Pabboju. ”Software engineering:
risk features in requirement engineering.” Cluster Computing 22 (2019):
14789-14801.
[3] Qureshi, Muhammad Shahroz Gul, Bilal Khan, and Muhammad Arshad.
”ML-Based Model for Risk Prediction in Software Requirements.”
International Journal of Technology Diffusion (IJTD) 13.1 (2022): 1-
17.
[4] Bhukya, Shankar Nayak, and Suresh Pabboju. ”Software engineering:
risk features in requirement engineering.” Cluster Computing 22 (2019):
14789-14801.
[5] Akumba, Beatrice O. ”A Predictive Risk Model for Software Projects’
Requirement Gathering Phase.” International Journal of Innovative Sci-
ence and Research Technology 5 (2020): 231-236.
[6] Sarigiannidis, Lazaros, and Prodromos D. Chatzoglou. "Software development project risk management: A new conceptual framework." Journal of Software Engineering and Applications 4.05 (2011): 293.
[7] Dasgupta, Ariruna, and Asoke Nath. ”Classification of machine learning
algorithms.” International Journal of Innovative Research in Advanced
Engineering (IJIRAE) 3.3 (2016): 6-11.
[8] "Risk Prediction using Machine Learning Techniques in the Domain of Global Software Development: A Review." 5.1 (2023): 7-15.
[9] Khalid, Shamsa. ”Predicting Risk through Artificial Intelligence Based
on Machine Learning Algorithms: A Case of Pakistani Nonfinancial
Firms.” Complexity 2022 (2022).
[10] Filippetto, Alexsandro Souza, Robson Lima, and Jorge Luis Victória Barbosa. "A risk prediction model for software project management based on similarity analysis of context histories." Information and Software Technology 131 (2021): 106497.
[11] Gouthaman, P., and Suresh Sankaranarayanan. "Prediction of Risk Percentage in Software Projects by Training Machine Learning Classifiers." Computers & Electrical Engineering 94 (2021): 107362.
[12] Suresh, K., and R. Dillibabu. ”An integrated approach using IF-TOPSIS,
fuzzy DEMATEL, and enhanced CSA optimized ANFIS for software
risk prediction.” Knowledge and Information Systems 63.7 (2021): 1909-
1934.
[13] Iftikhar, Asim. "Risk prediction by using artificial neural network in global software development." Computational Intelligence and Neuroscience 2021 (2021).
[14] Sweet, S. A., and K. A. Grace-Martin. "Modeling relationships of multiple variables with linear regression." Data Analysis with SPSS: A First Course in Applied Statistics (2012): 161-188.
[15] Eeles, Peter. "What is a software architecture?" IBM. Retrieved March 21 (2006): 2007.
[16] Ampomah, Ernest Kwame, Zhiguang Qin, and Gabriel Nyame. "Evaluation of tree-based ensemble machine learning models in predicting stock price direction of movement." Information 11.6 (2020): 332.
[17] Assefa, Yibeltal, et al. ”Software Effort Estimation using Machine
learning Algorithm.” 2022 International Conference on Information and
Communication Technology for Development for Africa (ICT4DA).
IEEE, 2022.
... Risks can be categorized using different criteria, including their origin, type, or impact on the project [6]. Usually, this classification of risks is done manually, a practice that the personal judgment of the risk analyst or the project manager might influence [7]. That process can introduce biases and variability in risk management, leading to inconsistent risk evaluations and unpreparedness for critical issues. ...
Article
Full-text available
Software requirements are the most critical phase focused on documenting, eliciting, and maintaining the stakeholders' requirements. Risk identification and analysis are preemptive actions designed to anticipate and prepare for potential issues. Usually, this classification of risks is done manually, a practice that the personal judgment of the risk analyst or the project manager might influence. Machine learning (ML) techniques were proposed to predict the risk level in software requirements. The techniques used were logistic regression (LR), multilayer perceptron (MLP) neural network, support vector machine (SVM), decision tree (DT), naive bayes, and random forest (RF). Each model was trained and tested using cross-validation with k-folds, each with its respective parameters, to provide optimal results. Finally, they were compared based on precision, accuracy, and recall metrics. Statistical tests were performed to determine if there were significant differences between the different ML techniques used to classify risks. The results concluded that the DT and RF are the techniques that best predict the risk level in software requirements.
Article
Full-text available
Software risk prediction is the most sensitive and crucial activity of the SDLC. It may lead to the success or failure of the project. The requirement gathering stage is the most important and challenging stage of the SDLC. The risks should be tackled at this stage and saved to be used in future projects. However, a model is proposed for the prediction of software requirement risks using the requirement risk dataset and ML classification. This research study proposed a model for risk prediction in software requirements that will be evaluated using several evaluation measures (e.g., precision, F-measure, MCC, recall, and accuracy). For the completion of this study, the dataset is taken from Zenodo repository. The model is evaluated using ML techniques. After the finding and analysis of results, DT shows best performance with accuracy of 99%.
Article
Full-text available
The Software Development Life Cycle (SDLC) includes the phases used to develop software. During the phases of the SDLC, unexpected risks might arise due to a lack of knowledge, control, and time. The consequences are severe if the risks are not addressed in the early phases of the SDLC. This study aims to conduct a Systematic Literature Review (SLR) and acquire concise knowledge of Software Risk Prediction (SRP) from scientific articles published from 2007 to 2022. Furthermore, we conducted a qualitative analysis of published articles on SRP. Some of the key findings include: (1) 16 articles are examined in this SLR to represent the outline of SRP; (2) Machine Learning (ML)-based detection models were highly efficient and significant in terms of performance; (3) very few studies received excellent scores in the quality analysis. As part of this SLR, we summarized and consolidated previously published SRP studies to identify the practices from prior research. This SLR will pave the way for further research in SRP and guide both researchers and practitioners.
Article
Full-text available
AI (artificial intelligence) is a significant technological advancement that has everyone buzzing about its incredible potential. The current research study evaluates the influence of supervised artificial intelligence techniques, i.e., machine learning techniques, on the nonfinancial firms of Pakistan and focuses on the practical application of AI techniques for the accurate prediction of corporate risks, which in turn will lead to the automation of corporate risk management. So, in this study, we used financial ratios for accurate risk assessment and for the automation of corporate risk management by developing machine learning algorithms using the random forest, decision tree, naïve Bayes, and KNN techniques. A secondary data collection technique was used. For this purpose, we collected annual data of nonfinancial companies in Pakistan for the period from 2006 to 2020, and the data were analyzed and tested using Python. Our results show that AI techniques can accurately predict risk with minimal error, and among all the techniques used, random forest outperforms the rest.
Article
Full-text available
The term risk is defined as the potential future harm that may arise due to some present action. Risk management in software engineering addresses the various future harms that could affect the software due to minor or unnoticed mistakes in a software development project or process. Several types of risk analysis can be used. Basically, risk analysis identifies the high-risk elements of a project in software engineering; it also provides ways of detailing the impact of risk mitigation strategies. Risk analysis has also been found to be most important in the software design phase, to evaluate the criticality of the system. The main purpose of risk analysis is to understand the risks better and to verify and correct the attributes. A successful risk analysis includes important elements such as problem definition, problem formulation, and data collection. Some of the requirement risks are poor definition of requirements, incomplete requirements, and lack of testing. The likelihood of events affecting a goal can be evaluated from evidence of the goal's satisfaction or denial, which can be obtained through the Tropos goal model. The original Tropos model is modified to meet the risk assessment requirements of requirements engineering. An event is considered a risk based on its likelihood value. Relations are defined between multiple goals and events, which identify the necessity of a particular goal. In order to analyze the risk in achieving particular goals, a set of candidate solutions is generated, which can then be evaluated based on the risk affinitive value. Three risk parameters are used to compute the risk affinitive value: (1) low, (2) medium, and (3) high. The risk parameters and cost analysis clearly evaluate the affinity of an event to a particular set of goals.
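The mapping from an event's likelihood to the three risk parameters (low / medium / high) can be illustrated with a minimal sketch. The thresholds below are assumptions chosen for illustration; they are not values taken from the Tropos-based model itself.

```python
# Illustrative sketch: classify a goal-threatening event's likelihood
# into the three risk parameters mentioned above (thresholds assumed).
def risk_parameter(likelihood: float) -> str:
    """Map a likelihood in [0, 1] to 'low', 'medium', or 'high'."""
    if not 0.0 <= likelihood <= 1.0:
        raise ValueError("likelihood must be in [0, 1]")
    if likelihood < 0.33:
        return "low"
    if likelihood < 0.66:
        return "medium"
    return "high"

print(risk_parameter(0.2))   # low
print(risk_parameter(0.5))   # medium
print(risk_parameter(0.9))   # high
```

In the paper's model, the likelihood itself would come from the satisfaction/denial evidence propagated through the goal graph, and the final affinity would also factor in cost.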
Article
Full-text available
The demand for global software development is growing. The nonavailability of software experts in one place or country is the reason for the increase in the scope of global software development. Software developers located in different parts of the world, with the diversified skills necessary for the successful completion of a project, play a critical role in the field of software development. Using the skills and expertise of software developers around the world, one could get any component developed or any IT-related issue resolved. The best software skills and tools are dispersed across the globe, but integrating these skills and tools and making them work to solve real-world problems is a challenging task. The discipline of risk management offers alternative strategies to manage the risks that software experts face in today's competitive world. This research is an effort to predict the risks related to time, cost, and resources that are faced by distributed teams in a global software development environment. To examine the relative effect of these factors, neural network approaches, namely Levenberg-Marquardt, Bayesian Regularization, and Scaled Conjugate Gradient, have been implemented to predict the responses of risks related to project time, cost, and resources involved in global software development. A comparative analysis of these three algorithms is also performed to determine which achieves the highest accuracy. The findings of this study showed that Bayesian Regularization performed very well in terms of the MSE (validation) criterion compared with the Levenberg-Marquardt and Scaled Conjugate Gradient approaches.
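The comparison criterion used above, validation MSE across several network training algorithms, can be sketched as follows. The three named algorithms are MATLAB Neural Network Toolbox training functions with no direct scikit-learn equivalents, so this sketch uses `MLPRegressor` solvers purely as stand-ins to illustrate the comparison procedure on synthetic data.

```python
# Sketch: ranking trained networks by validation MSE (the study's criterion).
# The solvers below are stand-ins, NOT the MATLAB algorithms named above.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=2)
y = (y - y.mean()) / y.std()  # standardize the target so all solvers behave

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=2)

results = {}
for solver in ("lbfgs", "adam", "sgd"):
    net = MLPRegressor(hidden_layer_sizes=(16,), solver=solver,
                       max_iter=2000, random_state=2).fit(X_tr, y_tr)
    results[solver] = mean_squared_error(y_va, net.predict(X_va))

best = min(results, key=results.get)
print("lowest validation MSE:", best)
```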
Article
Full-text available
A successful project is determined by its effective performance and by the prioritization of all unavoidable software project risks. In this paper, risk evaluation in software projects is performed by developing a new hybridized fuzzy-based risk evaluation framework. During the decision-making process, this proposed scheme determines and ranks all the significant project risks. Software project risks are better assessed with the incorporation of intuitionistic fuzzy-based TOPSIS, adaptive neuro-fuzzy inference system-based multi-criteria decision making (ANFIS MCDM), and fuzzy decision making trial and evaluation laboratory methods. In order to attain accurate software risk estimation, the ANFIS parameters are adjusted with the help of an enhanced crow search algorithm (ECSA). The ECSA is combined with the ANFIS approach to free solutions stuck in local optima while adopting only small changes for the adjustment of the ANFIS parameters. The NASA 93 dataset, with 93 software project values, was used to conduct the experimental validation. Experimental outcomes show that software project risks were evaluated accurately and effectively using the proposed integrated fuzzy-based framework.
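The core of the TOPSIS step mentioned above can be sketched in plain (crisp) form. The intuitionistic-fuzzy extension and the ANFIS/ECSA components are beyond this illustration, and the decision matrix and weights below are made-up examples.

```python
# Sketch of plain TOPSIS ranking: alternatives are scored by their
# relative closeness to an ideal solution.
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives; benefit[j] is True if criterion j is larger-is-better."""
    m = np.asarray(matrix, dtype=float)
    # Vector-normalize each criterion column, then apply the weights.
    v = (m / np.linalg.norm(m, axis=0)) * np.asarray(weights, dtype=float)
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)  # closeness: higher = better alternative

# Three candidate risk responses scored on cost (lower is better) and coverage.
scores = topsis([[250, 0.80],
                 [400, 0.95],
                 [300, 0.60]],
                weights=[0.5, 0.5], benefit=[False, True])
print(scores.argmax())  # index of the best-ranked alternative
```

In the paper's framework, the crisp matrix would be replaced by intuitionistic fuzzy numbers and the weights would come from the decision-making methods rather than being fixed by hand.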
Article
Full-text available
The initial stage of the software development lifecycle is the requirement gathering and analysis phase. Predicting risk at this phase is crucial because cost and effort can be saved while improving the quality and efficiency of the software to be developed. The datasets for software requirements risk prediction were adopted in this paper to predict the risk levels across software projects and to ascertain the attributes that contribute to the recognized risk in the software projects. A supervised machine learning technique, the Naïve Bayes classifier, was used to predict the risk across the projects. The model was able to predict the risks across the projects, and the performance metrics of the risk attributes were evaluated. The model predicted four (4) projects as Catastrophic, eleven (11) as High, eighteen (18) as Moderate, thirty-three (33) as Low, and seven (7) as Insignificant. The overall confusion matrix statistics on the risk-level predictions by the model showed an accuracy of 98% with a confidence interval (CI) of 95% and a Kappa of 97%.
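The confusion-matrix statistics quoted above (accuracy and Kappa over the five risk levels) can be computed as sketched below. The label sample here is tiny and made up; it is not the study's data.

```python
# Sketch: accuracy and Cohen's kappa over multi-level risk predictions,
# computed from true vs. predicted labels (made-up example labels).
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

levels = ["Insignificant", "Low", "Moderate", "High", "Catastrophic"]
y_true = ["Low", "Low", "Moderate", "High", "Catastrophic", "Moderate", "Low"]
y_pred = ["Low", "Low", "Moderate", "High", "Catastrophic", "Moderate", "Moderate"]

# Rows = true level, columns = predicted level, in the order given by `levels`.
print(confusion_matrix(y_true, y_pred, labels=levels))
print("accuracy:", round(accuracy_score(y_true, y_pred), 3))
print("kappa:", round(cohen_kappa_score(y_true, y_pred), 3))
```

Kappa corrects the raw accuracy for agreement expected by chance, which is why the study reports it alongside accuracy.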
Article
Recently, software project failures have been increasing due to lack of planning and budget constraints. In this regard, identifying a suitable software model with consideration of risk factors is imperative. Therefore, this study investigates the key software models utilized in the industry through interaction with software development experts and a literature survey. In this study, 15 standard indicators were chosen, and a survey was conducted through a questionnaire. The major performance metrics taken into consideration are network, security, software, machine learning, internet of things, and application programming interface. We proposed a novel framework for the dataset received through the questionnaire, in which machine learning classifiers were applied and risk predictions for each of the identified software models were produced. Using this outcome, software product managers can identify the appropriate software model according to the software requirements, along with a risk prediction percentage.
Article
Context: Risk event management has become strategic in project management, where uncertainties are inevitable. In this sense, concepts of ubiquitous computing, such as contexts, context histories, and mobile computing, can assist proactive project management.
Objective: This paper proposes a computational model for reducing the probability of project failure through the prediction of risks. The purpose of the study is to present a model that assists teams in identifying and monitoring risks at different points in the life cycle of projects. The work's scientific contribution is the use of context histories to infer risk recommendations for new projects.
Method: The research conducted a case study in a software development company, applied in two scenarios. The first involved two teams that assessed the use of the prototype during the implementation of 5 projects. The second considered 17 completed projects to assess the recommendations made by the Átropos model, comparing them with the risks in the original projects. In this scenario, Átropos used 70% of each project's execution as learning for the risk recommendations generated for the same projects; the scenario thus assessed whether the recommended risks were contained in the remaining 30% of the executed projects. As context histories, we used a database with 153 software projects from a financial company.
Results: A project team of 18 professionals assessed the recommendations, yielding 73% acceptance and 83% accuracy when compared with projects already being executed. The results demonstrated a high percentage of acceptance of the risk recommendations compared with other models that do not use the characteristics and similarities of projects.
Conclusion: The results show the applicability of risk recommendation to new projects based on the similarity analysis of context histories. This study applies inferences on context histories to the development and planning of projects, focusing on risk recommendation. With recommendations that consider the characteristics of each new project, the manager starts with a larger set of information and can make more assertive project plans.
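The similarity-driven recommendation idea above, suggesting to a new project the risks observed in the most similar historical projects, can be sketched as follows. This is an illustrative sketch, not the Átropos model itself: the project feature vectors, past risks, and similarity threshold are all invented.

```python
# Sketch: recommend risks to a new project from the risks of
# sufficiently similar historical projects (all data invented).
import numpy as np

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

history = {  # project -> (feature vector, risks observed in that project)
    "P1": (np.array([0.9, 0.1, 0.5]), {"scope creep", "budget overrun"}),
    "P2": (np.array([0.2, 0.8, 0.4]), {"staff turnover"}),
    "P3": (np.array([0.8, 0.2, 0.6]), {"scope creep", "schedule slip"}),
}

def recommend(new_project, threshold=0.95):
    """Union of risks from all historical projects above the threshold."""
    risks = set()
    for vector, past_risks in history.values():
        if cosine(new_project, vector) >= threshold:
            risks |= past_risks
    return sorted(risks)

print(recommend(np.array([0.85, 0.15, 0.55])))
```

In the Átropos model, the "feature vectors" would come from context histories of 153 real projects, and the team would then accept or reject each recommended risk.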