Volume 5, Issue 6, June 2020 International Journal of Innovative Science and Research Technology
ISSN No:-2456-2165
IJISRT20JUN066 www.ijisrt.com 231
A Predictive Risk Model for Software Projects’
Requirement Gathering Phase
Beatrice O. Akumba1, Samera U. Otor1, Iorshase Agaji2 and Barnabas T. Akumba3
1Department of Mathematics/Computer Science, Benue State University Makurdi, Nigeria
2Department of Mathematics/Statistics/Computer Science, Federal University of Agriculture Makurdi, Nigeria
3Management Sciences for Health (MSH), Abuja.
Abstract:- The initial stage of the software development
lifecycle is the requirement gathering and analysis phase.
Predicting risk at this phase is crucial because cost and
effort can be saved while improving the quality and
efficiency of the software to be developed. In this paper,
the dataset for software requirement risk prediction is
adopted to predict the risk levels across software projects
and to ascertain the attributes that contribute to the
recognized risk in those projects. A supervised machine
learning technique, the Naïve Bayes Classifier, was used to
predict the risk across the projects, and the performance
metrics of the risk attributes were evaluated. On the test
partition, the model predicted four (4) instances as
Catastrophic, eleven (11) as High, eighteen (18) as
Moderate, thirty-three (33) as Low and seven (7) as
Insignificant. Under ten-fold cross-validation the model
had an accuracy of 98% and a Kappa of 97%. With respect to
the risk levels across the projects, probability and
priority were the most significant variables in predicting
risk levels. Software Project Managers should put in place
measures to reduce the factors (probability and priority)
that increase the chances of occurrence of the identified
risks.
Keywords:- Naïve Bayes Classifier, SDLC, Risk Prediction,
Software Projects, Risk Outcomes, Risk Levels.
I. INTRODUCTION
The development of software projects is often
characterised by high failure rates. These failure rates are
attributed to uncertain events that occur in the Software
Development Life Cycle (SDLC) process and that have led to
the potential loss of software in most organisations. These
uncertain events are referred to as risks, and they emanate
from divergent risk factors embedded in the heterogeneous
activities of the software development lifecycle. These
risks need to be identified in a timely manner, or else they
become the cause of software project failure (Salih and
Ammar, 2017).
According to several surveys carried out by the
Standish Group, 16% of software projects are completed on
time and on budget, 52.7% are delivered with less
functionality or performance than planned, and 31.1% are
scrapped before completion (Standish Group of Companies,
2019). These outcomes are attributed to inherent risks found
in most software projects. Thus, there is a need for these
risk factors to be identified and mitigated through risk
prediction before they become a threat to the software
projects.
The requirement gathering and analysis phase is the
first stage of the software development lifecycle. Cost and
effort can be saved if the risk at this stage is predicted,
and the occurrence of software project failures can be
reduced as well.
Many methods have been developed for predicting risk
in the SDLC, ranging from models (Hu et al., 2009) to
several machine learning methods (Hu et al., 2015), among
others. Hu et al. (2009) used an Artificial Neural Network
(ANN) and a Support Vector Machine (SVM) to predict and
manage software development risks across an entire project.
They formulated a model for identifying risk and developed
the risk prediction model from data gathered through
questionnaires administered to software development
companies. Ensemble learning techniques have also been used
to predict risk in software projects, with classifier
ensembles of decision trees (DT) based on bagging and SVM.
The data used was also gathered through questionnaires.
Because most researchers obtained the data used for
risk prediction in software projects through questionnaires,
Shaukat et al. (2018) developed a dataset that accommodates
most software requirements and the attributes necessary for
predicting risks in software projects. They noted that few
techniques existed for predicting risks at the requirement
gathering phase of the SDLC and that, as such, there were no
datasets containing risk attributes for software risk
prediction. They went on to develop a dataset based on the
Software Requirements Specification (SRS) of new projects
and used classification techniques to refine the data.
To this end, this paper adopts the dataset by
Shaukat et al. (2018) to carry out software
risk prediction in the requirement gathering phase of the
SDLC. A model was created and trained using machine
learning techniques to predict the risks across the projects
and the performance metrics of the risk attributes were
evaluated with respect to the risk levels across the projects.
The model was formulated and trained using the Naïve
Bayes classifier and the predictions across the projects
showed that the probability and priority variables were
highly significant in the prediction of risks in software
projects.
The rest of this paper is organised as follows: Section
II reviews related literature in the subject area, while
Section III describes the materials and methods used for the
study. Section IV presents the results and discussion, and
Section V gives the conclusion and further work.
II. LITERATURE REVIEW
Hu et al. (2009) observed that the software project
development process is associated with some level of risk
and is accompanied by high failure rates. To mitigate the
risks, they proposed an intelligent model to predict and
manage the risks inherent in software projects. The model
identified risk from real-life instances gathered from
software development companies, using the machine learning
algorithms Artificial Neural Network (ANN) and Support
Vector Machine (SVM). Because no existing dataset was
available for the prediction, the data was collected
through questionnaires.
Kawamura et al. (2017) noted that over 70% of
software development projects end in failure due to risk. To
improve project success rates, they proposed software risk
prediction for Information Technology (IT) vendors and
organisations, collecting data on 332 projects from
different IT vendors through an internet survey. They
developed a success/failure prediction algorithm using the
Naïve Bayes classifier on the data collected. They concluded
that predicting the success or failure of software projects
helped organisations to identify, in order of priority,
which projects were risk prone. However, the study did not
show the success/failure rate at each phase of the SDLC.
Salih and Ammar (2017) also emphasised that the
growth in the complexity of software systems has made
software performance prediction a difficult task. They
addressed the problem using a model-based approach for
resource utilisation and software performance risk
prediction, employing machine learning techniques to
predict the performance risk and resource utilisation.
Also, Christiansen et al. (2015) used multiple logistic
regression to predict software development risk, based on
questionnaire data from experts covering risk stratification
and risk factors. They employed statistical integration to
illustrate the risk factors, which were anticipated and
managed by minimising the risk that occurred during the
software development process. A combination of factor
analysis and logistic regression was used to predict the
probability of failure or success of software projects.
Their study concluded that there are risk factors inherent
in software development processes which must be known and
addressed for software projects to be completed and
delivered on time. Their data was gathered through
questionnaires administered to experts because no dataset
templates were available for the phases of the software
development life cycle.
Shaukat et al. (2018) recognised the requirement
gathering phase of the SDLC as the most important and
demanding of the phases. They observed that there was no
explicit dataset from real-life software projects containing
all the attributes of software requirements and their risks
that could be used to predict risks in new software
projects. They proposed a dataset for the requirement
gathering phase of the SDLC which contains the requirements
obtained from the Software Requirements Specification (SRS)
of some software projects, their risk attributes, and the
correlation between the requirements and risks. The dataset
was subjected to three preprocessing filters, Normalisation,
Standardisation and Discretisation, applied as unsupervised
preprocessors in WEKA to obtain better accuracy. The dataset
serves as a template and data source that researchers can
use for risk decision support systems and for predicting
risk at the requirement gathering phase of the SDLC.
This paper adopts the dataset developed by Shaukat
et al. (2018) to predict the risk levels and the attributes
that contribute to the recognised risk in the software
projects. The significant difference between this paper and
the work of Shaukat et al. (2018) is the prediction of risk
levels across the projects in the requirement gathering
phase, which was not done in their study.
III. MATERIALS AND METHODS
Machine learning techniques are powerful tools for
software risk prediction. A supervised machine learning
technique, the Naïve Bayes classifier, has been employed in
this paper. Supervised learning builds a model that makes
predictions for previously unseen input instances. A
supervised learning algorithm takes a known set of input
data together with the known responses to that data
(outputs) and learns a classification or regression model
from them. The trained model then produces a prediction of
the response for new data, such as the test dataset.
Supervised learning uses classification and regression
techniques to develop predictive models. A classification
task predicts discrete responses and applies only to data
that can be categorised, tagged, or separated into specific
groups or classes.
Naïve Bayes classifiers are statistical classifiers that
predict class membership probabilities, that is, the
probability that a given record in the dataset belongs to a
particular class. Naïve Bayes classification is based on
Bayes' theorem and assumes that the predictors are
conditionally independent given the class. As applied in
this paper, all the variables are categorical. The
classifier is purely probabilistic and each independent
variable contributes to the decision. It produces models
based on the combined probabilities of the dependent
variable being associated with the different categories of
the independent variables. The Naïve Bayes Classifier used
five categories to represent the risk outcome levels:
Catastrophic, High, Moderate, Low and Insignificant.
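As a sketch of the decision rule behind this description, Bayes' theorem combined with the usual conditional independence assumption can be written as follows. The symbols $C_k$ (a risk level), $x_i$ (the value of the $i$-th attribute of a record) and $n$ (the number of predictors) are notation introduced here for illustration and do not appear in the paper.

```latex
P(C_k \mid x_1, \dots, x_n) \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k),
\qquad
\hat{y} \;=\; \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)
```

In the setting of this paper, $k$ would range over the five risk levels (Catastrophic, High, Moderate, Low, Insignificant), and the predicted level $\hat{y}$ is the one with the largest product of prior and per-attribute likelihoods.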
The R programming language was used for the
implementation and testing of the risk outcome prediction
model. R is widely used for machine learning and data
science, and it provides visualisation features that are
useful for exploring the data before passing it to a
learning algorithm and for assessing the results of that
algorithm.
A. The Dataset Used for the Risk Prediction
The dataset used for this study was derived from the
Dataset on Software Requirement Risk Prediction available at
https://doi.org/10.5281/zenodo.1209601 (Shaukat et al.,
2018). The dataset provides 299 data instances for risk
prediction at the initial phase of the software development
lifecycle (requirement gathering phase). It consists of
twelve (12) attributes and one (1) target variable
(outcome). The attributes probability, magnitude of risk,
impact, priority and risk level were modified to hold
categorical values usable by the Naïve Bayes Classifier,
since a classification task predicts discrete responses. The
risk level attribute was included to serve as the target
variable.
B. The Algorithm of the Risk Prediction Model
The algorithm of the Risk Prediction Model is
described by the flowchart in Figure 1, which shows the
step-by-step method used to obtain predictions on the
adopted dataset with the Naïve Bayes Classifier.
1. Start
2. Read Dataset file into R
3. Set outcome according to risk levels
4. Partition dataset into training and testing
5. Load required R Packages from the R Repository
6. Invoke the Naïve Bayes Function Library (e1071)
7. Create Naïve Bayes Model using the training dataset
8. Perform prediction
9. Output Prediction Result
10. Stop
Fig 1:- The Flowchart of the Risk Outcome Prediction
Model using Naïve Bayes Classifier
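The model-creation and prediction steps of the flowchart can be sketched as follows. The paper's implementation used R with the e1071 library and is not reproduced in the text; this sketch is a minimal categorical Naïve Bayes in plain Python instead. The function names, the Laplace smoothing parameter `alpha`, and the toy records are all assumptions made for illustration, not the authors' code or data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model.

    alpha is a Laplace smoothing constant (an assumption of this sketch);
    keep it positive so that unseen attribute values never yield log(0).
    """
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[j][c][v] = number of class-c rows whose j-th attribute is v
    value_counts = [defaultdict(Counter) for _ in range(n_features)]
    levels = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[j][c][v] += 1
            levels[j].add(v)
    return {"class_counts": class_counts, "value_counts": value_counts,
            "levels": levels, "alpha": alpha, "n": len(labels)}


def predict(model, row):
    """Return the class with the highest log-posterior for one record."""
    best_class, best_score = None, -math.inf
    for c, n_c in model["class_counts"].items():
        score = math.log(n_c / model["n"])  # log prior of class c
        for j, v in enumerate(row):
            count = model["value_counts"][j][c][v]
            k = len(model["levels"][j])  # number of categories of attribute j
            score += math.log((count + model["alpha"]) / (n_c + model["alpha"] * k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical toy records: (probability, priority) -> risk level.
train_rows = [("high", "high"), ("high", "medium"), ("low", "low"),
              ("low", "medium"), ("medium", "medium"), ("medium", "low")]
train_labels = ["High", "High", "Low", "Low", "Moderate", "Moderate"]

model = train_naive_bayes(train_rows, train_labels)
prediction = predict(model, ("high", "high"))
print(prediction)
```

Working in log space avoids numerical underflow when many per-attribute probabilities are multiplied, which matters more with twelve predictors than with the two used in this toy example.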
IV. RESULTS AND DISCUSSION
The results of this paper were obtained using R, and
screenshots of the corresponding output are presented for
discussion. The discussion covers the dataset partition, the
risk outcome prediction model, the performance and analysis
of the model, and the overall variable performance plot,
each illustrated with figures.
A. The Dataset Partition (Training and Testing)
The dataset used for the risk prediction model was split
into two partitions: the training and the test instances.
75% of the dataset was used for training the Risk Outcome
Prediction Model while 25% of the dataset was used for the
testing and validation of the model. Figure 2 shows the
partitions and their corresponding proportions in tabular
form, produced with prop.table in R.
Fig 2:- The Partitions of the Train and Test Dataset
Instances and their Dimensions
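A 75%/25% partition of this kind is often done per class, so that each risk level keeps roughly the same share in both partitions. The paper does not describe how its split was drawn, so the stratified procedure below is an assumption, and the function name, seed and label vector are hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Round the training share up within each class.
        cut = math.ceil(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical label vector, used only to exercise the function.
labels = ["Low"] * 40 + ["Moderate"] * 24 + ["High"] * 16
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

For the dataset in this paper, 0.75 × 299 = 224.25, while the model is reported to have been trained on 226 samples, leaving 73 for testing. Per-class rounding of the kind sketched here would be consistent with that small difference, though this is an inference and not something the paper states.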
B. The Risk Outcome Prediction (RISOP) Model using
Naïve Bayes Classifier
A Risk Outcome Prediction Model based on the Naïve
Bayes Classifier was created for the prediction and analysis
of risk in the software projects, as shown in Figure 3. The
model was trained on 226 samples, corresponding to the 75%
of the dataset that was partitioned for training, and has 12
predictors corresponding to the attributes. Ten-fold
(10-fold) cross-validation was applied, giving an accuracy
of 98% and a Kappa of 97%. The model was then used to
predict the risk levels of the new data in the test
partition, indicating whether each instance is
insignificant, low, moderate, high or catastrophic. The
model and its evaluation and prediction on the testing set
are shown in Figures 3 and 4.
Fig 3:- The Risk Outcome Prediction (RISOP) Model
Fig 4:- Model Evaluation and Prediction based on the
Testing Instance
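Ten-fold cross-validation, as mentioned for the RISOP model, repeatedly holds out one tenth of the training samples, fits the model on the rest, and averages the held-out accuracy. The sketch below shows the mechanics only; the function names are hypothetical, and a trivial majority-class learner stands in for the Naïve Bayes model so that the example runs on its own.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle range(n) and deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, labels, fit, predict, k=10):
    """Mean held-out accuracy over k folds (assumes len(rows) >= k)."""
    accuracies = []
    for held_out in k_fold_indices(len(rows), k):
        held = set(held_out)
        train = [i for i in range(len(rows)) if i not in held]
        model = fit([rows[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, rows[i]) == labels[i] for i in held_out)
        accuracies.append(hits / len(held_out))
    return sum(accuracies) / k


# Stand-in learner (majority class), not the paper's Naive Bayes model.
def fit_majority(rows, labels):
    return Counter(labels).most_common(1)[0][0]


def predict_majority(model, row):
    return model


# Hypothetical data in which every record has the same label.
rows = [(i,) for i in range(50)]
labels = ["Low"] * 50
mean_accuracy = cross_validate(rows, labels, fit_majority, predict_majority)
print(mean_accuracy)
```

With the 226 training samples reported in the paper, this way of dealing indices would give six folds of 23 samples and four folds of 22.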
C. The Performance and Analysis of the RISOP Model
The developed RISOP Model was analysed in R. The
confusion matrix of the predicted risk outcome levels was
derived from the model, as shown in Figure 5. It indicates
that four (4) instances were correctly classified as
Catastrophic, eleven (11) as High, eighteen (18) as
Moderate, thirty-three (33) as Low and seven (7) as
Insignificant. The overall confusion matrix statistics for
the risk level predictions show an accuracy of 100%,
reported with a 95% confidence interval (CI), and a Kappa of
100%. The statistics by class for the risk levels were also
derived; sensitivity and specificity both stood at 100% for
every class, as shown in Figure 6.
Fig 5:- The Confusion Matrix of the Risk Prediction Model
Fig 6:- The Statistics by Class for the Risk Levels
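The confusion matrix statistics reported above can be recomputed from the per-class counts. The sketch below assumes a purely diagonal confusion matrix built from the five counts given in the text (4, 11, 18, 33 and 7), which is what an accuracy of 100% implies; the figures themselves are not reproduced here, and the function name is hypothetical.

```python
def confusion_metrics(matrix):
    """Accuracy, Cohen's kappa, and per-class (sensitivity, specificity).

    matrix[i][j] = number of records predicted as class i whose true class is j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n_classes)) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n_classes))
                for j in range(n_classes)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    per_class = []
    for k in range(n_classes):
        tp = matrix[k][k]
        fn = col_sums[k] - tp
        fp = row_sums[k] - tp
        tn = total - tp - fn - fp
        per_class.append((tp / (tp + fn), tn / (tn + fp)))
    return accuracy, kappa, per_class


# Counts from the text: Catastrophic, High, Moderate, Low, Insignificant.
counts = [4, 11, 18, 33, 7]
# Assumed diagonal matrix: every test instance classified correctly.
matrix = [[counts[i] if i == j else 0 for j in range(5)] for i in range(5)]
accuracy, kappa, per_class = confusion_metrics(matrix)
print(sum(counts), accuracy, kappa)
```

The five counts sum to 73 test instances, which matches 299 total instances minus the 226 used for training.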
D. The Overall Variable Performance Plot of the Model
A plot of the performance matrix of the variables
shows that probability and priority are the most significant
variables in the prediction of the risk levels across the
projects in the dataset used. These variables had high
values across the projects, indicating how much they
contribute to the inherent risk in software projects. The
performance matrix plot enables software project managers to
put in place measures to reduce the identified variables in
their software projects. Figure 7 shows the plot of the
performance matrix of the variables.
Fig 7:- Plot of the Performance Matrix of the Variables
across the Project
V. CONCLUSION
The aim of this paper was to use the Naïve Bayes
Classifier to predict risks that occur during the
requirement gathering phase of the software development
lifecycle (SDLC) in software projects. The requirement
gathering stage is a vital stage in the SDLC. If risks can
be identified and managed early, software quality improves,
cost is saved and projects are delivered on time. To this
end, a dataset for software requirement risk prediction,
obtained from the requirement gathering phase of five major
projects, was adopted. The dataset was used to build and
train a Naïve Bayes model to predict the risk levels from
attributes such as magnitude, impact, probability, priority
and the risk dimension, and to determine whether they were
catastrophic, high, moderate, low or insignificant. The
confusion matrix derived from the model showed four (4)
instances correctly predicted as Catastrophic, eleven (11)
as High, eighteen (18) as Moderate, thirty-three (33) as Low
and seven (7) as Insignificant. A plot of the performance
matrix of the variables for the Risk Outcome Prediction
(RISOP) Model showed which variables were most prominent
across the software projects: probability and priority were
the most significant variables in predicting risk levels.
If the probability of a risk occurring and its priority are
known in advance, the project can be protected from failure.
If they are neglected, projects will miss their timelines
and will be delivered with reduced functionality. Therefore,
Software Project Managers should put in place measures to
reduce the probability and priority of risk occurrences in
their software projects so that the systems are delivered
with full functionality.
Further work should be done to ascertain the minimal
values that each of the variables must hold before it can be
said to have contributed to the risk levels across the
projects. Since the model was able to predict risk levels
across projects in the requirement gathering phase of the
software development lifecycle, predictions can also be done
on other phases, such as the design, coding, testing and
maintenance phases, to ascertain the attributes that lead to
risk in software projects. Also, this model on the
requirement gathering phase should be benchmarked against
datasets obtained from the other phases of the software
development lifecycle to ascertain the phase with the
highest risk levels in software projects.
REFERENCES
[1]. Christiansen, T., Wuttidittachotti, P., Somchai, P. and
Vallipakorn, S.A. (2015). Prediction of Risk Factors of
Software Development Project by using Multiple
Logistic Regression. ARPN Journal of Engineering
and Applied Sciences. 10(3), 1324-1331.
[2]. Hu, Y., Feng, B., Mob, X., Zhang, X. Z., Ngai, E. W.
T., Fan, M. and Liu, M. (2015). Elsevier Journals on
Decision Support Systems. 72, 11-23. [online]
Retrieved on the 9th of August, 2019 from
https://www.sciencedirect.com/science/article/pii/S0167923615000238
[3]. Hu, Y., Zhang, X., Sun, X., Liu, M. and Du, J. (2009).
An Intelligent Model for Software Project Risk
Prediction. International Conference on Information
Management, Innovation Management and Industrial
Engineering. (p. 629) [online] Retrieved on the 5th of
May, 2019 from
https://ieeexplore.ieee.org/abstract/document/5368175
[4]. Kawamura, T., Toma, T. and Takano, K. (2017).
Outcome Prediction of Software Projects for
Information Technology Vendors. Conference
Proceedings of the 2017 IEEE IEEM. [online] Retrieved
on the 9th of May, 2019 from
https://ieeexplore.ieee.org/document/8290188
[5]. Salih, H. A. M. and Ammar, H.H. (2017). Model-Based
Resource Utilisation and Performance Risk
Prediction Using Machine Learning Techniques.
International Journal on Informatics Visualisation
(JOIV). 1(3), 101-109.
[6]. Shaukat, Z., Naseem, R. and Zubar, M. (2018). A
Dataset for Software Requirement Risk Prediction. In
IEEE International Conference on Computational
Science and Engineering. (pp. 112-118) [online]
Retrieved on the 9th of May, 2019 from
https://zenodo.org/record/1209601#.Xs5psmhKjIU
[7]. The Standish Group, "Extreme chaos".
www.standishgroup.com 2019. [online]
Retrieved on the 10th of March, 2019 from
https://www.projectsmart.co.uk/white-papers/chaos-report.pdf
... A risk is an uncertain event that occurs in the SDLC process and because of which has led to the potential loss of software in most organizations. It increases most software projects' failure rate [5]. Therefore, risk assessment early in the life cycle of software development is very important [4]. ...
... Moreover, the accuracy of the results obtained using this dataset is low and requires improvement Dillibabu's paper [12] presents an innovative approach for software risk prediction, which combines fuzzy-based TOP-strengths of multiple machine learning algorithms to improve overall predictive performance. In addition to this, previous researches [5] [8] [9] [10] are limited in specifying the stages of risk occurrence during dataset collection. These studies primarily focus on risks that arise during the requirement stage, neglecting the risks associated with the design stage. ...
... Akumba's study [5] aimed to predict software risk during the requirement stage. However, their research did not consider other critical software development stages such as the design stages. ...
... Akumba et al. [29] used an NB classifier in risk prediction during the requirement elicitation phase of the SDLC in certain software projects. The NB model was built based on risk dataset features, such as size, effect, likelihood, priority, and risk dimension, to decide whether they were catastrophic, high, moderate, low, or inconsequential. ...
... The dataset has 299 instances, and 13 features across five different class labels. Appendix A (Table A1) tabulates the components of the dataset, and further details of the dataset are provided in [1,2,29]. ...
... The primary objective of this assessment is to ascertain and authenticate the effectiveness of the FURIA model in software risk prediction in contrast to the baseline rule-based and ML-based models. The investigated rule-based (rough set (RS), partial decision tree (PART), RIPPER) and ML-based (SVM, k nearest neighbour (KNN), DT, average one dependency estimator (A1DE), NB, random forest (RF)) models are selected based on their reported predictive performances in existing software risk prediction studies and other ML processes [7,11,28,29,37,38]. In addition, these models have diverse computational characteristics and are aimed at introducing heterogeneity to empirical experimentation. ...
Article
Full-text available
The development of most modern software systems is accompanied by a significant level of uncertainty, which can be attributed to the unanticipated activities that may occur throughout the software development process. As these modern software systems become more complex and drawn out, escalating software project failure rates have become a critical concern. These unforeseeable uncertainties are known as software risks, and they emerge from many risk factors inherent to the numerous activities comprising the software development lifecycle (SDLC). Consequently, these software risks have resulted in massive revenue losses for software organizations. Hence, it is imperative to address these software risks, to curb future software system failures. The subjective risk assessment (SRM) method is regarded as a viable solution to software risk problems. However, it is inherently reliant on humans and, therefore, in certain situations, imprecise, due to its dependence on an expert’s knowledge and experience. In addition, the SRM does not allow repeatability, as expertise is not easily exchanged across the different units working on a software project. Developing intelligent modelling methods that may offer more unbiased, reproducible, and explainable decision-making assistance in risk management is crucial. Hence, this research proposes enhanced fuzzy induction models for software requirement risk prediction. Specifically, the fuzzy unordered rule induction algorithm (FURIA), and its enhanced variants based on nested subset selection dichotomies, are developed for software requirement risk prediction. The suggested fuzzy induction models are based on the use of effective rule-stretching methods for the prediction process. Additionally, the proposed FURIA method is enhanced through the introduction of nested subset selection dichotomy concepts into its prediction process. 
The prediction performances of the proposed models are evaluated using a benchmark dataset, and are then compared with existing machine learning (ML)-based and rule-based software risk prediction models. From the experimental results, it was observed that the FURIA performed comparably, in most cases, to the rule-based and ML-based models. However, the FURIA nested dichotomy variants were superior in performance to the conventional FURIA method, and rule-based and ML-based methods, with the least accuracy, area under the curve (AUC), and Mathew’s correlation coefficient (MCC), with values of approximately 98%.
... A successful software development process depends on early risk prediction in the requirements, as late risk recognition can have a negative impact on the project's quality, schedule, and money. We have identified that this problem is broadly mentioned in the studies analyzed in this section [3], [9]- [13]. As the initial phase of the SDLC, accurately forecasting risks in software requirements can enhance the software's productivity and quality. ...
... Akumba et al. [13], the main objective was to use supervised ML methods, namely the Naive Bayes classifier, to create a predictive risk model. The purpose of this approach was to forecast the degree of risk in software projects and to pinpoint the essential components that raise that risk. ...
Article
Full-text available
Software requirements are the most critical phase focused on documenting, eliciting, and maintaining the stakeholders' requirements. Risk identification and analysis are preemptive actions designed to anticipate and prepare for potential issues. Usually, this classification of risks is done manually, a practice that the personal judgment of the risk analyst or the project manager might influence. Machine learning (ML) techniques were proposed to predict the risk level in software requirements. The techniques used were logistic regression (LR), multilayer perceptron (MLP) neural network, support vector machine (SVM), decision tree (DT), naive bayes, and random forest (RF). Each model was trained and tested using cross-validation with k-folds, each with its respective parameters, to provide optimal results. Finally, they were compared based on precision, accuracy, and recall metrics. Statistical tests were performed to determine if there were significant differences between the different ML techniques used to classify risks. The results concluded that the DT and RF are the techniques that best predict the risk level in software requirements.
... An NB classifier was used in risk prediction by Akumba et al. [16] during the SDLC's requirement elicitation stage. Based on the characteristics of the risk dataset, the NB model emphasized the significance of probability and priority in forecasting risk levels. ...
Article
Full-text available
The increasing complexity of software projects makes it difficult to predict risks in software requirements, which is a crucial and essential part of the Software Development Life Cycle (SDLC). The failure of a software project may occur from an inability to appropriately anticipate such risks. Because it is the first stage of any software project, risk prediction has a greater significance in software requirements. Thus, ForExPlusPlus (FEPP), a novel model for risk prediction in software requirements, is proposed in this work. Standard models such as K-nearest Neighbor (KNN), Naïve Bayes (NB), Logistic Model Tree (LMT), Random Forest (RF), and Support Vector Machine (SVM) are used to benchmark the suggested model. The dataset from the Zenodo repository is used to train these models, and standard assessment criteria are used to evaluate the results. The accuracy analysis of the models is assessed critically using the precision, F-measure (FM), and Mathew’s correlation coefficient (MCC), as well as the error rate using the Kappa Statistic (KS) and Mean Absolute Error (MAE). The suggested FEPP performs better overall, with an accuracy of 96.84%, whereas KNN performs the worst, with an accuracy of 50.99%.
... If both the risk likelihood and the priority are known in advance, the failure of the entire project can be prevented. If they are ignored, however, projects are unlikely to be completed on time and will be less useful [45]. ...
Article
Software engineering and data science require strong programming skills. Software engineering focuses more on construction, functionality, and features, while software risk forecasting focuses more on data collection and analysis. A high level of system functionality is one of the basic needs of software development projects. One of the main characteristics that directly affects the effectiveness of software systems is the prediction of risks. By identifying software system risks through early recognition of expected failures, organizations can make decisions about potential solutions and improvements. Inaccurate risk assessments may result in poor system performance and thus undermine the system's reliability. This research focuses on reviewing mechanisms for predicting early failure in software project risk assessment. Various machine learning (ML) techniques are used. The aim of the study is to review experience-based risk assessment models that use historical failure data from several past software projects as training data to accurately assess the risks of software initiatives. This study covers software project risk prediction models that are generally applicable to all software projects throughout the software development process, helping advance the evolution of software systems.
... According to a survey by the Standish Group of Companies, 16% of software projects meet schedule deadlines and cost estimates, 52.7% are delivered to customers with less functionality than expected and 31.1% are delivered damaged or uncompleted due to the inherent risks (Akumba et al., 2020). Despite the use of project management practices, most of the projects remain unfulfilled in terms of achieving targeted performance due to ineffective management of risks (Maqsoom et al., 2020). ...
Article
Purpose: Taking a co-creation perspective and integrating knowledge-based and resource-based perspectives, the authors examine the role of customer participation in organizational performance and project success. The authors also investigate the mediating role of knowledge integration and the moderating role of requirement risk for these relationships in uncertain contexts. Design/methodology/approach: The authors undertook two studies. In the first study, carried out in 2018, the authors drew on survey data from 150 information technology (IT) sector employees and examined the mediating role of knowledge integration in the relationship of customer participation with organizational performance and project success. In the second study, undertaken in 2020, the authors drew on data from 92 IT and telecom sector employees and examined the moderating role of requirement risk in the relationship between customer participation and knowledge integration. Study 2 was conducted during the COVID-19 pandemic, when employees were largely working from home and were more sensitive to risks and uncertainty about the scope and system requirements. Both studies were survey-based, and analysis was carried out using structural equation modeling. Findings: The authors' two-study examination indicated that knowledge integration positively mediates the relationship of customer participation with organizational performance and project success during the co-creation process. Furthermore, the authors demonstrate that when requirement risks are high, the relationship between customer participation and knowledge integration is weaker. Originality/value: The authors show that integrating customer knowledge is critical to project success and organizational performance. By identifying risk uncertainties and environmental contingencies, the authors highlight the constraints of customer participation for knowledge integration, organizational performance and project success.
The authors provide some key study findings based on survey data obtained from project teams during two periods (normal and pandemic).
Article
The primary purpose of a software risk assessment is to predict risks and vulnerabilities that may exist in each phase of the software development life cycle (SDLC). Risk factors have a significant impact on the timeline, budget, and quality of software development. Risks must be known and understood before they can be managed effectively. Researchers have developed several risk management tools that help reduce the number of failed software projects and increase the number of successful ones. This study aims to ascertain which risks are important and how often they happen, and to explore and reveal the situations where the risks could lead to software failure in the design phase. We develop a model that can predict risks during the design process so that we can find the risk factors that lead to risks in software development. These risks have been analyzed, classified, and incorporated into Risk Prediction Trees (RPTs). Bayesian network (BN) techniques have been used to propose a model for estimating the probability of risk during the software design phase. The Bayesian network approach is used because the data can be obtained from software that has already been used, it has the flexibility to predict risk in real time, and it has the best risk prediction rates when it comes to potential risk factors. The outcome of this study shows that, compared to other standard machine-learning approaches, BN can be used to predict possible risks in the early software design phase.
Article
The growing complexity of modern software systems makes performance prediction a challenging activity. Traditional performance prediction techniques have many drawbacks, such as being time-consuming and unable to cover the whole software system at large scale. To contribute to solving these problems, we adopt a model-based approach for resource utilization and performance risk prediction. Firstly, we model the software system as annotated UML diagrams. Secondly, a performance model is derived from the UML diagrams in order to be evaluated. Thirdly, we generate a performance and resource utilization training dataset by varying the workload. Finally, when new instances are applied, we can predict resource utilization and performance risk by using machine learning techniques. The approach will be used to enhance the work of human experts and improve the efficiency of software system performance prediction. In this paper, we illustrate the approach on a case study. A performance training dataset has been generated, and three machine learning techniques are applied to predict resource utilization and performance risk level. Our approach shows prediction accuracy between 68.9% and 93.1%.
Article
This research aimed to predict the risks in software development projects by applying multiple logistic regression. Logistic regression was used as a tool to control the software development process. The work consisted of risk stratification and causal risk factor analyses. This statistical integration was intended to establish the risk factors and to anticipate and minimize the risks that can occur during software development processes. Factor analysis combined with logistic regression was used to predict the risk classification probability of failure or success of software development. The logistic regression analyses can grade and help to point out the risk factors that are important problems in development processes. These analytical results can support the creation and development of strategies and highlight problems that are important to manage, control, and reduce the risk of error. Classification of the software development risk analysis questionnaires using the SPSS program achieved an overall prediction accuracy of 90%.
Conference Paper
Software project development is a risky process with a high failure rate. This paper proposes an intelligent model that can predict and control software development risks from an overall project perspective rather than focusing only on a single factor, project output. In this study, we first constructed a formal model for risk identification, and then collected actual cases from software development companies to build a risk prediction model. To evaluate the performance of our model, two machine learning algorithms, Artificial Neural Networks (ANN) and Support Vector Machine (SVM), are compared. The experiments show that our risk prediction model based on SVM achieves better prediction performance.