A Predictive Risk Model for Software Projects’ Requirement Gathering Phase

June 2020
5(6):231-236

DOI:10.38124/IJISRT20JUN066

Authors:

Beatrice Akumba

Benue State University, Makurdi

Samera Otor

Benue State University, Makurdi

Iorshase Agaji

Federal University of Agriculture Makurdi, Nigeria

Barnabas Akumba

HISP Nigeria

Volume 5, Issue 6, June – 2020 International Journal of Innovative Science and Research Technology

ISSN No:-2456-2165

IJISRT20JUN066 www.ijisrt.com 231

A Predictive Risk Model for Software Projects’ Requirement Gathering Phase

Beatrice O. Akumba1, Samera U. Otor1, Iorshase Agaji2 and Barnabas T. Akumba3

1Department of Mathematics/Computer Science, Benue State University Makurdi, Nigeria

2Department of Mathematics/Statistics/Computer Science, Federal University of Agriculture Makurdi, Nigeria

3Management Sciences for Health (MSH), Abuja.

Abstract:- The initial stage of the software development lifecycle is the requirement gathering and analysis phase. Predicting risk at this phase is crucial because cost and effort can be saved while the quality and efficiency of the software to be developed are improved. In this paper, a dataset for software requirements risk prediction is adopted to predict the risk levels across software projects and to ascertain the attributes that contribute to the recognised risks. A supervised machine learning technique, the Naïve Bayes classifier, was used to predict the risk across the projects. The model was able to predict the risks across the projects, and the performance metrics of the risk attributes were evaluated. The model predicted four (4) instances as Catastrophic, eleven (11) as High, eighteen (18) as Moderate, thirty-three (33) as Low and seven (7) as Insignificant. The overall confusion matrix statistics for the risk level predictions gave an accuracy of 98%, reported with a 95% confidence interval (CI), and a Kappa of 97%. With respect to the risk levels across the projects, probability and priority were the most significant variables in predicting risk levels. Software project managers should put in place measures to reduce the factors (probability and priority) that increase the chances of occurrence of the identified risks.

Keywords:- Naïve Bayes Classifier, SDLC, Risk Prediction, Software Projects, Risk Outcomes, Risk Levels.

I. INTRODUCTION

Most software development projects are characterised by high failure rates. These failures are attributed to uncertain events that occur during the Software Development Life Cycle (SDLC) and that have led to the potential loss of software in most organisations. These uncertain events are referred to as risks, and they emanate from diverse risk factors embedded in the heterogeneous activities of the software development lifecycle. These risks need to be identified in time; otherwise they become the cause of software project failure (Salih and Ammar, 2017).

According to several surveys carried out by the Standish Group, 16% of software projects are completed on time and on budget, 52.7% are delivered with less functionality or performance, and 31.1% are scrapped before completion (Standish Group, 2019). These outcomes are attributed to the inherent risks found in most software projects. Thus, there is a need for these risk factors to be identified and mitigated through risk prediction before they become a threat to the software projects.

The requirement gathering and analysis phase is the first stage of the software development lifecycle. Cost and effort can be saved if the risk at this stage is predicted, and the occurrence of software project failures can be reduced as well.

Many methods have been developed for predicting risk in the SDLC, ranging from models (Hu et al., 2009) to several machine learning methods (Hu et al., 2015), among others. Hu et al. (2009) used an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) to predict and manage software development risks across an entire project. They formulated a model for identifying risk and developed the risk prediction model from data gathered through questionnaires administered to software development companies. Ensemble learning techniques have also been used to predict risk in software projects, using classifier ensembles of decision trees (DT) based on bagging and SVM. The data used were likewise gathered through questionnaires.

However, because most researchers obtained the data for software project risk prediction through questionnaires, Shaukat et al. (2018) developed a dataset that accommodates most software requirements and the attributes needed to predict risks in software projects. They noted that few techniques existed for predicting risks at the requirement gathering phase of the SDLC and that, as a result, there were no datasets containing risk attributes for software risk prediction. They therefore developed a dataset based on the Software Requirements Specification (SRS) of new projects and used classification techniques to refine the data.

To this end, this paper adopts the dataset of Shaukat et al. (2018) to predict software risks in the requirement gathering phase of the SDLC. A model was created and trained using machine learning techniques to predict the risks across the projects

and the performance metrics of the risk attributes were

evaluated with respect to the risk levels across the projects.

The model was formulated and trained using the Naïve

Bayes classifier and the predictions across the projects

showed that the probability and priority variables were

highly significant in the prediction of risks in software

projects.

The rest of this paper is organised as follows: Section II reviews related literature in the subject area, Section III describes the materials and methods used for the study, Section IV presents the results and discussion, and Section V gives the conclusion and further work.

II. LITERATURE REVIEW

Hu et al. (2009) observed that the software project development process carries some level of risk and is accompanied by high failure rates. To mitigate these risks, they proposed an intelligent model to predict and manage the risks inherent in software projects. The model was built from real-life instances gathered from software development companies, using the machine learning algorithms Artificial Neural Networks (ANN) and Support Vector Machine (SVM). Because no existing dataset was available for the prediction, the data were collected through questionnaires, and the model was able to predict risk in all the projects.

Kawamura et al. (2017) noted that over 70% of software development projects end in failure due to risk. To improve project success rates, they proposed software risk prediction for Information Technology (IT) vendors and organisations, collecting data on 332 projects from different IT vendors through an internet survey. They developed a success/failure prediction algorithm by applying the Naïve Bayes classifier to the collected data. They concluded that predicting the success or failure of software projects helped organisations identify, in order of priority, which projects were risk prone. However, the study did not show the success/failure rate at each phase of the SDLC.

Salih and Ammar (2017) emphasised that the growing complexity of software systems has made software performance prediction a difficult task. They addressed the problem with a model-based approach to resource utilisation and software performance risk prediction, employing machine learning techniques to predict performance risk and resource utilisation.

Christiansen et al. (2015) used multiple logistic regression to predict software development risk, based on questionnaire data gathered from experts and covering risk stratification and risk factors. They employed statistical integration to illustrate the risk factors, which were anticipated and managed by minimising the risks that occurred during the software development process. A combination of factor analysis and logistic regression was used to predict the probability of failure or success of software projects. The study concluded that risk factors inherent in software development processes must be known and addressed for software projects to be completed and delivered on time. Their data were gathered through questionnaires administered to experts because no dataset templates were available for the phases of the software development life cycle.

Shaukat et al. (2018), however, recognised the requirement gathering phase of the SDLC as the most important and demanding of all the phases. They observed that there was no explicit dataset from real-life software projects that contained all the attributes of software requirements and their risks and that could be used to predict risks in new software projects. They proposed a dataset for the requirement gathering phase of the SDLC that contains requirements drawn from the Software Requirements Specification (SRS) of some software projects, together with their risk attributes and the correlation between the requirements and the risks. The dataset was subjected to three preprocessing filters, Normalisation, Standardisation and Discretisation, using the unsupervised preprocessing filters in WEKA to obtain better accuracy. The resulting dataset is a clean and accurate template that researchers can use as a data source for risk decision support systems and for predicting risk at the requirement gathering phase of the SDLC.

This paper adopts the dataset developed by Shaukat et al. (2018) to predict the risk levels and to identify the attributes that contribute to the recognised risks in software projects. The significant difference between this paper and the work of Shaukat et al. (2018) is the prediction of risk levels across the projects in the requirement gathering phase, which was not done in their study.

III. MATERIALS AND METHODS

Machine learning techniques are powerful tools for software risk prediction. This paper employs supervised machine learning using the Naïve Bayes classifier. Supervised learning builds a model that makes predictions for previously unseen input instances. A supervised learning algorithm takes a known set of input data together with the known responses (outputs) and learns a classification or regression model, which is then used to predict the response for new data or the test dataset. Supervised learning uses classification and regression techniques to develop predictive models. A classification task predicts discrete responses and applies only to data that can be categorised, tagged or separated into specific groups or classes.

Naïve Bayes classifiers are statistical classifiers that predict the probability that a given record in the dataset belongs to a particular class, that is, its class membership probability. Naïve Bayes classification is based on Bayes' theorem and assumes that the independent variables are conditionally independent of one another given the class. In the form used here, all the variables are categorical. The method is purely probabilistic, and each independent variable contributes to the decision. It produces models based on the combined probabilities of the dependent variable being associated with the different categories of the independent variables. The Naïve Bayes classifier used five categories to represent the risk outcome levels: Catastrophic, High, Moderate, Low and Insignificant.
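For reference, the decision rule behind this description can be written out. The notation below is introduced here for illustration only: $c$ ranges over the five risk levels and $x_1, \dots, x_n$ are the categorical attribute values of one requirement. The product form follows from the naïve assumption that the attributes are conditionally independent given the class.

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\,\prod_{i=1}^{n} P(x_i \mid c)}{P(x_1, \dots, x_n)},
\qquad
\hat{c} = \arg\max_{c}\; P(c)\,\prod_{i=1}^{n} P(x_i \mid c)
```

Because the denominator is the same for every class, only the numerator needs to be compared when choosing the predicted risk level $\hat{c}$.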

The R programming language was used for the implementation and testing of the risk outcome prediction model. R is a promising language for machine learning and data science because it provides excellent visualisation features, which are essential for exploring the data before passing it to any automated learning procedure and for assessing the results of the learning algorithm.

A. The Dataset Used for the Risk Prediction

The dataset used for this study was derived from the dataset on “Software Requirement Risk Prediction” available at https://doi.org/10.5281/zenodo.1209601 (Shaukat et al., 2018). The dataset provides 299 data instances for risk prediction at the initial phase of the software development lifecycle (the requirement gathering phase). It consists of twelve (12) attributes and one (1) target variable (outcome). The attributes probability, magnitude of risk, impact, priority and risk level were modified to take categorical values that the Naïve Bayes classifier can use, because the classification task predicts discrete responses and applies only to data that can be categorised, tagged or separated into specific groups or classes. The risk level attribute serves as the target variable.
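The paper does not state how the numeric attributes were recoded into categories. Purely as an illustration of what such a recoding step might look like, the Python sketch below maps a numeric score to one of five ordered levels. The cut points, the 0-100 scale and the function name `to_level` are hypothetical and are not taken from the dataset or from the authors' R code.

```python
from bisect import bisect_right

# Hypothetical labels and boundaries, chosen only for illustration;
# the paper does not say how the attributes were actually recoded.
LABELS = ["Insignificant", "Low", "Moderate", "High", "Catastrophic"]
CUT_POINTS = [20, 40, 60, 80]  # assumed boundaries on a 0-100 score


def to_level(score, cut_points=CUT_POINTS, labels=LABELS):
    """Map a numeric score to an ordered categorical level."""
    # bisect_right counts how many cut points are <= score,
    # which is exactly the index of the matching label.
    return labels[bisect_right(cut_points, score)]


levels = [to_level(s) for s in (5, 20, 55, 79, 95)]
print(levels)
```

A score that equals a cut point falls into the higher level under this convention; a different convention would use `bisect_left` instead.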

B. The Algorithm of the Risk Prediction Model

The algorithm of the Risk Prediction Model is described by the flowchart in Figure 1, which shows the step-by-step method used to obtain predictions on the adopted dataset.

Fig 1:- The Flowchart of the Risk Outcome Prediction Model using Naïve Bayes Classifier

[Flowchart steps: Start → Load required R packages from the R repository → Read dataset file into R → Set outcome according to risk levels → Partition dataset into training and testing → Create objects x and y to hold predictor and response variables → Invoke the Naïve Bayes function library (e1071) → Create Naïve Bayes model using the training dataset → Perform prediction → Output prediction result → Stop]
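The fit and predict steps of Figure 1 can be illustrated outside R. The following is a minimal Python sketch of a categorical Naïve Bayes classifier, not the authors' e1071 implementation. The class name `CategoricalNaiveBayes`, the Laplace smoothing constant `alpha` and the toy attribute values are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical predictors with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (an assumption, not from the paper)

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # value_counts[j][c][v] = number of class-c rows whose feature j equals v
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        self.levels = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.value_counts[j][c][v] += 1
                self.levels[j].add(v)
        return self

    def log_posterior(self, row):
        """Unnormalised log posterior of each class for one row."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)  # log prior
            for j, v in enumerate(row):
                num = self.value_counts[j][c][v] + self.alpha
                den = self.class_counts[c] + self.alpha * len(self.levels[j])
                score += math.log(num / den)  # log P(x_j | c)
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_posterior(row)
        return max(scores, key=scores.get)


# Toy rows of (probability, priority); the values are invented.
train_x = [("High", "High"), ("High", "Medium"), ("Low", "Low"),
           ("Low", "Medium"), ("Medium", "Medium"), ("Low", "Low")]
train_y = ["High", "High", "Low", "Low", "Moderate", "Low"]

model = CategoricalNaiveBayes().fit(train_x, train_y)
prediction = model.predict(("High", "High"))
print(prediction)
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied, and the smoothing keeps an attribute value that never co-occurred with a class from forcing that class's probability to zero.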

IV. RESULTS AND DISCUSSION

The results of this paper were produced in R, and screenshots of the relevant outputs are presented for discussion. The discussion covers the dataset partition, the risk outcome prediction model, the model's performance and analysis, and the overall variable performance plot, each illustrated with figures.

A. The Dataset Partition (Training and Testing)

The dataset used for the risk prediction model was split into training and test instances. 75% of the dataset was used for training the Risk Outcome Prediction Model, while 25% was used for testing and validating the model. Figure 2 shows the split dataset and the corresponding properties in tabular form, produced using prop.table in R.
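As a rough illustration of a 75/25 partition, the Python sketch below performs a stratified split that rounds up within each class. This is one possible scheme, not necessarily the one the authors used in R. The class sizes are invented: they were chosen only so that they sum to 299 and so that the held-out counts equal the 4, 11, 18, 33 and 7 test instances reported later in the text. The function name `stratified_split` and the seed are likewise assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.75, seed=42):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        k = math.ceil(train_fraction * len(idx))  # round up within each class
        train_idx.extend(idx[:k])
        test_idx.extend(idx[k:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical label vector of 299 instances; the class sizes are invented.
labels = (["Catastrophic"] * 18 + ["High"] * 46 + ["Moderate"] * 74
          + ["Low"] * 133 + ["Insignificant"] * 28)

train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx))
```

Note that 75% of 299 is 224.25, whereas the model reported below was trained on 226 samples, leaving 73 for testing. Rounding up inside each class, as in this sketch, is one way a split can end up slightly above the nominal 75%.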

Fig 2:- The Partitions of the Train and Test Dataset

Instances and their Dimensions

B. The Risk Outcome Prediction (RISOP) Model using

Naïve Bayes Classifier

A Risk Outcome Prediction Model based on the Naïve Bayes classifier was created for the prediction and analysis of risk in the software projects, as shown in Figure 3. The model was trained on 226 samples, corresponding to the 75% of the dataset partitioned for training, and has 12 predictors corresponding to the attributes. Ten-fold (10-fold) cross-validation was applied, giving an accuracy of 98% and a Kappa of 97%. The model was then used to predict the risk levels of the new (test) data, indicating whether each was Insignificant, Low, Moderate, High or Catastrophic across the software projects. The model's evaluation and its predictions on the testing set are shown in Figures 3 and 4.
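Ten-fold cross-validation divides the training samples into ten nearly equal folds, then trains on nine folds and validates on the remaining one, rotating through all ten. The Python sketch below only generates the index sets for such a scheme. The function name `k_fold_indices` and the seed are assumptions, and the sketch does not reproduce the resampling performed by the authors' R code.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k folds whose sizes differ by at most 1
    for i in range(k):
        val = folds[i]
        # Training indices are all folds except the i-th one.
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, val


# 226 is the number of training samples reported in the text.
fold_sizes = [len(val) for _, val in k_fold_indices(226)]
print(fold_sizes)
```

The cross-validated accuracy is then the average of the ten per-fold accuracies, so it is estimated from the training data alone and can differ from the accuracy measured on the held-out test set.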

Fig 3:- The Risk Outcome Prediction (RISOP) Model

Fig 4:- Model Evaluation and Prediction based on the

Testing Instance

C. The Performance and Analysis of the RISOP Model

The developed RISOP Model was analysed in R. The confusion matrix for the predicted risk outcome levels was derived from the model, as shown in Figure 5. It indicates that four (4) instances were correctly classified as Catastrophic, eleven (11) as High, eighteen (18) as Moderate, thirty-three (33) as Low and seven (7) as Insignificant. The overall confusion matrix statistics for the risk level predictions gave an accuracy of 100%, reported with a 95% confidence interval (CI), and a Kappa of 100%. The per-class statistics for the risk levels were also derived; sensitivity and specificity both stood at 100%, as can be seen in Figure 6.
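The reported statistics can be recomputed from the counts given in the text. The Python sketch below rebuilds a confusion matrix on the assumption that 100% accuracy means every off-diagonal cell is zero, then derives accuracy, Cohen's Kappa, per-class sensitivity and specificity, and the lower limit of an exact (Clopper-Pearson) 95% confidence interval. The function names are illustrative, and the choice of an exact binomial interval is an assumption about how the interval was obtained.

```python
LEVELS = ["Catastrophic", "High", "Moderate", "Low", "Insignificant"]
CORRECT = [4, 11, 18, 33, 7]  # correctly classified counts reported in the text

# Rows are predicted levels, columns are actual levels. Off-diagonal cells are
# zero on the assumption that 100% accuracy means no misclassifications.
matrix = [[CORRECT[i] if i == j else 0 for j in range(5)] for i in range(5)]


def accuracy(m):
    total = sum(map(sum, m))
    return sum(m[i][i] for i in range(len(m))) / total


def cohen_kappa(m):
    total = sum(map(sum, m))
    observed = accuracy(m)
    row = [sum(r) for r in m]
    col = [sum(c) for c in zip(*m)]
    expected = sum(r * c for r, c in zip(row, col)) / total ** 2  # chance agreement
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(m, k):
    total = sum(map(sum, m))
    tp = m[k][k]
    fp = sum(m[k]) - tp             # predicted k, actually another level
    fn = sum(r[k] for r in m) - tp  # actually k, predicted another level
    tn = total - tp - fp - fn
    return tp / (tp + fn), tn / (tn + fp)


def exact_ci_lower_all_correct(n, level=0.95):
    """Clopper-Pearson lower limit when all n predictions are correct."""
    return ((1 - level) / 2) ** (1 / n)


n = sum(CORRECT)
acc = accuracy(matrix)
kappa = cohen_kappa(matrix)
per_class = [sensitivity_specificity(matrix, k) for k in range(5)]
ci_lower = exact_ci_lower_all_correct(n)
print(n, acc, kappa, round(ci_lower, 4))
```

Under these assumptions the 73 test instances give an accuracy and Kappa of 1.0 and a sensitivity and specificity of 1.0 for every level. The 95% figure is the confidence level rather than the interval itself: with 73 of 73 correct, the exact interval for the accuracy runs from roughly 0.95 to 1.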

Fig 5:- The Confusion Matrix of the Risk Prediction Model

Fig 6:- The Statistics by Class for the Risk Levels

D. The Overall Variable Performance Plot of the Model

A plot of the performance matrix of the variables shows that probability and priority are the most significant variables in predicting the risk levels across the projects in the dataset used. These variables had high values across the projects, indicating how much they contribute to the inherent risk in software projects. The performance matrix plot enables software project managers to put in place measures to reduce the identified variables in their software projects. Figure 7 shows the plot of the performance matrix of the variables.

Fig 7:- Plot of the Performance Matrix of the Variables

across the Project

V. CONCLUSION

The aim of this paper was to use the Naïve Bayes classifier to predict risks that occur during the requirement gathering phase of the software development lifecycle (SDLC) in some software projects. The requirement gathering stage is a vital stage in the SDLC. Identifying and managing risks early goes a long way towards improving software quality, saving cost and delivering projects on time. To this end, a dataset for software requirement risk prediction, obtained from the requirement gathering phase of five major projects, was used to build and train a Naïve Bayes model to predict risk levels from attributes such as magnitude, impact, probability, priority and risk dimension, and to determine whether they were Catastrophic, High, Moderate, Low or Insignificant. The confusion matrix derived from the model showed four (4) instances correctly predicted as Catastrophic, eleven (11) as High, eighteen (18) as Moderate, thirty-three (33) as Low and seven (7) as Insignificant. A plot of the performance matrix of the variables of the Risk Outcome Prediction (RISOP) Model showed which variables were most prominent across the software projects: probability and priority were the most significant variables in predicting risk levels. If a high probability of a risk occurring and its priority are known in advance, the entire project can be saved from failure. If they are neglected, the consequences will be disastrous, as the projects will miss their timelines and be delivered with reduced functionality. Therefore, software project managers should put in place measures to reduce the probability and priority of risk occurrences in their software projects so that the systems are delivered with full functionality.

Research is an ongoing process, and we therefore advise that further work be done to ascertain the minimum values that each variable must hold before it can be said to have contributed to the risk levels across the projects. The model was able to predict risk levels across projects in the requirement gathering phase of the software development life cycle; predictions can also be done for other phases, such as the design, coding, testing and maintenance phases, to ascertain the attributes that lead to risk in most software projects. In addition, this model for the requirement gathering phase should be benchmarked against datasets from the other phases of the software development lifecycle to ascertain the phase with the highest risk levels in software projects.
REFERENCES

[1]. Christiansen, T., Wuttidittachotti, P., Somchai, P. and Vallipakorn, S.A. (2015). Prediction of Risk Factors of Software Development Project by using Multiple Logistic Regression. ARPN Journal of Engineering and Applied Sciences. 10(3), 1324-1331.
[2]. Hu, Y., Feng, B., Mob, X., Zhang, X. Z., Ngai, E. W. T., Fan, M. and Liu, M. (2015). Elsevier Journals on Decision Support Systems. 72, 11-23. [online] Retrieved on 9th August, 2019 from https://www.sciencedirect.com/science/article/pii/S0167923615000238
[3]. Hu, Y., Zhang, X., Sun, X., Liu, M. and Du, J. (2009). An Intelligent Model for Software Project Risk Prediction. International Conference on Information Management, Innovation Management and Industrial Engineering (p. 629). [online] Retrieved on the 5th of May, 2019 from https://ieeexplore.ieee.org/abstract/document/5368175
[4]. Kawamura, T., Toma, T. and Takano, K. (2017). Outcome Prediction of Software Projects for Information Technology Vendors. Conference Proceedings of the 2017 IEEE IEEM. [online] Retrieved on the 9th of May, 2019 from https://ieeexplore.ieee.org/document/8290188
[5]. Salih, H. A. M. and Ammar, H.H. (2017). Model-Based Resource Utilisation and Performance Risk Prediction Using Machine Learning Techniques. International Journal on Informatics Visualisation (JOIV). 1(3), pp. 101-109.
[6]. Shaukat, Z., Naseem, R. and Zubar, M. (2018). A Dataset for Software Requirement Risk Prediction. In IEEE International Conference on Computational Science and Engineering (pp. 112-118). [online] Retrieved on the 9th of May, 2019 from https://zenodo.org/record/1209601#.Xs5psmhKjIU
[7]. The Standish Group, “Extreme chaos”. www.standishgroup.com, 2019. [online] Retrieved on the 10th of March, 2019 from https://www.projectsmart.co.uk/white-papers/chaos-report.pdf

Software Risk Prediction at Requirement and Design Phase : An Ensemble Machine Learning Approach

Conference Paper

Full-text available

Oct 2023

Software Requirement Risk Prediction Using Enhanced Fuzzy Induction Models

Article

Full-text available

Sep 2023

The development of most modern software systems is accompanied by a significant level of uncertainty, which can be attributed to the unanticipated activities that may occur throughout the software development process. As these modern software systems become more complex and drawn out, escalating software project failure rates have become a critical concern. These unforeseeable uncertainties are known as software risks, and they emerge from many risk factors inherent to the numerous activities comprising the software development lifecycle (SDLC). Consequently, these software risks have resulted in massive revenue losses for software organizations. Hence, it is imperative to address these software risks, to curb future software system failures. The subjective risk assessment (SRM) method is regarded as a viable solution to software risk problems. However, it is inherently reliant on humans and, therefore, in certain situations, imprecise, due to its dependence on an expert’s knowledge and experience. In addition, the SRM does not allow repeatability, as expertise is not easily exchanged across the different units working on a software project. Developing intelligent modelling methods that may offer more unbiased, reproducible, and explainable decision-making assistance in risk management is crucial. Hence, this research proposes enhanced fuzzy induction models for software requirement risk prediction. Specifically, the fuzzy unordered rule induction algorithm (FURIA), and its enhanced variants based on nested subset selection dichotomies, are developed for software requirement risk prediction. The suggested fuzzy induction models are based on the use of effective rule-stretching methods for the prediction process. Additionally, the proposed FURIA method is enhanced through the introduction of nested subset selection dichotomy concepts into its prediction process. 
The prediction performances of the proposed models are evaluated using a benchmark dataset, and are then compared with existing machine learning (ML)-based and rule-based software risk prediction models. From the experimental results, it was observed that the FURIA performed comparably, in most cases, to the rule-based and ML-based models. However, the FURIA nested dichotomy variants were superior in performance to the conventional FURIA method, and rule-based and ML-based methods, with the least accuracy, area under the curve (AUC), and Mathew’s correlation coefficient (MCC), with values of approximately 98%.

Comparing machine learning techniques for software requirements risk prediction

Article

Full-text available

Feb 2024

Yasiel Pérez Vera

Software requirements are the most critical phase focused on documenting, eliciting, and maintaining the stakeholders' requirements. Risk identification and analysis are preemptive actions designed to anticipate and prepare for potential issues. Usually, this classification of risks is done manually, a practice that the personal judgment of the risk analyst or the project manager might influence. Machine learning (ML) techniques were proposed to predict the risk level in software requirements. The techniques used were logistic regression (LR), multilayer perceptron (MLP) neural network, support vector machine (SVM), decision tree (DT), naive bayes, and random forest (RF). Each model was trained and tested using cross-validation with k-folds, each with its respective parameters, to provide optimal results. Finally, they were compared based on precision, accuracy, and recall metrics. Statistical tests were performed to determine if there were significant differences between the different ML techniques used to classify risks. The results concluded that the DT and RF are the techniques that best predict the risk level in software requirements.

FEPP: Advancing Software Risk Prediction in Requirements Engineering Through Innovative Rule Extraction and Multi-Class Integration

Article

Full-text available

Jan 2024

The increasing complexity of software projects makes it difficult to predict risks in software requirements, which is a crucial and essential part of the Software Development Life Cycle (SDLC). The failure of a software project may occur from an inability to appropriately anticipate such risks. Because it is the first stage of any software project, risk prediction has a greater significance in software requirements. Thus, ForExPlusPlus (FEPP), a novel model for risk prediction in software requirements, is proposed in this work. Standard models such as K-nearest Neighbor (KNN), Naïve Bayes (NB), Logistic Model Tree (LMT), Random Forest (RF), and Support Vector Machine (SVM) are used to benchmark the suggested model. The dataset from the Zenodo repository is used to train these models, and standard assessment criteria are used to evaluate the results. The accuracy analysis of the models is assessed critically using the precision, F-measure (FM), and Mathew’s correlation coefficient (MCC), as well as the error rate using the Kappa Statistic (KS) and Mean Absolute Error (MAE). The suggested FEPP performs better overall, with an accuracy of 96.84%, whereas KNN performs the worst, with an accuracy of 50.99%.

Software Risk Prediction Through the Use of Machine Learning: Review

Article

Full-text available

Feb 2023

Software engineering and data science require strong programming skills. Software engineering focuses more on construction, functionality, and features, while software risk forecasting focuses more on data collection and analysis. A high level of system functionality is one of the basic needs of software development projects. One of the main characteristics that directly affects the effectiveness of software systems is the prediction of risks. Organizations can make decisions about potential solutions and improvements by using the ability to identify software systems risks through early recognition of expected failures. Inaccurate risk assessments may result in poor system performance and thus reveal its reliability. This research focuses on reviewing mechanisms for predicting early failure in software project risk assessment. Various ML machine learning techniques are used. The aim of the study is to review experience-based risk assessment models that use historical failure data from several past program projects as training data to accurately assess the risks of program initiatives. This study covers software project risk prediction models that are generally applied to all software projects throughout the software development process, helping advance the evolution of software systems.

Co-creating Organizational Performance and Project Success through Customer Participation, Requirement Risk and Knowledge Integration: A Multi-Study Evidence

Article

Jun 2023
Benchmark Int J

Purpose: Taking a co-creation perspective and integrating knowledge-based and resource-based perspectives, the authors examine the role of customer participation in organizational performance and project success. The authors also investigate the mediating role of knowledge integration and the moderating role of requirement risk for these relationships in uncertain contexts.
Design/methodology/approach: The authors undertook two studies. The first study, carried out in 2018, drew on survey data from 150 information technology (IT) sector employees and examined the mediating role of knowledge integration in the relationship of customer participation with organizational performance and project success. The second study, undertaken in 2020, drew on data from 92 IT and telecom sector employees and examined the moderating role of requirement risk in the relationship between customer participation and knowledge integration. Study 2 was conducted during the COVID-19 pandemic, when employees were largely working from home and were more sensitive to risks and uncertainty about the scope and system requirements. Both studies were survey-based, and analysis was carried out using structural equation modeling.
Findings: The two-study examination indicated that knowledge integration positively mediates the relationship of customer participation with organizational performance and project success during the co-creation process. Furthermore, the authors demonstrate that when requirement risks are high, the relationship between customer participation and knowledge integration is weaker.
Originality/value: The authors show that integrating customer knowledge is critical to project success and organizational performance. By identifying risk uncertainties and environmental contingencies, the authors highlight the constraints of customer participation for knowledge integration, organizational performance and project success. The authors provide some key study findings based on survey data obtained from project teams during two periods (normal and pandemic).

An efficient Bayesian network model (BNM) for software risk prediction in design phase development

Article

Apr 2023

The primary purpose of a software risk assessment is to predict risks and vulnerabilities that may exist in each phase of the software development life cycle (SDLC). Risk factors have a significant impact on the timeline, budget, and quality of software development, so risks must be known and understood before they can be managed effectively. Researchers have developed several risk management tools that help reduce the number of failed software projects and increase the number of successful ones. This study aims to ascertain which risks are important and how often they occur, and to explore the situations in which risks could lead to software failure in the design phase. The authors develop a model that predicts risks during the design process in order to find the risk factors that lead to risks in software development. These risks have been analyzed, classified, and incorporated into Risk Prediction Trees (RPTs). Bayesian network (BN) techniques have been used to propose a model for estimating the probability of risk during the software design phase. The Bayesian network approach is used because the data can be obtained from software that has already been used, because it has the flexibility to predict risk in real time, and because it has the best risk prediction rates for potential risk factors. The outcome of this study shows that, compared to other standard machine learning approaches, BN can be used to predict possible risks in the early software design phase.

Model-Based Resource Utilization and Performance Risk Prediction using Machine Learning Techniques

Article

Jul 2017

The growing complexity of modern software systems makes performance prediction a challenging activity. Traditional performance prediction techniques have many drawbacks: they are time-consuming and cannot cover the whole software system at large scale. To help solve these problems, we adopt a model-based approach for resource utilization and performance risk prediction. First, we model the software system as annotated UML diagrams. Second, a performance model is derived from the UML diagrams so that it can be evaluated. Third, we generate a performance and resource utilization training dataset by varying the workload. Finally, when new instances are presented, we predict resource utilization and performance risk using machine learning techniques. The approach is intended to enhance the work of human experts and improve the efficiency of software system performance prediction. In this paper, we illustrate the approach on a case study. A performance training dataset has been generated, and three machine learning techniques are applied to predict resource utilization and performance risk level. Our approach shows prediction accuracy ranging from 68.9% to 93.1%.

Prediction of risk factors of software development project by using multiple logistic regression

Article

Jan 2015

This research aimed to predict the risks in software development projects by applying multiple logistic regression. Logistic regression was used as a tool to control the software development process through risk stratification and causal risk factor analyses. This statistical integration was intended to establish the risk factors and to anticipate and minimize the risks that can occur during software development. Factor analysis combined with logistic regression was used to predict the probability of failure or success of software development. The logistic regression analyses can grade the risk factors and help point out those that are important problems in the development process. These analytical results can support the creation and development of strategies and highlight the problems that matter most for managing, controlling, and reducing the risk of error. The classification of software development risk questionnaires, analyzed with the SPSS program, had an overall prediction accuracy of 90%.

A Dataset for Software Requirements Risk Prediction

Conference Paper

Oct 2018

Outcome prediction of software projects for information technology vendors

Conference Paper

Dec 2017

An Intelligent Model for Software Project Risk Prediction

Conference Paper

Dec 2009

Software project development is a risky process with a high failure rate. This paper proposes an intelligent model that can predict and control software development risks from an overall project perspective rather than focusing only on a single factor, project output. In this study, we first constructed a formal model for risk identification and then collected actual cases from software development companies to build a risk prediction model. To evaluate the performance of our model, two machine learning algorithms, Artificial Neural Networks (ANN) and Support Vector Machine (SVM), are compared. The experiments show that our risk prediction model based on SVM achieves better prediction performance.
