Cost Prediction and Software Project
Management
Martin Shepperd
Brunel University, UK
Abstract This chapter reviews the background and extent of the software project
cost prediction problem. Given the importance of the topic there has been a great
deal of research activity over the past 40 years, most of which has focused on
developing formal cost prediction systems. The problem is that presently there is
limited evidence to suggest formal methods outperform experts, therefore detailed
consideration is given to the available empirical evidence concerning expert
performance. This shows that software professionals tend to be biased (optimistic)
and over-confident and there are a number of deep cognitive biases which help us
understand why this is so. Finally the chapter describes how this might best be
tackled through a range of simple, practical and evidence-based methods.
3.1 Introduction
Cost estimation¹ has been viewed as a challenging and important part of software project management for almost 60 years. Interestingly, Benington (1956) writes of his experiences developing what was, back in the mid-1950s, a large air defense system comprising half a million lines of code (LOC). In it he tabulates what he termed "reasonable production costs" and, although headings such as computer and paper costs might no longer be seen as relevant, others such as specification, coding and testing remain pertinent. The outcome was that "the schedule slipped by a year", something that remains distressingly familiar!
So how bad is the problem? Apart from the anecdotal, evidence is surprisingly elusive, probably due to the commercially sensitive nature of poor project cost estimation. Jørgensen and Moløkken-Østvold (2006) reviewed multiple sources of evidence and concluded that a typical cost estimation error was "in the range of about 30%". Another indicator that not all is well comes from the 2005 and 2007 surveys conducted by El Emam and Koru (2008) who, from a total of 388
1 There is something of a proliferation of terminology. Whilst the majority of writers refer to
cost modelling or prediction, strictly speaking the usual focus is upon labour or effort which
forms the dominant part of costs and is usually the hardest to predict. Such costs may or may not
be reflected in the price charged to the client or user. This chapter uses the term in this particular
sense. Likewise estimation and prediction are used interchangeably since we are only concerned
with future events.
responses found “the most critical performance problem in delivered software
projects is therefore estimating the schedule and managing to that estimate”. An
independent study by the European Services Strategy Unit of 105 large public ICT
projects (Whitfield 2007) found more than half to show cost overruns, with the average overrun being 30.5%, a figure very much in line with Jørgensen and Moløkken-Østvold (2006).
The question therefore arises as to why software project costs are so difficult to estimate. There are many reasons. First and foremost is complexity. Many projects are extremely large undertakings with multiple stakeholders in a setting characterised by uncertainty, inconsistency and change. Second, software development is best viewed as a design-type activity; it is emphatically not concerned with production. This means the sub-tasks and activities are not routine, so simple linear extrapolation is seldom a safe guide. Third, estimates are required at a very early stage when little is known and requirements are still to be discovered and arbitrated, let alone documented. Finally, there are many subtle, and not so subtle, social and political pressures upon those responsible for cost modelling. In his analysis of a wide range of projects, Flyvbjerg refers to this tendency to under-estimate costs and over-estimate benefits in order to secure funding for a proposed project as 'strategic misrepresentation' (Flyvbjerg 2008).
Clearly these problems with predicting software project costs have significant ramifications. First, we see a tendency for errors in one direction, that is bias, or a propensity for over-optimism. Second, poor cost prediction severely hampers meaningful cost-benefit analysis, leading to the commissioning, and subsequent unnecessary cancellation, of projects that should not have been started in the first place. Conversely, over-estimation might lead to missed opportunities or sub-optimal procurement decisions.
3.2 A Review of State of the Art Techniques
The first thing to consider is: what is an estimate? Although it can easily be forgotten, it must be stressed that an estimate is a probabilistic statement (DeMarco 1982; Kitchenham and Linkman 1997); consequently, simply reporting an estimate as a point value masks important information. As an example, if a project manager
makes a prediction that the Integration Testing will take 150 person-hours we do
not know with what confidence he or she makes this statement; it could be with
near certainty or it could be a wild guess with almost no certainty. Thus an estimate has two components. Jørgensen and Sjøberg (2003) recommend a simple approach based on an interval and a confidence level. Based on the Integration Testing example, the project manager (if highly confident) might state 140-160 person-hours at 90% confidence, or (if lacking confidence) 50-250 person-hours at 50% confidence. Note the trade-off between interval width and confidence level: confidence can be increased by widening the interval, or the interval narrowed by accepting a lower confidence level.
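One practical way of attaching such a confidence level, in the spirit of Jørgensen and Sjøberg's (2003) proposal to base intervals on the empirical distribution of previous estimation accuracy, is sketched below. The historical accuracy ratios and the point estimate are invented purely for illustration.

```python
# Sketch: derive a prediction interval from previous estimation accuracy.
# The ratios (actual effort / estimated effort) below are hypothetical.
import numpy as np

past_ratios = np.array([0.9, 1.1, 1.3, 0.8, 1.6, 1.2, 1.0, 1.4, 2.1, 1.1])
point_estimate = 150.0          # person-hours, e.g. for Integration Testing
confidence = 0.90               # desired confidence level

# Empirical quantiles of past accuracy give the interval multipliers.
tail = (1 - confidence) / 2
low_mult, high_mult = np.quantile(past_ratios, [tail, 1 - tail])

print(f"{confidence:.0%} interval: {point_estimate * low_mult:.0f}"
      f"-{point_estimate * high_mult:.0f} person-hours")
```

With only ten historical projects the tails of such an interval are, of course, very rough; the point is simply that the interval and its confidence level are derived from evidence rather than asserted.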
An alternative approach, sometimes used in industry, is derived from the critical path analysis technique PERT (Willis 1985) and is known as 3-point estimation. It is based on the idea that an estimate is actually a probability distribution, and a simple characterisation of this distribution is as a triangle based on the best case, the worst case and the most likely case or mode.
Fig. 1 Three point estimates as probability distributions.
Figure 1 shows an example of a 3-point estimate depicted as a triangular
probability distribution. The shaded area shows the region within which the true
or actual effort value will fall (assuming of course that the distribution is correctly
estimated). The estimation interval is the range between the worst case (i.e.
highest possible value) for effort and the best case (i.e. the lowest possible value)
for effort. In addition the distribution shows likelihood or the probability p on the
y-axis. This reveals that the highest or modal point on the distribution is the most likely, that is, it has the greatest chance of actually occurring. The distribution also reveals another interesting property: it is skewed or biased, since the region above or to the right of the most likely value is considerably greater than the region below the mode. The implication is that, even if the distribution were accurately estimated, using the most likely value as the actual estimate will lead to a tendency to under-estimate over time. This is a phenomenon that we observe in practice (as noted in Section 3.1 and also previously in Section 2.1).
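The effect of this skew can be made concrete with a short calculation. The sketch below uses invented best, most likely and worst values; for a triangular distribution the mean is simply the average of the three points, so whenever the upper tail is longer than the lower one the mean exceeds the mode and reporting only the most likely value will, on average, under-estimate.

```python
import numpy as np

best, most_likely, worst = 100.0, 150.0, 400.0   # hypothetical person-hours

# For a triangular distribution the expected value is the mean of the three points.
expected = (best + most_likely + worst) / 3
print(f"mode = {most_likely:.0f}, mean = {expected:.0f} person-hours")

# The same result by simulation, plus the chance that the actual exceeds the mode.
draws = np.random.default_rng(1).triangular(best, most_likely, worst, 100_000)
print(f"simulated mean = {draws.mean():.0f}, "
      f"P(actual > most likely) = {(draws > most_likely).mean():.2f}")
```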
Although thinking of an estimate as a distribution enables a far richer analysis,
empirically we are hindered by the fact that we are obliged to construct the
distribution from a single observation. The situation can be further complicated by
the fact that projects are seldom static and so one has to be clear whether the
estimates refer to a project as intended at its inception as compared with the actual
project as delivered which could conceivably have functionality added or
removed. These problems are further explored by Grimstad et al. (2006).
There is surprisingly little systematic analysis of what software practitioners
actually do. Studies such as (Heemstra 1992) and (Hughes 1996) have reported that expert judgment is the dominant method amongst software practitioners, and there is little to suggest matters have changed radically since the 1990s.
One source for identifying what is perceived as good practice is the Software Engineering Body of Knowledge (SweBOK) (Abran and Bourque 2004), which was the culmination of the work of a team of software development experts. Interestingly, the section on effort, schedule and cost estimation is relatively brief; however, a number of principles emerge.
1. Estimates can be derived top-down or by means of some breakdown of tasks.
2. For each such task the expected effort [cost] range can be derived from a cost
model which needs calibration to the local environment using historical data if
available. Otherwise an alternative is needed such as expert judgment.
3. The individual estimates should be summed across the entire project.
4. Estimates need to be revised iteratively until agreement is reached amongst all
stakeholders, which the SweBOK identifies as principally software engineers
and management.
From this list of steps, some key concepts emerge which are now explored in more detail.
The idea behind a top-down or a decomposition approach to cost estimation is
that of divide and conquer. In other words it is easier to estimate the cost of a
small task than a large one. Moreover it is easier to match a smaller task to some
repertoire of previously completed tasks, than it is for a large task where the
combinatorial explosion militates against this possibility. Often the idea is
formalised into work breakdown charts. The chief difficulty is the fact that some
activities do not easily fit into neat hierarchical breakdowns.
The next point of note is the SweBOK recommendation to consider
representing an estimate as a range. As previously discussed, if an estimate is to be viewed as a probabilistic statement then a point value is inadequate; minimally, we need to attach a confidence level to the range. Provision of a 3-point estimate provides an even richer picture.
SweBOK also recommends the use of formal models and, although no examples are specified, widely used models include COCOMO 81, which is based on a non-linear relationship between estimated LOC and effort, implying diseconomies of scale. This fundamental relationship is modified by the type of project and, initially, 15 cost drivers. The model was subsequently modified and extended as COCOMO II although, unfortunately and unlike COCOMO 81, the database from which it is derived is not in the public domain.
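To make the structure of such a model concrete, the sketch below follows the COCOMO 81 pattern: a non-linear nominal effort equation in KLOC, adjusted by a product of cost-driver multipliers. The mode coefficients are the nominal values usually quoted for the basic model; the cost-driver multipliers in the example call are invented, so this is an illustration of the model's shape rather than the published model definition.

```python
# Sketch of a COCOMO 81 style model: nominal effort from size, adjusted by
# cost drivers. The (a, b) pairs are the commonly quoted basic-mode values;
# the cost-driver multipliers passed in below are illustrative only.
MODES = {                      # E = a * KLOC**b, effort in person-months
    "organic":       (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded":      (3.6, 1.20),
}

def cocomo_style_effort(kloc: float, mode: str, cost_drivers=()) -> float:
    a, b = MODES[mode]
    eaf = 1.0                  # effort adjustment factor: product of the drivers
    for multiplier in cost_drivers:
        eaf *= multiplier
    return a * kloc ** b * eaf

# A hypothetical 50 KLOC organic-mode project with two illustrative drivers.
print(f"{cocomo_style_effort(50, 'organic', (1.15, 0.9)):.0f} person-months")
```

Note that because the exponent b exceeds 1 in every mode, doubling the size more than doubles the nominal effort, which is the diseconomy of scale referred to above.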
Although COCOMO is widely used and there are many free implementations, it has come in for criticism. First, accurate estimates of LOC may not be available at an early stage of a software project. Second, there is mixed empirical evidence as to whether software projects exhibit diseconomies (as many commentators assert), economies or simple linearity with respect to scale (Kitchenham 2002). Third, there is limited evidence that COCOMO performs well using off-the-shelf settings on data other than that with which it was developed; for example, Kocaguneli et al. (2012b) reported that the model was ranked 92nd out of 102
different combinations of models and pre-processors that were evaluated in a
major empirical study. Likewise Kemerer (1987) reported mean absolute relative
errors in excess of 600% for a different data set of 15 software projects.
Interestingly he found that COCOMO performed best (least badly?) in its simplest
form and additional sophistication of the model harmed its accuracy. This has led
many researchers, in line with the SweBOK, to recommend tailoring and
calibration to a local environment. Gulezian (1991) describes how multiple
regression analysis can be used to calibrate the weights for the various cost
drivers. The systematic review by Jørgensen (2004) identified individual primary
studies and the only ones that showed formal prediction systems to outperform
experts involved the use of calibration. More recently, Yang et al. (2013) describe
a calibration procedure to handle local bias thereby improving the usability of
cross-company data sets and demonstrate this with respect to COCOMO II. The
value of calibration was again highlighted by the analysis of Menzies et al. (2013).
Nevertheless, despite these reservations, COCOMO or a similar approach is often used as some form of sanity check.
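As an illustration of what local calibration involves, the sketch below fits the coefficients of a simple effort-size relationship to invented historical project data by least squares in log-log space. This captures the essence of regression-based calibration as described by Gulezian (1991) and others, although real procedures also recalibrate the individual cost-driver weights.

```python
# Sketch: calibrate E = a * S**b to local historical data (values are invented).
import numpy as np

size_kloc = np.array([12, 25, 40, 65, 90, 130])      # completed projects
effort_pm = np.array([30, 68, 110, 190, 270, 410])   # actual person-months

# Fit log(E) = log(a) + b*log(S) by ordinary least squares.
b, log_a = np.polyfit(np.log(size_kloc), np.log(effort_pm), 1)
a = np.exp(log_a)
print(f"locally calibrated model: E = {a:.2f} * S^{b:.2f}")

# Use the calibrated coefficients to predict a new 50 KLOC project.
print(f"predicted effort for 50 KLOC: {a * 50 ** b:.0f} person-months")
```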
Another important part of the SweBOK recommendations is the need to revisit any prediction. This has often been neglected by researchers, who tend to see a software project as a static snapshot, which of course does not reflect the realities of (i) a growing understanding of the requirements and challenges as the software project plays out, converging upon certainty on the day of delivery, and (ii) the changing environment in which the project is embedded. MacDonell and Shepperd (2003a), in a rare study of re-estimation in a commercial setting, found no support for the idea that there are 'standard proportions' of effort for particular development stages, e.g., specification and design. However, in most cases simple linear regression combined with the managers' estimates led to improvements in predictive accuracy. These results indicate that, in this organisation, prior-phase effort data is useful and revising estimates worthwhile.
3.3 A Review of Cost Estimation Research
Because of the need for effective software cost estimation, this has been the subject of a good deal of research. From the outset, the aim has been to replace the subjectivity of project managers and other professionals, generally referred to as expert judgment, with more objective and formal approaches. This was, and still is, seen as a good thing because it provides opportunities for scrutiny, is more repeatable and can militate against the loss of knowledge and insight when experts leave an organisation.
Early approaches tended to be based on some function of size, measured either as estimated LOC or as Function Points (Albrecht and Gaffney 1983), including a variant known as Mk II Function Points (Symons 1988). Generically these take the form:
E = f(S^a)
where E is effort or cost, S is size (typically measured in LOC or Function Points) and a is an exponent representing economies or diseconomies of scale. Typically this overall relationship is then modified by a set of productivity or cost factors.
COCOMO 81 (as described in Section 3.2) is a good example of this approach.
An interesting recent study by Kocaguneli et al. (2012a) has suggested that in many cases the use of an explicit size measure may be less important than previously supposed. It may be that other features act as a proxy for size; for example, different application types may tend to be of different sizes.
Early models were postulated based on the beliefs of their inventors; however, the 1990s heralded a more data-driven approach to modelling. Multiple regression methods, sometimes using a stepwise approach², were often deployed in order to isolate the important factors specific to some software development environment, as captured by a data set of historical projects. Kitchenham and Kansala (1993) used multiple regression to re-estimate the standard weightings for Function Points, with considerable benefit. They also reminded researchers of the dangers of constructing models when many of the components are strongly correlated, i.e., when multicollinearity is present, which if uncorrected leads to highly unstable models.
Given the emphasis on learning from historical data, different machine learning techniques became popular from the 1990s onwards. In all cases the underlying principle is to reason inductively from the particular to the general. For cost prediction the idea is to learn from past, completed software projects in order to predict for new, unseen projects. One technique is lazy learning³ based on analogical or case-based reasoning (Shepperd and Schofield 1997; Keung et al. 2008), which is often referred to as Estimation by Analogy (EBA). The simplicity of the idea, that history repeats itself but not exactly, has attracted a good deal of attention, not least because, to be acceptable to practitioners, prediction systems benefit from good explanatory value since the decisions arising from the prediction will be of high value (Mair et al. 2000). Despite these strengths, a systematic review (Mair and Shepperd 2005) of all available empirical studies did not find EBA to outperform simpler regression models, with 9 studies in favour, 4 equivocal and 7 against.
2 The regression model is constructed by adding one independent variable at a time, iteratively, until no new variable significantly contributes to the model fit.
3 A lazy learner only makes an inductive generalization when actually presented with the new
problem to solve. This can be advantageous when trying to learn in the face of noisy training
cases and much uncertainty.
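To make the idea of analogy-based estimation concrete, the following is a minimal sketch of the nearest-neighbour form of EBA: normalise the project features, find the k most similar completed projects by Euclidean distance and average their actual efforts. The project data are invented for illustration, and real EBA tools add feature weighting, feature selection and adaptation of the retrieved solutions.

```python
# Minimal sketch of Estimation by Analogy (nearest neighbours in feature space).
# All project data here are invented for illustration.
import numpy as np

# Columns: size (function points), team size, number of major interfaces.
past_features = np.array([[120, 4, 2], [300, 9, 5], [200, 6, 3], [80, 3, 1], [260, 8, 6]])
past_effort = np.array([1900, 5200, 3100, 1100, 4800])   # actual person-hours

def estimate_by_analogy(target, features, effort, k=2):
    # Normalise each feature to [0, 1] so no single dimension dominates the distance.
    lo, hi = features.min(axis=0), features.max(axis=0)
    scaled = (features - lo) / (hi - lo)
    target_scaled = (np.asarray(target) - lo) / (hi - lo)
    # Euclidean distance to every completed project; average the k closest analogies.
    distances = np.linalg.norm(scaled - target_scaled, axis=1)
    nearest = np.argsort(distances)[:k]
    return effort[nearest].mean()

print(f"{estimate_by_analogy([220, 7, 4], past_features, past_effort):.0f} person-hours")
```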
Because of the relative ease of fitting regression models these are now often
used as a benchmark with which to compare more elaborate methods, e.g., Mair et al. (2000) compared various machine learning methods (artificial neural nets (ANNs), case-based reasoners (CBR) and rule induction) with
stepwise regression. Interestingly, the basic regression approach outperformed the
rule induction algorithms although not CBR or ANNs.
The last decade could be characterised by research that has explored more advanced prediction systems. Examples include the use of ensembles of learners coupled with some decision-making logic (Minku and Yao 2013) and new approaches such as Grey Relational Algebra (Song and Shepperd 2011). This has
been supported by more research into such things as data pre-processing as many
prediction methods are vulnerable to excessive noise, extreme outliers and missing
observations. Consequently appropriate pre-processing can have a substantial
impact upon predictive performance, e.g. (Strike et al. 2001; Song and Shepperd
2007) and (Liu and Mintram 2005).
Another area of concern, and of some progress, is the development of frameworks for meaningfully comparing the proliferating number of cost estimation approaches. Until the empirical studies of Myrtveit and Stensrud (1999), which set out to independently compare regression modelling, EBA and the unaided human expert, it was not customary to perform any statistical testing. Subsequently, inferential tests such as t-tests and the Mann-Whitney U became the norm; however, methodological problems remained, such as correcting⁴ the α threshold for null hypothesis significance testing in the face of large numbers of tests and the use of inappropriate measures of predictive accuracy. Mittas and Angelis have proposed a method that is not too conservative but reduces the number of tests required by means of clustering the results into groups (Mittas and Angelis 2013). More generally, various authors have proposed remedies and strong arguments as to why proper procedures are required in order to derive sound conclusions. For
example, Shepperd and MacDonell (2012) show that inappropriate evaluation hid
the fact that various prediction techniques such as regression to the mean coupled
with EBA actually performed worse than guessing!
After the event, when evaluating the quality of a prediction, there are three dimensions that need to be assessed: (i) error, (ii) bias and (iii) variance or scatter. Even accuracy is often misunderstood in the software engineering community and is inappropriately assessed by accuracy statistics such as the Mean Magnitude of Relative Error (MMRE). Researchers have shown elsewhere how this measure is flawed, both theoretically, as it is merely an asymmetric measure of spread (Kitchenham et al.
4 Essentially the point is that when conducting a significance test for an
hypothesis there are two dangers. One can wrongly reject the null hypothesis or
wrongly fail to reject the null hypothesis. It is customary to set the chances of
wrongly rejecting the null hypothesis (denoted by α) at 0.05. However if many
tests are performed the probability of at least once committing such an error grows
with the number of tests. For this reason the α threshold needs to be reduced to
take this danger into account.
2001) and empirically through Monte Carlo simulation (Foss et al. 2003). Without
a clear conceptual understanding of accuracy it is difficult for the community to
review or improve their prediction practice since there is no systematic basis for
evaluating different approaches to cost estimation. Indeed, MMRE has the rather
perverse characteristic of favouring optimistic predictions over pessimistic ones.
Given the widespread use of MMRE this may be another contributor to the biases
we observe in industry practice described in Section 3.1. Therefore, unless there is
good reason to the contrary, it is recommended (Shepperd and MacDonell 2012)
that researchers seek to minimise the absolute sum of the residuals, consider
performance relative to guessing and be aware of the effect size. The effect size is a means of capturing the practical or real-world effect of a particular intervention; for example, what actual benefit does moving from cost estimation technique A to technique B yield? This is a very different question from how likely the effect is to have arisen by chance, since large numbers of observations will render even small effects highly significant (Armstrong 2007; Ellis 2010).
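These measures are easy to compute, and the sketch below contrasts MMRE with the mean absolute residual and with a comparison against random guessing, broadly in the spirit of the standardised accuracy measure of Shepperd and MacDonell (2012). The actual and predicted values are invented, and the guessing baseline here is a simple permutation rather than the published sampling procedure.

```python
# Sketch: comparing accuracy measures for a set of predictions (invented data).
import numpy as np

actual = np.array([400.0, 1200.0, 250.0, 900.0, 600.0])     # person-hours
predicted = np.array([350.0, 800.0, 300.0, 700.0, 500.0])

residuals = np.abs(actual - predicted)
mar = residuals.mean()                 # mean absolute residual (recommended)
mmre = (residuals / actual).mean()     # widely used but flawed

# Crude 'guessing' baseline: predict each project with some other project's actual.
rng = np.random.default_rng(0)
guess_mar = np.mean([np.abs(actual - rng.permutation(actual)).mean()
                     for _ in range(1000)])

sa = 1 - mar / guess_mar               # > 0 means better than random guessing
print(f"MAR = {mar:.0f}, MMRE = {mmre:.2f}, SA = {sa:.2f}")
```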
The final development, and one that warrants a section in its own right, is the realisation that formal prediction or cost models have not succeeded in replacing humans, and therefore there is a need for research into how practitioners make predictions. This section has of necessity been brief. For a more detailed overview
see the mapping studies in (Jørgensen 2004) and (Jørgensen and Shepperd 2007).
3.4 The Interaction Between People and Formal Techniques
As the previous section has shown there has been no shortage of ideas or research
into constructing formal prediction systems for software project costs.
Unfortunately as systematic reviews (Mair and Shepperd 2005), (Jørgensen and
Shepperd 2007) and simulation work (Shepperd and Kadoda 2001) demonstrate,
no single technique dominates. In particular formal model performance seems
closely linked with the specific characteristics of the historical data that are used
to train or calibrate the prediction system (Shepperd and Kadoda 2001). This has
led some researchers, such as Menzies et al. (2010), to suggest that we should focus on finding prediction systems that are 'good enough' rather than the 'best'. Nevertheless, Jørgensen (2004) reported that formal models do not consistently outperform their human counterparts and frequently do less well. Specifically, in his systematic review of 15 primary studies he reports that 5 favoured formal models, 5 were equivocal and 5 favoured expert judgement over the formal model. Looking in more detail, Jørgensen suggests that those studies using local calibration, or where the estimators lacked expertise, yielded the best results for formal models. Similarly, in a software maintenance setting, the systematic review of Riaz et al. (2009) found that "there is little evidence on the
effectiveness of software maintainability prediction techniques and models”.
Moreover, formal models do not appear to be very widely used in practice and
expert judgement remains the dominant estimation technique (Jørgensen 2004).
Consequently Jørgensen and his co-workers have been exploring over the past
decade why this might be so.
The first thing to appreciate is the nature and use of cost estimates. Software
projects are generally high value and relatively infrequent events since typical
durations are many months through to years. Therefore the estimate matters, in a way that predicting whether a supermarket customer who chooses a cabbage will also purchase carrots does not. The career prospects of an individual may be impacted by an estimate and the associated decision-making, e.g., to initiate or cancel a software project. In extremis the financial health or viability of the software development organisation may be impacted. Such awareness may skew the estimation process
of individuals. More than 20 years ago Lederer and Mendelow (1999) in their
study of cost estimation within information systems projects observed how
organisational politics can be inimical to good estimation. In a study of a number of major projects, whilst not specifically related to software, Flyvbjerg and colleagues (Flyvbjerg et al. 2003; Flyvbjerg 2008) found considerable evidence to support the notion of 'strategic misrepresentation'. This typically manifests itself as a tendency to under-
estimate costs and over-estimate benefits because of the desirability of the end
goal. In terms of software it may be that professionals might see the potential
opportunities of a new project, e.g., improved work prospects, personal
development or intellectual challenge. The interesting thing is that formal models
may not offer any protection against such phenomena since these models require
inputs, many of which must be estimated, for instance COCOMO (as previously
indicated) requires the user to estimate delivered LOC which will not normally be
known at the point of prediction. Likewise many machine learning techniques are
heavily parameterised with little deep theory to guide the user, thus rendering such
methods rather experimental in their approach. This can encourage a “suck it and
see” philosophy. Jørgensen and Gruschke (2005) termed this "expert judgment in
disguise".
The problem of obtaining useful predictions is compounded by the strong
tendency for professionals to display both over-optimism e.g., (Buehler et al.
1994) and over-confidence e.g., (Jørgensen 2010). Because these phenomena are
so widespread the causes of bias have been extensively investigated by cognitive
psychologists in various domains over the past three decades since the seminal
work of Kahneman and Tversky (1979). This has led to the identification of a
number of cognitive biases that appear to be both deeply ingrained and
widespread. Four such biases are now considered.
One problem is the so-called “planning fallacy” which is the tendency to under-
estimate project completion times as a consequence of spending time on detailed
planning aspects. Buehler and Griffin (1994) examined the underlying cognitive
processes and found that a narrow focus on future plans for the target task led to
neglect of other useful sources of information. In other words, an illusion of
control leads to significant over-optimism. Therefore we might expect detailed
top-down planning methods such as work breakdown to be vulnerable to this
particular bias.
Another source of bias is a preference for case-specific (and recent) evidence
over distributional evidence (Tversky and Kahneman 1974), (Griffin and Buehler
1999). For example, data suggesting that 8 out of 10 projects are delivered late
(i.e., costs and schedule have been under-estimated) might be neglected in
preference to evidence suggesting this specific project will be different because
staff will be motivated to work harder or because there will be reuse of some
software components. This helps us understand why professionals struggle to
learn lessons from the past because deep down we believe it will be different next
time. The problem is the distributional or frequency related evidence says
otherwise and this is usually correct!
A closely related phenomenon is the peak-end rule where the most recent
experience dominates even when it is highly atypical. This has been demonstrated
in many different arenas including the experiment described in (Kahneman et al.
1993) where participants were subjected to modest pain (a hand in icy water) and
preferred the worse (in terms of temperature and duration) experience when for
the final period the water temperature was raised. In terms of software projects,
professionals may recall the final experiences of getting software to work, as
opposed to the lengthy previous experiences of failures and debugging. Again this
bias can lead to distributional evidence being ignored or neglected and the
consequent impact upon estimates.
A third, relevant cognitive theory is the dual-process theory of cognition which
leads to a tendency to trust analytic justifications (explanations) over intuitive
ones yet to prefer intuitive judgments over analytic ones. One implication is that
this is another reason why formal prediction systems can turn into "expert
judgment in disguise" (Jørgensen and Gruschke 2005) as the estimator is seeking
‘objective’ evidence to support his or her intuitive judgement.
A fourth bias is known as anchoring where data in the request for an estimate
can be highly influential even when the estimator is told to ignore it. An example
is the experiment by Jørgensen and Grimstad (2012) where professional
participants were randomly allocated to two groups, one of which was primed
with a high anchor and another with a very low anchor. They were then asked to
estimate the same task, namely their own productivity in LOC per work-hour over
their last project. Remarkably, the difference in median response between the two
groups was almost seven-fold (15 LOC per hour versus 100 LOC per hour). This
stable finding, repeated by a number of independent studies, indicates just how vulnerable humans are to these biases and is clearly a major contributor to some of the cost estimation problems reported at the beginning of this chapter.
These biases are common to many problem domains, and seem independent of
individual differences, e.g., the traits of optimism and procrastination (Buehler
and Griffin 2003). The limited work investigating de-biasing strategies, e.g., utilising previous experience such as past project databases, the Personal Software Process (Humphrey 2000) and lessons-learned sessions, has not been all that successful, particularly in the field of software engineering prediction. Interestingly, Jørgensen and Gruschke (2009) found that software professionals were better able to learn lessons from the estimates of others than from their own estimates.
There are both theoretical and empirical reasons why software practitioners
make consistently sub-optimal predictions within software engineering. However,
the vast bulk of the psychological research has been conducted using student
participants working on problems that are not industry-related (Mair et al. 2009)
and therefore Jørgensen’s work using software developers has been quite unusual.
In addition, the literature has predominantly focused upon understanding factors
that contribute to bias. We need to also explore factors that promote de-biasing in
realistic settings. In parallel, much research has been undertaken into
metacognition (that is thinking about thinking), particularly in the domain of
learning. There is a considerable body of evidence showing that increased
metacognitive awareness leads to increased learning and enhanced performance
e.g., Coutinho found a relationship between metacognitive awareness and
educational performance (Coutinho 2007). Other researchers have shown that
metacognitive skills can be taught (Borkowski et al. 1987), (Dawson 2008) and
these can potentially militate against some of the cognitive biases described
above.
Metacognition can be divided into metacognitive knowledge and metacognitive
skills. The former relates to declarative knowledge of the interactions among self,
task, and strategy characteristics (Flavell 1979) that can be inaccurate and resistant
to change. Clearly this will be an inhibitor to improving prediction performance.
Metacognitive skills on the other hand refer to procedural knowledge for self-
regulating problem solving and learning activities and include feedback
(reflection) on metacognitive knowledge. This division between metacognitive
knowledge and skills is related to that of single and double loop learning
popularised by Argyris and Schön (1996).
‘Single-loop learning’ occurs when goals, values, plans and rules are taken for
granted and put into operation rather than questioned. It reduces risk and affords
greater control, but severely limits growth and learning. By contrast, ‘double-loop
learning’ involves questioning the fundamental systems that underlie goals and
strategies. It results in the questioning of governing variables and may lead to
fundamental changes. This double-loop learning is necessary if practitioners and
organisations are to make informed decisions in changing and uncertain contexts.
Reflection is a metacognitive skill important for personal and professional development (see, for example, Schön 1983; Moon 1999), and it plays a key role in both single- and double-loop learning. However, critical reflection, as demonstrated in double-loop learning, is essential for growth and change. Critical
reflection demands focusing on the cognitive aspects and challenging the
strategies that led to particular actions, and the outcomes and lessons learned from
those actions for future application.
Unfortunately, previous studies of software project cost prediction suggest that
feedback on performance and the typical methods for reflecting on experience,
e.g. unaided lessons learned sessions, do not necessarily lead to improvement in
accuracy or assessment of the uncertainty (Jørgensen and Gruschke 2009). The
lack of training in both reflecting on one’s own thinking and the fundamental
causes of suboptimal outcomes (double-loop learning) can be a major obstacle. As
an illustration, in a previous study where software professionals described reasons
for their estimation errors (Jørgensen and Gruschke 2009), (Moløkken and
Jørgensen 2004), most were shallow and corresponded to single-loop learning. In particular, the participants (all software professionals) focused exclusively on reasons for their estimation inaccuracy, at the expense of their confidence. Indeed, participants only identified means to improve their accuracy (e.g., add more time for unknown events). The alternative, which would have been to change their level of confidence in the effort estimates, was not considered in their documented reflections. This lack of double-loop learning would seem to be a key
contributor to the robust findings on over-optimism and over-confidence among
software developers. (Note that, in contrast, Chapter 4 takes a more organisational perspective on learning; it also uses the device of a decision rationale to support future learning.)
Hence it is important to consider estimation approaches that are underpinned
by theories of metacognition and double-loop learning. Specifically we need to
better understand the impact of enhanced metacognitive awareness on the ability
to improve project cost prediction and confidence (uncertainty assessment) within
a software engineering context. To summarise:
1. Formal prediction systems are not consistently reliable or superior to the
unaided human expert. Moreover, their inputs and parameters must be
manipulated by humans with a consequent loss of their raison d’être, that is
objectivity.
2. There is a strong tendency for professionals to display over-optimism and over-
confidence. A number of experiments and empirical studies help us to
understand the cognitive basis for this bias.
3. De-biasing strategies based upon utilising previous experiences, such as lessons
learned sessions, have not led to noticeable improvement in prediction
accuracy or the realism of uncertainty assessment.
4. There are opportunities to apply recent results from metacognition research to
counteract this natural bias and consequently improve performance.
It is therefore evident that more attention needs to be paid by both researchers and practitioners to the cognitive aspects of cost estimation. To ignore this aspect is to severely limit the reach and impact of any initiatives to improve cost estimation
practice. As has already been noted, formal models such as those based on
machine learning algorithms have their place but they still depend upon inputs and
parameters supplied by, and outputs utilised by, software professionals who are
subject to the same cares, concerns and biases of all human beings.
3.5 Practical Recommendations
Thus far this chapter has noted the importance of effective cost estimation for software projects and contrasted this with the widespread challenges that are faced, most notably endemic over-optimism (i.e., the tendency to systematically under-estimate costs) and over-confidence (i.e., estimators believing they are more accurate than they really are). This has triggered a good deal of research to try to
overcome these problems, in particular through proposing formal prediction
systems or models. After initial work based on the idea of generally applicable
models such as COCOMO (Boehm 1984) and COCOMO II (Boehm et al. 2000)
the dominant idea driving formal models has been to derive them from historical
data either through statistical analysis such as regression modelling or through
induction using one of the many machine learning techniques available. Despite
this activity, it is not possible to strongly recommend any one formal technique,
for the simple reason of a lack of consistent evidence. Thus any recommendations
must be grounded in the understanding that human judgement makes a substantial contribution.
Whilst not intended to be exhaustive the following is a list of six practical
recommendations that are supported by empirical evidence and could usefully be
deployed in real-life projects.
1. Data driven
2. Sensitivity analysis
3. Multiple techniques
4. Group estimates
5. Training and reflection
6. Estimation and confidence
Data-driven estimation requires the availability of historical data on previously
completed projects. Such data can be useful in three different ways. First, for
analogical reasoning that can be formalised as case-based reasoning (Shepperd
and Schofield 1997) or used more informally. Second, local historical data can be
used for calibration purposes since there is widespread evidence to indicate that
off-the-shelf approaches are problematic and that general purpose models benefit
from calibration to the specific or local problem domain (Cuelenaere et al. 1987;
Jeffery and Low 1990; Gulezian 1991; Yang et al. 2013). Third, for direct predictive model building, relevant local data is necessary for training, i.e., inductive learning purposes. Naturally the question arises as to what to do when no local data is available. This might be because the software development organisation is new or because no relevant past data exists. Is the assumption that global data is inferior to local data well founded? This has vexed researchers for some time, and two systematic reviews (Kitchenham et al. 2007; MacDonell and Shepperd 2007) have concluded that the evidence is mixed and that no definitive answer is possible from the primary studies available. In some ways the question of local versus global data is somewhat artificial; the more pertinent question is how relevant the global or cross-company data actually is. However, a recommendation is to inform any
cost estimation with local data, including past estimation performance data
wherever possible. If circumstances do not allow this then global data, after
careful consideration of its relevance, is the next best option.
Sensitivity analysis is not common practice, yet in the face of uncertainty it is a very useful means of determining the vulnerability of an estimate to particular assumptions and the level of confidence that can be placed in that estimate. Such analysis can be highly sophisticated (Saltelli et al. 2000) or use simple Monte Carlo methods (Fishman 1996). Wagner (2007) illustrates how these ideas can be deployed using a COCOMO model and finds that the code size estimate dominates the effort prediction but, less obviously, that there are significant second-order effects between the different cost drivers due to the multiplicative nature of the model. This kind of analysis can also be valuable when the uncertainty surrounding an estimate is unacceptable, helping the estimator identify the most important sources of variability so that steps can then be taken to reduce the uncertainty through further investigation, simulation, etc., of the key parameters or inputs.
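A simple Monte Carlo form of this analysis, loosely along the lines Wagner (2007) applies to a COCOMO model, is sketched below: treat the uncertain inputs as distributions, propagate them through the model and see which input the resulting effort is most strongly associated with. The input distributions and model constants are illustrative assumptions, not values from the study.

```python
# Sketch: Monte Carlo sensitivity analysis of a simple size-driven effort model.
# Input distributions and model constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

size_kloc = rng.triangular(30, 50, 110, n)      # uncertain size estimate
complexity = rng.triangular(0.9, 1.15, 1.4, n)  # uncertain cost-driver multiplier

effort = 3.0 * size_kloc ** 1.12 * complexity   # COCOMO-like model (person-months)

print(f"effort: median {np.median(effort):.0f}, "
      f"90% range {np.percentile(effort, 5):.0f}-{np.percentile(effort, 95):.0f}")

# Rank which input drives the variation in the output (correlation as a crude measure).
for name, values in [("size", size_kloc), ("complexity", complexity)]:
    print(f"correlation of effort with {name}: {np.corrcoef(values, effort)[0, 1]:.2f}")
```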
Using more than one estimation method, or multiple techniques, is another important consideration. Although an obvious recommendation for practitioners, this has not been widely researched and the evidence base is quite limited. Kitchenham et al. (2002) conducted an empirical study of 145 projects at a large software house where estimators were required to use a minimum of two techniques and then select one estimate to be the basis of the client-agreed budget. The advantage over simply using the mean is that if one estimate is misleading it will not 'contaminate' the final figure. MacDonell and Shepperd (2003b) explored a similar question and also found not only that no one technique was best but that using the mean was also sub-optimal. Selecting one technique, or perhaps investigating more deeply, requires more consideration and discussion than the formulaic application of an averaging technique.
Group estimates should also be considered as a practical estimation technique. Again, surprisingly, considering they have been promoted since Boehm's seminal Software Engineering Economics (Boehm 1981) described a wideband Delphi process, there has been limited research and therefore evidence. Taff et al. (1991) proposed a related approach that they termed Estimeetings; however, little empirical support is offered in terms of their effectiveness. Passing and Shepperd (2003) investigated the impact of group discussion and iterated estimates and found that both checklists and group discussions significantly contribute to improved estimation. The limitation of this study was that it involved Masters students rather than professionals and was conducted in an artificial setting. Reporting similar results, Moløkken and Jørgensen (2004) found a significant and substantial effect: both group decisions and individuals' post-discussion estimates tended to be less optimistic than the original individual estimates.
The lack of systematic training and reflection is another improvement opportunity. As Jørgensen puts it, "the focus on learning estimation skills from software development experience seems to be very low" (Jørgensen 2004). The challenges are that the various cognitive biases described in Section 3.4 are deeply ingrained and that de-biasing strategies are not necessarily effective. Consequently emphasis should be given to reflection, but structured reflection, in order to guide estimators beyond the shallow reflections that some researchers have found, such as "the estimate was too low because insufficient time was allocated"! Researchers have also found that emphasising metacognitive skills can significantly improve performance.
Finally, practitioners need to keep in mind that because an estimate is a
probabilistic statement it has two dimensions (the estimate and confidence) and
therefore it is not well represented by a single point value, even if a point value is required as the final outcome of the decision-making process, e.g., the bid value. To give an
example, estimating 1000 person-hours ± 10 person-hours is a very different
proposition to 1000 person-hours ± 500 person-hours. Even this may not be
adequate since it is unclear whether it means that an actual effort of 1510 person-
hours is deemed impossible or merely very unlikely. Moreover such a formulation
imposes a symmetric distribution which may not properly reflect the estimator’s
beliefs. Jørgensen recommends a confidence value in a range, e.g., 80%
confidence between 500 and 1500 person-hours. This allows some simple trade-offs between precision and confidence to be exploited. A richer picture still is
obtained by describing the estimate as a probability distribution, e.g., as a three-
point estimate and a triangular distribution. Either way, failing to regard estimates
as probabilities indicates a failure to appreciate their true nature and therefore the
opportunity to learn and improve.
The above list contains some simple, practical, general and evidence-based
recommendations for software cost estimation. It is not a panacea and there are
many other challenges that have not been fully addressed. Nevertheless, given the
importance of software, software projects and effective cost management they
may offer some useful steps forward.
3.6 Follow-up Sources of Information
There are several comprehensive systematic reviews of research into cost estimation. Jørgensen and Shepperd (2007) gives general coverage of the different research activities being undertaken, and Simula have continued to update the database of sources subsequent to its publication.⁵ A second, slightly older, systematic review, more specialised on the role of human experts, is Jørgensen
5 The bibliographic database can be found at www.simula.no/BESTweb.
(2004). The review by Riaz et al. (2009) focuses on cost estimation in a software
maintenance context.
Cost estimation generally takes place in the wider setting of a software project.
There are many good textbooks, such as (Hughes and Cotterell 2009) on project
management and (Sommerville 2010) on software engineering and the set of
guidelines published as the SweBOK (Abran and Bourque 2004).
In terms of making sense of published empirical research comparing different
formal models, and for designing new experiments, Shepperd and MacDonell
(2012) set out a framework based on three research questions that need to be
addressed.
Glossary:
Absolute residuals: a simple and robust means of assessing the predictive
accuracy of a prediction system. It is defined simply as:
|y_i − ŷ_i|
where y_i is the true value for the ith project and ŷ_i the estimated value. This gives the error, irrespective of direction, i.e., under- or over-estimate. The mean residual (keeping the direction of error) gives a measure of the degree of bias.
Cognitive bias: these are patterns of thinking about problem solving or
decision-making that distort and lead people to ‘sub-optimal’ choices.
Because of the ubiquity of many such biases they are classified and named,
e.g., the anchoring bias. See the pioneering work of Tversky and Kahneman (1974).
Double loop learning: this differs from ordinary or single-loop learning in
that one not only observes the effects of the process, but also understands
the external factors that influence the effects. This was initially promoted by
Argyris and Schön as a way of promoting effective organisational behavior
(Argyris and Schön 1996).
Estimation by Analogy (EBA): uses some form of case-based reasoning
where a new or target case which is to be solved is plotted in feature space
(one dimension per feature) and some distance metric used to determine past
proximal cases from which a solution can be derived. For a general account
of CBR see the pioneering work by Kolodner (1993) and for its application
to software engineering see (Shepperd 2003).
Expert Judgement: this is something of a catch-all description for a range of informal approaches to estimation. Jørgensen describes it as ranging from "unaided intuition ('gut feeling') to expert judgment supported by historical data, process guidelines and checklists ('structured estimation')" (Jørgensen 2004). Despite it being a widespread estimation approach, it can still be criticised for its reasoning not being open to scrutiny, since the reasoning process is 'non-recoverable' (Jørgensen 2004), nor is it repeatable or easily transferable from existing experts to others.
Formal prediction system: or formal model for cost prediction is
characterised by repeatability so that different individuals applying the same
inputs should generate the same outputs (with the exception of prediction
systems based on stochastic search [also see Chapter 15 on search based
project management] where this will tend to be true over time (Clark et al.
2003) but not necessarily for a single utilisation). Examples of formal systems range from simple algorithmic models, such as COCOMO, to complex ensembles of learners.
Machine Learning: this is a branch of applied artificial intelligence based
on inducing prediction systems from historical data, i.e., reasoning from the
particular to the general. There are a wide range of approaches including
neural networks, case-based reasoning, rule induction, Bayesian methods,
support vector machines and population search methods such as genetic
programming. Standard textbooks that provide overviews of these
techniques include Witten et al. (2011).
Mean magnitude of relative error (MMRE): this is a widely used
although now heavily criticized (Kitchenham et al. 2001; Foss et al. 2003;
Shepperd and MacDonell 2012) measure of predictive accuracy defined as:
MMRE = (1/n) Σ_{i=1..n} |x_i − x̂_i| / x_i
where x_i is the true cost for the ith project, x̂_i is the estimated cost and n is the total number of projects.
Metacognition: this refers to ‘thinking about thinking’ (Flavell 1979) and is
an awareness and monitoring of one’s thoughts and performance. It
encompasses the ability to consciously control the cognitive processes
involved in learning such as planning, strategy selection, monitoring and
evaluating progress towards a particular goal and adapting strategies as, and
when, necessary to reach that goal (Ridley et al. 1992).
Over-confidence: refers to the tendency of an estimator to value precision
over accuracy. Typically one might express confidence in an estimate as the
likelihood that the true value falls within a specified interval. For example
stating that one is 80% confident that the actual effort will fall within the
range 1000-1200 person hours implies that this will occur 8 out of 10 times.
If the true value falls into the range less frequently this implies over-
confidence. Jørgensen et al. (2004) reported that over-confidence was a
widespread phenomenon and that at least one contributor was the fact that
managers often interpret wide intervals as conveying a lack of knowledge
and prefer narrow but less accurate estimates.
Over-optimism: refers to the situation where the estimation error is biased
towards an under-estimate. Many studies indicate that this is the norm in the
software industry with a figure of 30% being seen as typical (Jørgensen
2004).
Prediction: whilst 'prediction' and 'estimation' are often used interchangeably, we use 'prediction' to mean a forecast or projection, and 'estimate' to connote a guess or rough-and-ready calculation.
Single-loop learning: Argyris and Schön (1996) characterise this as
focusing on restrictive feedback so that the individual or organisation only
endeavours to improve a single metric without external reflection upon the process, i.e., without double-loop learning.
References
Abran A, Bourque, P. (2004) SWEBOK: Guide to the software engineering Body of Knowledge.
IEEE Computer Society
Albrecht AJ, Gaffney JR (1983) Software function, source lines of code, and development effort
prediction: a software science validation. IEEE Transactions on Software Engineering 9:639
648.
Argyris C, Schön D (1996) Organizational learning II: Theory, method and practice. Addison
Wesley, Reading, Mass.
Armstrong, S. (2007) Significance Tests Harm Progress in Forecasting. International Journal of
Forecasting 23:321327.
Benington HD (1956) Production of large computer programs. Symp. on Advanced Computer
Programs for Digital Computers ACR-15:
Boehm B, Abts C, Brown, W., et al. (2000) Software Cost Estimation with COCOMO II.
Pearson/Prentice Hall, Englewood Cliffs, NJ
Boehm BW (1984) Software engineering economics. IEEE Transactions on Software
Engineering 10:421.
Boehm BW (1981) Software Engineering Economics. Prentice-Hall, Englewood Cliffs, N.J.
Borkowski JG, Carr M, Pressley M (1987) Spontaneous strategy use: Perspectives from
metacognitive theory. Intelligence 11:6175.
Buehler R, Griffin D (2003) Planning, personality, and prediction: The role of future focus in
optimistic time predictions. Organizational Behavior and Human Decision Processes 92:8090.
Buehler R, Griffin D, Ross M (1994) Exploring the “Planning Fallacy”: why people
underestimate their task completion times. J of Personality & Social Psychology 67:366381.
Clark J, Dolado JJ, Harman M, et al. (2003) Reformulating software engineering as a search
problem. IEE Proceedings - Software 150:161175.
Coutinho SA (2007) The relationship between goals, metacognition, and academic success,.
Educate 7:3947.
Cuelenaere A, van Genuchten M, Heemstra F (1987) Calibrating a software cost estimation
model - why and how. Information & Software Technology 29:558567.
Dawson TL (2008) Metacognition and learning in adulthood. Developmental Testing Service,
LLC
DeMarco T (1982) Controlling Software Projects. Management, Measurement and Estimation.
Yourdon Press, NY
El Emam K, Koru G (2008) A Replicated Survey of IT Software Project Failures. IEEE Software 25:84–90.
Ellis, P (2010) The Essential Guide to Effect Sizes: Statistical Power, Meta-Analysis, and the
Interpretation of Research Results. Cambridge University Press
Fishman G (1996) Monte Carlo: Concepts, Algorithms, and Applications. Springer, New York
Flavell JH (1979) Metacognition and cognitive monitoring: A new area of cognitive-
developmental inquiry. American Psychologist 34:906 911.
Flyvbjerg B (2008) Curbing Optimism Bias and Strategic Misrepresentation in Planning:
Reference Class Forecasting in Practice. European Planning Studies 16:332.
Flyvbjerg B, Bruzelius N, Rothengatter W (2003) Megaprojects and risk: an anatomy of
ambition. Cambridge University Press
Foss T, Stensrud E, Kitchenham B, Myrtveit I (2003) A simulation study of the model evaluation
criterion MMRE. IEEE Transactions On Software Engineering 29:985995.
Griffin D, Buehler R (1999) Frequency, Probability, and Prediction: Easy Solutions to Cognitive
Illusions? Cognitive Psychology 38:4878.
Grimstad S, Jorgensen M, Moløkken-Østvold K (2006) Software effort estimation terminology:
The tower of Babel. Information and Software Technology 48:302310.
Gulezian R (1991) Reformulating and calibrating COCOMO. J of Systems Software 16:235
242.
Heemstra FJ (1992) Software cost estimation. Information & Software Technology 34:627639.
Hughes RT (1996) Expert judgement as an estimating method. Information & Software
Technology 38:6775.
Hughes RT, Cotterell M (2009) Software project management. McGraw-Hill, London
Humphrey W (2000) Introducing the personal software process. Annals of Software Engineering 1:311–325.
Jeffery DR, Low GC (1990) Calibrating estimation tools for software development. Software Engineering Journal 5:215–221.
Jørgensen M (2004) A review of studies on expert estimation of software development effort. J of Systems & Software 70:37–60.
Jørgensen M (2010) Identification of more risks can lead to increased over-optimism of and over-confidence in software development effort estimates. Info & Software Technol 52:506–516.
Jørgensen M, Grimstad S (2012) Software Development Estimation Biases: The Role of Interdependence. IEEE Transactions on Software Engineering 38:677–693.
Jørgensen M, Gruschke T (2005) Industrial Use of Formal Software Cost Estimation Models: Expert Estimation in Disguise?
Jørgensen M, Gruschke T (2009) The Impact of Lessons-Learned Sessions on Effort Estimation and Uncertainty Assessments. IEEE Transactions on Software Engineering 35:368–383.
Jørgensen M, Moløkken-Østvold K (2006) How large are software cost overruns? A review of the 1994 CHAOS report. Information & Software Technology 48:297–301.
Jørgensen M, Shepperd M (2007) A Systematic Review of Software Development Cost Estimation Studies. IEEE Transactions on Software Engineering 33:33–53.
Jørgensen M, Sjøberg DIK (2003) An effort prediction interval approach based on the empirical distribution of previous estimation accuracy. Information & Software Technology 45:123–136.
Jørgensen M, Teigen KH, Moløkken K (2004) Better sure than safe? Overconfidence in judgment based software development effort prediction intervals. J of Systems & Software 70:79–93.
Kahneman D, Fredrickson B, Schreiber C, Redelmeier D (1993) When more pain is preferred to less: Adding a Better End. Psychological Science 4:401–405.
Kahneman D, Tversky A (1979) Intuitive prediction: biases and corrective procedures. TIMS Studies in Management Science 12:313–327.
Kemerer CF (1987) An empirical validation of software cost estimation models. Communications of the ACM 30:416–429.
Keung J, Kitchenham B, Jeffery R (2008) Analogy-X: Providing Statistical Inference to Analogy-Based Software Cost Estimation. IEEE Transactions on Software Engineering 34:471–484.
Kitchenham B, Mendes E, Travassos G (2007) Cross versus Within-Company Cost Estimation Studies: A Systematic Review. IEEE Transactions on Software Engineering 33:316–329.
Kitchenham BA (2002) The question of scale economies in software - why cannot researchers agree? Information & Software Technology 44:13–24.
Kitchenham BA, Kansala K (1993) Inter-item correlations among function points. 1st Intl. Symposium on Software Metrics, Baltimore, MD. IEEE Computer Society Press.
Kitchenham BA, Linkman SG (1997) Estimates, uncertainty and risk. IEEE Software 14:69–74.
Kitchenham BA, MacDonell SG, Pickard L, Shepperd MJ (2001) What accuracy statistics really measure. IEE Proceedings - Software Engineering 148:81–85.
Kitchenham BA, Pfleeger SL, McColl B, Eagan S (2002) An empirical study of maintenance and development estimation accuracy. J of Systems & Software 64:57–77.
Kocaguneli E, Menzies T, Hihn J, Kang H (2012a) Size Doesn’t Matter? On the Value of Software Size Features for Effort Estimation.
Kocaguneli E, Menzies T, Keung J (2012b) On the Value of Ensemble Effort Estimation. IEEE Trans on Softw Eng 38:1403–1416.
Kolodner JL (1993) Case-Based Reasoning. Morgan-Kaufmann
Lederer A, Mendelow A (1990) The Impact of the Environment on the Management of Information Systems. Information Systems Research 1:205–222.
Liu Q, Mintram R (2005) Preliminary Data Analysis Methods in Software Estimation. Software Quality Journal 13:91–115.
MacDonell S, Shepperd M (2003a) Using prior-phase effort records for re-estimation during
software projects. 9th IEEE Intl. Metrics Symp.
MacDonell S, Shepperd M (2003b) Combining Techniques to Optimize Effort Predictions in Software Project Management. J of Systems & Software 66:91–98.
MacDonell S, Shepperd MJ (2007) Comparing Local and Global Software Effort Estimation Models - Reflections on a Systematic Review. 1st Intl. Symp. on Empirical Softw. Eng. & Measurement.
Mair C, Kadoda G, Lefley M, et al. (2000) An investigation of machine learning based prediction systems. J of Systems Software 53:23–29.
Mair C, Martincova M, Shepperd M (2009) A Literature Review of Expert Problem Solving
using Analogy. 13th International Conference on Evaluation & Assessment in Software
Engineering (EASE).
Mair C, Shepperd M (2005) The consistency of empirical comparisons of regression and
analogy-based software project cost prediction. 4th International Symposium on Empirical
Software Engineering (ISESE) Noosa Heads, Australia.
Menzies T, Butcher A, Cok D, Marcus A, Layman L, Shull F, Turhan B, Zimmermann T (2013) Local versus Global Lessons for Defect Prediction and Effort Estimation. IEEE Transactions on Software Engineering 39:822–834.
Menzies T, Jalili M, Hihn J, et al. (2010) Stable Rankings for Different Effort Models. Automated Software Engineering 17:409–437.
Minku L, Yao X (2013) Ensembles and locality: Insight on improving software effort estimation. Information and Software Technology, in press.
Mittas N, Angelis L (2013) Ranking and Clustering Software Cost Estimation Models through a Multiple Comparisons Algorithm. IEEE Trans on Softw Eng 39:537–551.
Moløkken K, Jørgensen M (2004) Group processes in software effort estimation. Empirical Software Engineering 9:315–334.
Moon J (1999) Reflection in Learning and Professional Development: Theory and Practice.
Kogan Page, London.
Myrtveit I, Stensrud E (1999) A controlled experiment to assess the benefits of estimating with analogy and regression models. IEEE Transactions on Software Engineering 25:510–525.
Passing U, Shepperd M (2003) An Experiment on Software Project Size and Effort Estimation.
ACM-IEEE International Symposium on Empirical Software Engineering (ISESE 2003).
Riaz M, Mendes E, Tempero E (2009) A Systematic Review of Software Maintainability Prediction and Metrics. 3rd International Symposium on Empirical Software Engineering and Measurement, ACM Computer Press, pp 367–377.
Ridley D, Schutz P, Glanz R, Weinstein C (1992) Self-regulated learning: the interactive influence of metacognitive awareness and goal-setting. Journal of Experimental Education 60:293–306.
Saltelli A, Tarantola S, Campolongo F (2000) Sensitivity Analysis as an Ingredient of Modeling. Statistical Science 15:377–395.
Schön DA (1983) The reflective practitioner. Basic Books, New York.
Shepperd M (2003) Case-based Reasoning and Software Engineering. In Managing Software
Engineering Knowledge, Eds Aurum, A, Jeffery, R, Wohlin, C, Handzic, M. Springer-Verlag:
Berlin.
Shepperd M, MacDonell S (2012) Evaluating prediction systems in software project estimation. Information and Software Technology 54:820–827.
Shepperd MJ, Kadoda G (2001) Comparing Software Prediction Techniques Using Simulation. IEEE Trans on Softw Eng 27:987–998.
Shepperd MJ, Schofield C (1997) Estimating software project effort using analogies. IEEE Transactions on Software Engineering 23:736–743.
Sommerville I (2010) Software Engineering. Pearson, Hemel Hempstead, UK.
Song Q, Shepperd M (2011) Predicting software project effort: A grey relational analysis based method. Expert Systems with Applications 38:7302–7316.
Song Q, Shepperd M (2007) Missing Data Imputation Techniques. Intl J of Bus Intelligence & Data Mining 2:261–291.
Strike K, El Emam K, Madhavji N (2001) Software cost estimation with incomplete data. IEEE Transactions on Software Engineering 27:890–908.
Symons CR (1988) Function point analysis: difficulties and improvements. IEEE Transactions on Software Engineering 14:2–11.
Taff LM, Borchering JW, Hudgins WR (1991) Estimeetings: development estimates and a front-end process for a large project. IEEE Transactions on Software Engineering 17:839–849.
Tversky A, Kahneman D (1974) Judgment under Uncertainty: Heuristics and Biases. Science 185:1124–1131.
Wagner S (2007) An Approach to Global Sensitivity Analysis: FAST on COCOMO. 1st Intl. Symp. on Empirical Software Engineering and Measurement (ESEM 2007). IEEE Computer Society, pp 440–442.
Whitfield D (2007) Cost Overruns, delays and terminations: 105 outsourced public sector ICT contracts. The European Services Strategy Unit
Willis R (1985) Invited review: Critical path analysis and resource constrained project scheduling - theory and practice. European Journal of Operational Research 21:149–155.
Witten I, Frank E, Hall M (2011) Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, Burlington, MA.
Yang Y, He Z, Mao K, et al. (2013) Analyzing and Handling Local Bias for Calibrating Parametric Cost Estimation Models. Information and Software Technology, in press.
Software Engineering Body of Knowledge (SWEBOK). http://www.computer.org/portal/web/swebok/home.