Improving Accuracy in Predicting City-level Construction Cost Indices by Combining Linear ARIMA and Nonlinear ANNs

Sooin Kim, S.M.ASCE (1); Chi-Young Choi (2); Mohsen Shahandashti, M.ASCE (3); and
Kyeong Rok Ryu, A.M.ASCE (4)
(1) Graduate Research Assistant, Department of Civil Engineering, The University of Texas at Arlington, 416
S. Yates St., Arlington, TX 76010. E-mail: sooin.kim@uta.edu
(2) Professor, Department of Economics, The University of Texas at Arlington, 701 S. West St., Arlington,
TX 76019. E-mail: cychoi@uta.edu
(3) Associate Professor, Department of Civil Engineering, The University of Texas at Arlington, 416 S. Yates
St., Arlington, TX 76010. E-mail: mohsen@uta.edu
(4) Assistant Professor, Department of Civil Engineering, The University of Texas at Arlington, 416 S. Yates
St., Arlington, TX 76010. E-mail: kyeongrok.ryu@uta.edu
ABSTRACT
Accurate cost forecasting in budget planning and contract bidding is crucial for the success of
construction projects. Linear models such as the autoregressive integrated moving average
(ARIMA) and nonlinear models such as the artificial neural network (ANN) have been adopted in
the literature for forecasting construction costs. However, both linear and nonlinear models are
subject to some limitations derived from their modeling structure and assumptions. This study
proposes a hybrid ARIMA-ANN model for forecasting construction costs and explores whether
the hybrid ARIMA-ANN model can provide more accurate forecasts than an individual ARIMA
or ANN. The national and city-level construction cost indices (CCIs) are forecasted for three
forecasting horizons (short-term, mid-term, and long-term) using three forecasting models: (1)
linear autoregressive integrated moving average (ARIMA), (2) nonlinear artificial neural networks
(ANNs), and (3) the hybrid ARIMA-ANN model. Out-of-sample forecasting exercise reveals that
the hybrid model combining the distinctive features of both ARIMA and ANNs performs better
than individual models in most forecasting cases, especially for longer-term forecasting horizons.
The findings can help project planners, cost engineers, and decision-makers prepare for more
accurate budgets and bids for diverse construction projects in different locations.
Keywords: Hybrid model, Out-of-sample forecasting, City-level ENR Construction cost
index (CCI), U.S. cities, ARIMA model, ANN.
INTRODUCTION
Cost estimation is an essential part of budgeting and bidding at the beginning stage of construction
projects. Inaccurate cost estimation leads to not only direct losses to stakeholders but also socially
undesirable consequences (Choi et al. 2021, Kim et al. 2020). Cost overestimation can cause a
bidder to lose the bid by submitting a price that is uncompetitive relative to those of rival bidders.
Moreover, if the cost of a large public infrastructure project is overestimated, the government,
as the project owner, must allocate or finance a larger budget for the project. This ultimately
creates an opportunity loss for other potential projects and increases the implicit or explicit
financial burden on the government. According to the Congressional Budget Office (CBO), the combined
federal, state, and local spending on infrastructure projects was $441 billion as of 2017, about 2.3
percent of the U.S. GDP (CBO 2018). State and local governments also spend approximately 85
percent of their capital budget on key public infrastructures (McNichol 2016). Cost
underestimation is equally problematic as it causes project cost overruns, which in turn increases
the risks of project delay, abandonment, financial losses, and even insolvency of the contractors
(Cantarelli and Flyvbjerg 2013). As such, accurate cost estimation is crucial not only for the
success of a construction project but also for the efficient allocation of limited resources (Kim et
al. 2021a). Nevertheless, accurate cost estimation and forecasting is a challenging task due to large
fluctuations of construction costs over time, often measured by the Construction Cost Indices
(CCIs).
This challenge has prompted many researchers to seek ways to improve the accuracy of CCI
forecasts in two different directions. On the one hand, a great deal of effort has been directed at the
development of better-performing forecasting models. On the other hand, a more recent study by
Choi et al. (2021) shows that the use of city-level CCIs can enhance the forecasting accuracy of
local construction costs that vary widely across project locations. The current study sits at the
intersection of these two strands of the literature. To be specific, the present study aims to show
how to improve the forecasting accuracy of construction costs in different locations by creating a
hybrid forecasting model that is superior to the individual models.
LITERATURE REVIEW
Most construction industry variables, including construction costs, are subject to volatile
fluctuations, which can increase contingency costs and lead to failures in construction project
processes such as bidding, cost estimating, and investing. Thus, various forecasting models have
been implemented to obtain more accurate forecasts of construction industry variables.
Multivariate forecasting models use macroeconomic and social leading indicators as independent
variables for forecasting the construction industry variables. Bhattacharyya et al. (2021) included
macroeconomic and social variables such as construction spending, construction backlog
indicator, and employment in multiple linear regression and random forest models for forecasting
Purdue Index for Construction (Pi-C). Assaad and El-adaway (2021) investigated the impacts of
the dynamic workforce and workplace variables such as total construction employment and
average weekly hours on construction productivity. Shahandashti and Ashuri (2016) identified
macroeconomic leading indicators such as crude oil price for forecasting the national highway
construction cost index (NHCCI). Xu and Moon (2013) forecasted the CCI using a cointegrated
vector autoregressive (VAR) model based on the interactive relationship between the CCI and
consumer price index (CPI).
Although it is useful to identify macroeconomic and social indicators and investigate their impacts
on construction industry variables for forecasting, parsimonious univariate models have several
advantages, especially when statistical data on leading indicators are limited or unavailable (Choi
et al. 2021, Lam and Oshodi 2016a, Hwang 2011). For example, Han et al. (2018) compared
forecasting accuracies between multiple linear regression and univariate ARIMA models and
concluded that the univariate ARIMA provided more accurate forecasts for the near future trends
of the home sales index (HSI).
Multivariate models could provide more accurate forecasts by considering the relationships
between CCI and macroeconomic indicators to some extent. However, multivariate models require
several parameters to be estimated, which can consequently lead to greater specification and
forecasting errors than the errors of parsimonious univariate models (Cook and Doh 2019;
Steyerberg 2018; Giraitis et al. 2018; Han et al. 2017). Parsimonious univariate models can avoid
overfitting problems and outperform an overparameterized model (Cook and Doh 2019).
Parsimonious univariate models are also preferable due to their comparable simplicity for
convenient industrial applications (Nobis et al. 2019; Kamruzzaman et al. 2016).
To this end, the scope of this research primarily focuses on implementing univariate time series
forecasting models for CCIs. In future research, it is recommended to investigate the relationships
between the CCIs and macroeconomic indicators and examine whether multivariate models can
improve forecasting accuracy over univariate hybrid models. Moreover, the hybrid model can be
updated to include multivariate time series models in future research. Overfitting problems and
model specification errors should be fully and cautiously investigated in such hybrid models
including multivariate time series models.
Two univariate time series forecasting models have been popularly adopted for this purpose: the
linear autoregressive integrated moving average (ARIMA) models and the nonlinear artificial
neural networks (ANNs). Moon et al. (2018) forecasted CCI using univariate linear ARIMA and
ARFIMA (Autoregressive Fractionally Integrated Moving Average) models. Lam and Oshodi
(2016a) applied the ARIMA, ANN, and Support Vector Machine models for predicting gross
values in the construction industry. As summarized in Table 1, however, no clear consensus exists
on a better performing model because neither of them consistently gives the best results in various
situations.
Table 1. Summary of previous studies on forecasting construction costs
| Study | Methodology | Main findings |
|---|---|---|
| Ashuri and Lu (2010) | ARIMA model | Seasonal ARIMA models outperform other univariate time series models in forecasting the ENR national CCI. |
| Choi et al. (2021) | ARIMA and VEC (Vector Error Correction) model | Recommends a parsimonious ARIMA model for forecasting the city-level construction cost index (CCI) in the absence of leading indicators. |
| Fan et al. (2010) | ARIMA model | ARIMA models cannot capture sudden changes with turning points in the construction market. |
| Zhao et al. (2019) | ARIMA model | No single dominant model was found in forecasting residential building costs in New Zealand. The dominance of the ARIMA model in out-of-sample forecasting performances varies with data characteristics. |
| Mir et al. (2021) | ANN | The overfitting problems of ANN were alleviated with the optimal lower and upper bound estimation method in forecasting construction material prices. |
| Shiha et al. (2020) | ANN | ANNs perform better than linear models in forecasting construction material price movements. |
| Tijanić et al. (2019) | ANN | The forecasting accuracy of ANN depends on the quality and quantity of input data. |
| Cao and Ashuri (2020) | ARIMA and ANNs | ANNs outperform seasonal ARIMA models in forecasting highway construction costs. |
| Mahdavian et al. (2021) | Linear regression and neural networks | Linear regression models exhibit superiority in forecasting highway construction costs over nonlinear neural networks. |
| Oshodi et al. (2017) | ARIMA and ANNs | ANNs outperform ARIMA models for forecasting the tender price index. |
| Yip et al. (2014) | ARIMA and ANNs | No dominance between ARIMA and ANNs in forecasting construction equipment maintenance costs. |
This lack of consensus is because each model has different strengths and weaknesses, as
summarized in Table 2.
Table 2. Strengths and weaknesses of ARIMA and ANNs
| Approach | Strengths |
|---|---|
| ARIMA | Simple and relatively easy to implement (Fattah et al. 2018). Less sensitive to data size and noise level (Fard and Akbari-Zadeh 2014). More robust to overfitting problems (Valipour et al. 2013). |
| ANN | Suitable for capturing nonlinear dynamics (Oshodi et al. 2017). More flexible approximation using multiple functions. Accommodates heterogeneous dynamics in data movements (Büyükşahin and Ertekin 2019). |
As a linear parametric time series model, the ARIMA model is simple, easy to implement, and
robust to data size and noise level (Fard and Akbari-Zadeh 2014, Fattah et al. 2018). However,
the ARIMA model is not suited to capture nonlinear dynamics of time series since it assumes a
linear relationship between past and future observations (Oshodi et al. 2017, Wang and Ashuri
2017). By contrast, the ANN has an advantage in estimating nonlinear and volatile components in
time series by extracting the information from the observed data without assuming a specific
relationship between past and future observations. Despite the attractive features of flexibility and
nonlinearity, ANNs are not necessarily preferred over linear models when it comes to out-of-
sample forecasting performances because of the so-called over-fitting problem.
To overcome these limitations of individual ARIMA and ANN models, which have prevented a clear
consensus on the better forecasting model, hybrid models have been proposed to deliver superior
predictive accuracy for different time series data such as the sunspot dataset, the Canadian lynx
dataset, foreign exchange rates, and stock market price indices (Büyükşahin and Ertekin 2019,
Khashei and Bijari 2011, Rathnayaka et al. 2015, Wang et al. 2013, Zhang 2003). We hypothesize
that the proposed hybrid approach for forecasting CCIs can overcome the limitations of individual
ARIMA and ANN models and provide more accurate forecasts than individual models by
estimating individually and combining both linear and nonlinear components in CCIs. The basic
idea of hybrid models is to utilize the distinctive strengths of ARIMA and ANNs to alleviate the
limitations of each model. By incorporating the advantage of each model into a combined model,
the hybrid model can achieve more accurate forecast results by reducing the error of adopting an
inappropriate method. For instance, the ARIMA model in the hybrid model can mitigate the
overfitting problems of the ANN, while the ANN in the hybrid model can capture the dynamics
that the linear function fails to approximate (Tealab et al. 2017). A common practice in the hybrid
model is to decompose time series data into its linear and nonlinear components, then apply an
appropriate type of model to each of them separately. The hybrid ARIMA-ANN model considered
herein approximates the linear component of a time series using an ARIMA model before
implementing an ANN. The ANN is then applied to approximate the nonlinearities in the residuals
of the ARIMA model that are not captured by the linear ARIMA model. Consequently, the hybrid
model can reduce the forecasting errors stemming from the overfitting problems of the ANN while
capturing nonlinear fluctuations in the ARIMA model residuals with the more flexible ANN.
This paper is the first attempt to develop and adopt a hybrid ARIMA-ANN model for CCI
forecasting in order to overcome the limitations of individual ARIMA and ANN models, which have
prevented a clear consensus on the better forecasting model in the CCI forecasting literature. The national CCI
has been forecasted using linear and nonlinear models individually in the previous studies. The
present study creates the hybrid ARIMA-ANN model (hereafter, hybrid model) to examine
whether and how much it can achieve more accurate forecasting results in both national and city-
level CCIs over individual ARIMA or ANN models, under various forecasting horizons typically
valuable to practitioners.
The hybrid model is particularly suited for forecasting CCIs, which are constructed by the
combination of multiple sub-indices, such as labor costs and material costs, that are likely to follow
different dynamic patterns across locations. Given that the main advantage of the hybrid model
comes from approximating the linear and nonlinear components of the time series in different
ways, it can effectively capture the different dynamics of subcomponents and thus yield more
accurate forecasts of CCIs. Moreover, since the dynamics of CCI movements may differ across
locations, particularly due to the heterogeneous dynamics of subcomponents, individual models
working well in one location may not necessarily work well in others. As a result, either ARIMA
or ANN alone may not be sufficient to capture the heterogeneous dynamics of CCIs in different
locations. This makes the hybrid model a promising technique for forecasting the movements in
CCIs at the city level.
DATA
This research uses monthly construction cost indices (CCIs) from January 1995 to December 2019,
which are published by Engineering News-Record (ENR) for 20 major cities in the United States.
The CCI in each city is a weighted average of subcomponents, such as common labor costs and
material costs, whose weights are fixed across cities. The city CCI consists of 81% common labor
costs, 13% steel prices, 5% lumber prices, and 1% cement prices (Zevin 2020). Common
labor costs are measured by 200 hours of common labor at the average common labor rate
in each city. Material prices are measured by 25 cwt of standard structural steel, 1.128 tons
of Portland cement, and 1,088 board-ft of 2×4 lumber (ENR 2021). The national CCI
is then constructed from the simple average of the CCIs of 20 major cities. As a popular cost index
of many construction projects, the ENR CCI has been widely used for cost estimation, bid
preparation, and project budgeting (Ashuri et al. 2012). Following the research by Choi et al.
(2021), the current research collects the ENR CCIs for both the national level (NAT) and twenty
cities: Atlanta (ATL), Baltimore (BAL), Birmingham (BHM), Boston (BOS), Chicago (CHI),
Cincinnati (CIN), Cleveland (CLE), Dallas (DAL), Denver (DEN), Detroit (DET), Kansas City
(KCT), Los Angeles (LAX), Minneapolis (MIN), New Orleans (NOL), New York City (NYC),
Philadelphia (PHL), Pittsburgh (PIT), San Francisco (SFC), Seattle (SEA), and St. Louis (STL).
The dataset, therefore, consists of 21 CCI series (national plus twenty cities) for 25 years (January
1995 to December 2019), resulting in 300 monthly observations for each series (N = 21, T = 300).
For the empirical analysis below, each CCI series is seasonally adjusted using the U.S. Census
Bureau’s X13-ARIMA seasonal adjustment method.
METHODOLOGIES
Researchers in the CCI literature have sought a better forecasting model over the years. Among
numerous forecasting models available in the literature, this research focuses on three models for
comparison: (1) linear ARIMA, (2) nonlinear ANNs, and (3) hybrid ARIMA-ANNs. While linear
ARIMA models and nonlinear ANNs have been popularly employed in the literature, the hybrid
model has not yet been used to forecast CCIs. It is therefore worth investigating the forecasting
performance of the hybrid ARIMA-ANN model approach in comparison with those of ARIMA
and ANNs.
Autoregressive Integrated Moving Average (ARIMA) Model
Built on the combination of Autoregressive (AR) and Moving Average (MA) models, ARIMA
models assume that future data values are linearly dependent on the current and past data
observations as well as random errors.
A typical ARIMA(p,0,q) model can be represented by Equation (1).
$$y_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{1}$$
where $y_t$ denotes the current observation of a time series of interest, $y_{t-i}$ ($i = 1, 2, \ldots, p$) represent its
past observations, $\phi_i$ and $\theta_j$ are the autoregressive and moving-average parameters, and
$\varepsilon_{t-j}$ ($j = 0, 1, 2, \ldots, q$) are random errors with zero mean and finite variance.
$p$ and $q$ respectively denote the orders of the autoregressive (AR) term and moving average (MA)
term, which are selected by the Bayesian information criterion (BIC) rule.
Since the stationarity of the time series is required for ARIMA models, an appropriate data
transformation is needed before estimating an ARIMA model. Because all the level CCIs in this
research turn out to be nonstationary, they are transformed to the growth rates by taking the first
log differencing before estimating an ARIMA model.
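As a minimal sketch of the transformation described above, the first log difference of an index series can be computed as follows. The function name `log_growth_rates` and the sample index levels are hypothetical and are not taken from the ENR data.

```python
import math


def log_growth_rates(levels):
    """Return first log differences: g_t = ln(y_t) - ln(y_{t-1})."""
    if any(value <= 0 for value in levels):
        raise ValueError("index levels must be positive to take logs")
    # Pair each level with its predecessor and difference the logs.
    return [math.log(curr) - math.log(prev)
            for prev, curr in zip(levels, levels[1:])]


# Hypothetical index levels for illustration only (not actual CCI values).
cci_levels = [100.0, 101.0, 103.02]
growth = log_growth_rates(cci_levels)
```

The transformed series has one fewer observation than the level series, since the first level has no predecessor.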
Thanks to their simplicity, ARIMA models have been widely adopted for forecasting CCIs. Studies,
in general, find that ARIMA models have decent predictive power for linear and stationary time
series processes. However, due to the underlying linearity assumption, ARIMA models show
limited accuracy in forecasting nonlinear dynamic patterns observed in many real-world time
series data (Zhang 2003). To cope with this problem, several types of nonlinear models have been
considered in the literature. A difficulty arising in this regard is to correctly specify the form of
nonlinearity. Among a wealth of nonlinear models, an artificial neural network (ANN) has been
extensively considered in the literature.
Artificial Neural Network (ANN)
ANN is a nonlinear, nonparametric, and data-driven machine learning method that mimics the
central nervous system of the human brain (Ciaburro and Venkateswaran 2017). ANN comprises
several layers, including one input layer, one or more hidden layers, and one output layer (Abd
Rahman et al. 2015). A set of nodes, or artificial neurons, are organized in each layer. The artificial
neurons with associated weights and thresholds in each layer are interconnected with the neurons
in the following layer. Any artificial neuron whose output is above its specified threshold value is
activated to send information to the next layer of the network. When the artificial neuron is not
activated, the information is not transferred to the next layer of the network (Yu et al. 2016).
Comprised of a set of artificial neurons and multilayer perceptrons (MLPs), the basic idea of ANN
is to process and transfer information through nonlinear activation functions without imposing any
prior assumption (Ahmadi et al. 2019; May et al. 2011).
The ANN with a three-layer network used for the current study is represented by Equation (2).
$$y_t = \alpha_0 + \sum_{j=1}^{r} \alpha_j \, g\left(\beta_{0j} + \sum_{i=1}^{s} \beta_{ij} y_{t-i}\right) + \varepsilon_t \tag{2}$$
where $y_t$ denotes a time series of interest at time $t$ and $y_{t-i}$ ($i = 1, 2, \ldots, s$) denotes its past observations,
which are typically fed into the nodes in the input layer. $g(\cdot)$ is a transfer function of the hidden
layer. $\beta_{ij}$ ($i = 0, 1, 2, \ldots, s$; $j = 1, 2, \ldots, r$) are the weights from the input layer to the hidden layer.
$\alpha_j$ ($j = 1, 2, \ldots, r$) are the weights from the hidden layer to the output layer. $s$ and $r$ denote the number of
nodes in the input and hidden layers, respectively. $\varepsilon_t$ is the random error at time $t$.
Note that the nonlinear feature of ANNs mainly stems from the nonlinearity of this transfer
function. In the current study, the sigmoid function in Equation (3) is used as a transfer function
of the hidden layer.
$$g(x) = \frac{1}{1 + e^{-x}} \tag{3}$$
which transforms the input values of past observations to be bounded between 0 and 1.
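The forward pass of such a three-layer network can be sketched as below. This is an illustration under stated assumptions rather than the study's implementation: the names `alpha0`, `alpha`, `beta0`, and `beta` are labels chosen here for the output bias, hidden-to-output weights, hidden biases, and input-to-hidden weights, and all weights are supplied by hand instead of being trained by back-propagation.

```python
import math


def sigmoid(x):
    """Sigmoid transfer function, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def ann_forecast(lags, alpha0, alpha, beta0, beta):
    """One-step output of a three-layer network with one hidden layer.

    lags   : past observations [y_{t-1}, ..., y_{t-s}]
    alpha0 : bias of the output node
    alpha  : hidden-to-output weights, one per hidden node
    beta0  : hidden-node biases, one per hidden node
    beta   : input-to-hidden weights, beta[j][i] links input i to hidden node j
    """
    output = alpha0
    for a_j, b0_j, b_j in zip(alpha, beta0, beta):
        # Weighted sum of the lagged inputs arriving at hidden node j.
        hidden_input = b0_j + sum(w * y for w, y in zip(b_j, lags))
        output += a_j * sigmoid(hidden_input)
    return output


# Hypothetical weights: two inputs, two hidden nodes.
example = ann_forecast([0.0, 0.0], 0.1, [1.0, -1.0], [0.0, 0.0],
                       [[1.0, 1.0], [1.0, 1.0]])
```

With zero inputs and zero hidden biases, each hidden node outputs 0.5, so the two opposite-signed output weights cancel and only the output bias remains.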
The estimation process begins with feeding the past observations of yt into the nodes in the input
layer, which are then sent to the hidden layer, where further information is filtered out to fit the data
using the back-propagation training algorithm. The extracted information is transferred to the
output layer to produce the final output based on the nonlinear transfer function. Since the
forecasting accuracy of ANNs varies with the number of nodes in the hidden layer, forecast results
are obtained from the best-performing network found by varying the number of hidden-layer nodes.
In the current study, a total of sixty-three ANN models were fitted to the growth rates of twenty
city-CCIs and national CCI over short-term, mid-term, and long-term forecasting horizons. Three-
layer ANNs were developed since the three-layer ANN can estimate any nonlinear data
movements if the numbers of nodes are determined properly (Liu et al. 2012). The number of input
nodes (s) was selected as the candidate, among twelve, twenty-four, and thirty-six months, with the
smallest in-sample errors. The best-fitted ANN models were achieved when the number of hidden nodes was
adjusted to the number of nodes in the input layer. As a rule of thumb, the number of hidden nodes
is set to two-thirds of the number of input nodes (Karsoliya 2012). The ANNs were trained recursively using the
back-propagation training algorithm.
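The node-selection logic described above can be sketched as follows. The helper names and the in-sample error values are hypothetical; only the candidate lag lengths (12, 24, 36), the two-thirds rule of thumb, and the count of sixty-three ANN models come from the text.

```python
def hidden_nodes(input_nodes):
    """Two-thirds rule of thumb for the hidden layer size."""
    return round(2 * input_nodes / 3)


def select_input_nodes(in_sample_errors):
    """Pick the candidate number of input nodes with the smallest in-sample error."""
    return min(in_sample_errors, key=in_sample_errors.get)


candidate_inputs = [12, 24, 36]
configs = {s: hidden_nodes(s) for s in candidate_inputs}

# Hypothetical in-sample errors for one series (illustration only).
errors = {12: 0.9, 24: 0.4, 36: 0.6}
best_s = select_input_nodes(errors)
best_r = configs[best_s]

# 21 series (20 cities plus national) times 3 forecasting horizons.
n_ann_models = 21 * 3
```

Under the rule of thumb, the three candidate input sizes map to 8, 16, and 24 hidden nodes, respectively.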
As data-driven, self-adaptive methods with few prior assumptions, ANNs have several attractive
features relative to the linear ARIMA model. First, the ANN is suited for capturing nonlinear
dynamics of time series because of its modeling structure using nonlinear network algorithms.
Therefore, nonlinear nonparametric ANNs have effectively solved nonlinear problems in the real
world, in contrast to conventional forecasting techniques like linear ARIMA models (Kim et al.
2021b, Ciulla et al. 2019). Second, since ANNs are trained on the features of the data (a data-driven
approach) without any strict modeling assumption such as linearity or stationarity, they are
capable of approximating any measurable function between input and output values through the
data filtering process in the hidden layer. Consequently, it is often reported that nonlinear models
improve upon linear models in characterizing the in-sample properties of a time series. At the same
time, however, this flexibility of ANNs is known to lead to an overfitting problem in out-of-sample
forecasting (Golafshani et al. 2020, Zhang et al. 2018). This is why ANNs do not necessarily
outperform ARIMA models in out-of-sample forecasting. Besides, in practice, the number of
nodes and layers in an ANN needs to be estimated, which can lead to additional forecast
uncertainty.
Hybrid ARIMA-ANN model
The hybrid ARIMA-ANN model was proposed to alleviate the limitations of each model by
exploiting the strengths of ARIMA and ANNs (Büyükşahin and Ertekin 2019, Khashei and Bijari
2011, Rathnayaka et al. 2015, Wang et al. 2013, Zhang 2003). Previous studies in other disciplines
show that combining linear and nonlinear models can be effective in improving forecasting
performance, especially in the absence of any single dominant forecasting model (Büyükşahin and
Ertekin 2019, Zhang 2003). Zhang (2003) shows that the hybrid model can achieve improved
forecasting accuracy over individual ARIMA or ANNs by exploiting the unique strength of
ARIMA and ANNs in capturing linear and nonlinear dynamics, respectively. Büyükşahin and
Ertekin (2019) proposed a hybrid ARIMA-ANN model that works in a more general structure
using empirical mode decomposition. Yet, little is known about whether such a combination of
ARIMA and ANNs can improve the accuracy of individual linear or nonlinear models in
forecasting CCIs. This study fills the gap by examining the forecasting performance of the hybrid
model of ARIMA and ANN in comparison with those of individual ARIMA and ANNs. The
hybrid model is particularly relevant for the CCI data, which are constructed from multiple
subcomponents, such as material costs and labor costs, that are likely to follow different dynamics.
The hybrid ARIMA-ANN model employed here consists of two steps. The time series of interest
(yt) is supposed to be a combination of linear (Lt) and nonlinear (Nt) components, as represented
by Equation (4).
yt = Lt + Nt. (4)
In the first step, the linear component is estimated using an ARIMA(p,0,q) model in Equation (5).
$$\hat{L}_t = \sum_{i=1}^{p} \phi_i y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j} \tag{5}$$
where $\hat{L}_t$ is the estimated linear component of the time series at time $t$, $y_{t-i}$ ($i = 1, 2, \ldots, p$) are the
past values of the time series, $\phi_i$ ($i = 1, 2, \ldots, p$) are the autoregressive parameters, $\theta_j$ ($j = 1, 2, \ldots, q$) are the
moving-average parameters, and $\varepsilon_{t-j}$ ($j = 1, 2, \ldots, q$) is the error term at time $t-j$. The lag lengths, $p$
and $q$, are selected by the BIC rule. Then, the residual ($e_t$) of the ARIMA model is obtained from
Equation (6).
$$e_t = y_t - \hat{L}_t \tag{6}$$
where $y_t$ is the observed time series of interest at time $t$ and $\hat{L}_t$ is the estimated linear component
of $y_t$ using the ARIMA model in Equation (5).
In the second step, an ANN is applied to the residual ($e_t$) to capture the nonlinear component ($\hat{N}_t$)
as represented by Equation (7).
$$\hat{N}_t = g(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t \tag{7}$$
where $g(\cdot)$ denotes a sigmoid function, $n$ represents the number of nodes in the input layer of the
ANN, $e_{t-i}$ ($i = 1, 2, \ldots, n$) is the residual of the ARIMA model at time $t-i$, and $\varepsilon_t$ is the error term.
Combining the linear and nonlinear components of a time series approximated by Equations (5)
and (7), respectively, the future values of $y_t$ are forecasted by the hybrid model using the estimates
of $\hat{y}_t = \hat{L}_t + \hat{N}_t$. In the forecasting exercise, $\hat{L}_t$ and $\hat{N}_t$ respectively denote the predicted values of
the linear component by the ARIMA model and the nonlinear component by the ANN.
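The two-step logic can be sketched in a few lines. This is a deliberately simplified illustration, not the study's implementation: an AR(1) model without intercept, fitted by least squares, stands in for the BIC-selected ARIMA model, and a caller-supplied function `residual_model` stands in for the trained ANN. Both stand-ins and all function names are assumptions made for this sketch.

```python
def fit_ar1(y):
    """Least-squares AR(1) coefficient without intercept: y_t ~ phi * y_{t-1}."""
    numerator = sum(curr * prev for prev, curr in zip(y, y[1:]))
    denominator = sum(prev * prev for prev in y[:-1])
    return numerator / denominator


def hybrid_one_step(y, residual_model, n_lags):
    """One-step-ahead hybrid forecast: linear part plus nonlinear residual part."""
    phi = fit_ar1(y)
    # Step 1: residuals of the linear model, e_t = y_t - L_hat_t.
    residuals = [curr - phi * prev for prev, curr in zip(y, y[1:])]
    linear_hat = phi * y[-1]
    # Step 2: feed the most recent residuals (newest first) to the nonlinear model.
    lagged = residuals[-n_lags:][::-1]
    nonlinear_hat = residual_model(lagged)
    # Combine: y_hat = L_hat + N_hat.
    return linear_hat + nonlinear_hat


# Hypothetical series and residual model for illustration only.
series = [1.0, 2.0, 4.0, 8.0]
forecast = hybrid_one_step(series, lambda lagged: 0.5 + sum(lagged), n_lags=2)
```

In this toy series the AR(1) fit is exact, so the residuals are zero and the forecast equals the linear prediction plus the constant returned by the placeholder residual model.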
FORECASTING PERFORMANCES
Linear ARIMA, nonlinear ANN, and their hybrid models were developed individually for each CCI series
using its own data. Therefore, a total of 189 models, equal to the number of CCI time
series (21) multiplied by the number of models (3) and the number of forecasting horizons (3),
were specified for forecasting the twenty city CCIs and the national CCI over short-term,
mid-term, and long-term forecasting horizons using the ARIMA, ANN, and hybrid models.
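The model count can be checked by enumerating every combination of series, model type, and horizon. The series codes are those listed in the DATA section; the variable names are chosen here for illustration.

```python
from itertools import product

cities = ["ATL", "BAL", "BHM", "BOS", "CHI", "CIN", "CLE", "DAL", "DEN", "DET",
          "KCT", "LAX", "MIN", "NOL", "NYC", "PHL", "PIT", "SFC", "SEA", "STL"]
series_codes = ["NAT"] + cities           # national plus twenty cities
models = ["ARIMA", "ANN", "Hybrid"]
horizons_months = [12, 36, 60]            # short-, medium-, and long-term

# One model specification per (series, model, horizon) combination.
specifications = list(product(series_codes, models, horizons_months))
```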
The relative forecasting accuracy of the three competing models is evaluated based on out-of-
sample forecasting performance for predicting both national and city-level CCIs. Three different
forecasting horizons are considered: short-term (12-month-ahead forecast), medium-term (36-
month-ahead forecast), and long-term (60-month-ahead forecast), comparable to the timelines of
actual construction projects. The training and testing periods of the out-of-sample forecasting
exercises are selected accordingly, as shown in Table 3.
Table 3. Training and testing periods for out-of-sample forecasting by forecasting horizon
| Forecasting horizon | Training period | Testing period |
|---|---|---|
| Short-term (12 months) | January 1995 ~ December 2018 | January 2019 ~ December 2019 |
| Medium-term (36 months) | January 1995 ~ December 2016 | January 2017 ~ December 2019 |
| Long-term (60 months) | January 1995 ~ December 2014 | January 2015 ~ December 2019 |
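The training and testing windows in Table 3 amount to holding out the last h months of each 300-month series. A minimal sketch, assuming the series is stored as a plain list ordered from January 1995 to December 2019 (the function name and the placeholder data are hypothetical):

```python
def split_by_horizon(series, horizon):
    """Hold out the last `horizon` observations as the testing set."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]


n_months = 25 * 12                      # January 1995 to December 2019
placeholder = list(range(n_months))     # stand-in for one monthly CCI series

sizes = {}
for name, h in {"short": 12, "medium": 36, "long": 60}.items():
    train, test = split_by_horizon(placeholder, h)
    sizes[name] = (len(train), len(test))
```

The resulting training lengths of 288, 264, and 240 months match the training periods listed in Table 3.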
The effective estimation sample runs from January 1995 until December 2018 (288 monthly
observations) for the short-term horizon, from January 1995 until December 2016 (264 monthly
observations) for the medium-term horizon, and from January 1995 until December 2014 (240
monthly observations) for the long-term horizon. For the short-run forecasting horizon, for
example, the CCI data from January 1995 to December 2018 are used as a training dataset for
model estimation, and the remaining data from January 2019 to December 2019 are used as a
testing dataset to evaluate the out-of-sample forecasting performance. It is well known that ANNs
are randomly initialized and therefore produce different results at each run (e.g., Büyükşahin and
Ertekin 2019). Since there is no established method for network configuration in ANNs, a network
is built for each combination of network parameters and evaluated on the testing dataset, and the
best-performing configuration is chosen.
Root Mean-Squared Errors (RMSE)
Traditionally, forecasting performances are often evaluated using the root-mean-squared errors
(RMSEs). The models with smaller RMSEs are considered to show more accurate forecasting
performances. The RMSE of a forecasting model for the forecasting horizon of $h$ can be written
as Equation (8).
$$\mathrm{RMSE}_h = \sqrt{\frac{1}{h} \sum_{t=T+1}^{T+h} \left(\hat{y}_t - y_t\right)^2} \tag{8}$$
where $\hat{y}_t$ and $y_t$ respectively represent the forecast value and actual value of a variable ($y$) at time
$t$, $h$ denotes the forecasting horizon, and $T$ denotes the sample size of the training period. Then,
Equations (9) and (10) represent the ratios of the RMSEs of the ARIMA and ANN models to the
RMSE of the hybrid model (hereafter, the RMSE ratio), respectively.
$$\text{The RMSE ratio between ARIMA and the hybrid model} = \frac{\mathrm{RMSE}_{\mathrm{ARIMA}}}{\mathrm{RMSE}_{\mathrm{Hybrid}}} \tag{9}$$
$$\text{The RMSE ratio between ANN and the hybrid model} = \frac{\mathrm{RMSE}_{\mathrm{ANN}}}{\mathrm{RMSE}_{\mathrm{Hybrid}}} \tag{10}$$
The RMSE ratio serves to indicate the relative out-of-sample performances of individual models
to the hybrid model. If the RMSE ratio is greater than unity, i.e., RMSEARIMA > RMSEHybrid or
RMSEANN > RMSEHybrid, then it indicates the superiority of the hybrid model, and vice versa. If the
value of the RMSE ratio is equal to unity, then the RMSE of an individual model (ARIMA or
ANN) is the same as that of the hybrid model, suggesting an equivalent forecast accuracy of the
individual model to that of the hybrid model. Table 4 reports the RMSE ratio for the national CCI
and the 20 city averages over the three forecasting horizons.
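As an illustration of how the RMSE and the RMSE ratio are computed, the following sketch uses made-up forecast values. The numbers and the function names `rmse` and `rmse_ratio` are assumptions for demonstration only and are not taken from the study.

```python
import math


def rmse(forecasts, actuals):
    """Root-mean-squared error over the testing period."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(f - a) ** 2 for f, a in zip(forecasts, actuals)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_ratio(rmse_individual, rmse_hybrid):
    """Ratio above 1 means the hybrid model has the smaller RMSE."""
    return rmse_individual / rmse_hybrid


# Hypothetical testing-period values (not data from the paper).
actual = [100.0, 102.0, 104.0, 106.0]
hybrid = [101.0, 102.0, 103.0, 106.0]
arima = [102.0, 104.0, 106.0, 108.0]

ratio = rmse_ratio(rmse(arima, actual), rmse(hybrid, actual))
print(round(ratio, 3))
```

In this toy case the ratio exceeds unity, which would be read as the hybrid model outperforming ARIMA.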
Table 4. The RMSE ratios of out-of-sample forecasts
RMSE ratios | Short-term ARIMA | Short-term ANN | Medium-term ARIMA | Medium-term ANN | Long-term ARIMA | Long-term ANN
National | 1.57 | 1.29 | 7.54 | 3.48 | 7.25 | 11.43
20-city average | 1.89 | 1.67 | 2.85 | 3.32 | 3.15 | 5.29
As shown in Table 4, the RMSE ratio is consistently larger than unity in all cases considered,
suggesting an outperformance of the hybrid model over individual models. Interestingly, the
outperformance of the hybrid model turns out to be stronger in longer forecasting horizons in
which the ratio is greater. Figure 1 displays the RMSE ratio of individual models to the hybrid
model for the national and city-level CCIs over the three forecasting horizons. The RMSE ratio
serves to indicate the relative performance of individual models to the hybrid model. The unity
line in Figure 1 provides a guideline where there is no significant difference between the
forecasting performance of an individual model (ARIMA or ANN) and that of a hybrid model. If
the value of the RMSE ratio is equal to unity, then the RMSE of an individual model (ARIMA or
ANN) is the same as the RMSE of the hybrid model, suggesting an equivalent accuracy of out-of-sample
forecasts between the two models. A ratio above the unity line indicates the
superiority of the hybrid model, and vice versa. In other words, the taller a column rises above the unity
line, the greater the underperformance of the individual model relative to the hybrid model.
Several observations can be made from Figure 1. First, the hybrid model performs better than
individual models in predicting the national CCI. The RMSE ratio for the national CCI is
consistently larger than the unity line regardless of the forecasting horizons. The outperformance
of the hybrid model is stronger when the forecasting horizon is longer. Second, for the city-level
CCIs, the hybrid model outperforms in some cities, but not in others, especially when the
forecasting horizon is shorter. In the long-term forecasting horizon, however, the hybrid model
dominates both ARIMA and ANNs in all city CCIs. Third, it is not easy to tell the dominance
between two individual models, ARIMA and ANN.
Figure 1 indicates an outperformance of the hybrid model in many forecasting cases under
consideration. While the visual evidence is informative, it may be helpful if there exists further
concrete evidence based on more formal analysis. To this end, the formal testing tool proposed by
Giacomini and White (2006) is utilized to determine the best-performing model by assessing
whether the out-of-sample forecasting performance of the competing models is statistically
different from each other.
Figure 1. Ratio of RMSE to Hybrid Model for national and city-level CCIs for: (a) short-term
forecasting of 12 months; (b) medium-term forecasting of 36 months; and (c) long-term
forecasting of 60 months.
The Giacomini-White Test
Giacomini and White (2006) propose a formal test that can be used to examine whether the mean
squared errors (MSEs) of two competing models are significantly different from each other.
The null hypothesis of the Giacomini-White test (henceforth, the GW test) is given by
$$H_0: E\left[MSE^{A}_{t+h} - MSE^{B}_{t+h} \mid \mathcal{F}_t\right] = 0$$ (11)
where $MSE^{A}_{t+h}$ represents the MSE of the h-step ahead forecast of Model A, $MSE^{B}_{t+h}$ is the
corresponding MSE of Model B, and $\mathcal{F}_t$ denotes the information set at time t. Under the null
hypothesis that the forecast performance of the two models under comparison is not statistically
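different from each other, the GW test statistic is given by
$$GW = \frac{\overline{MSE}^{A} - \overline{MSE}^{B}}{\hat{\sigma}/\sqrt{n}}$$ (12)
where $\overline{MSE}^{A}$ and $\overline{MSE}^{B}$ respectively denote the MSEs of two models under comparison, $\hat{\sigma}$ is the
estimated standard deviation of their difference, and n is the number of out-of-sample forecasts. Since the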
GW test statistic is a two-sided test statistic with an asymptotic standard normal distribution,
inference can be made from the signs and p-values of the test statistics. The GW test compares
only two competing forecasting models at a time. Hence, the GW test is applied to three pairs of
forecasting models in this research: (i) ARIMA versus ANN, (ii) ANN versus hybrid, and (iii)
ARIMA versus hybrid. Table 5 contains the GW test results for the three forecasting horizons.
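The logic of such a pairwise comparison can be sketched as a studentized mean difference of squared forecast errors with a two-sided normal p-value. This is a simplified, unconditional illustration and not the authors' implementation: it assumes serially uncorrelated loss differentials (no autocorrelation-robust variance), and the function name and error values are hypothetical.

```python
import math
from statistics import mean, stdev


def loss_differential_test(errors_a, errors_b):
    """Return (statistic, two-sided p-value) for equal MSE of models A and B.

    A positive statistic means Model A has the larger mean squared error.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two error series of equal length >= 2")
    diffs = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    std_error = stdev(diffs) / math.sqrt(len(diffs))
    if std_error == 0:
        raise ValueError("loss differentials have zero variance")
    statistic = mean(diffs) / std_error
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(statistic) / math.sqrt(2))
    return statistic, p_value


# Hypothetical out-of-sample forecast errors for two models.
errors_a = [2.0, -1.5, 2.5, -2.0, 1.8, -2.2]
errors_b = [0.5, -0.4, 0.6, -0.3, 0.2, -0.5]

statistic, p_value = loss_differential_test(errors_a, errors_b)
print(statistic > 0, p_value < 0.05)
```

The sign of the statistic identifies which model has the smaller MSE, and the p-value indicates whether the difference is statistically significant at the chosen level.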
Table 5 shows that the three models under comparison do not differ significantly in the
short-term forecasting horizon, as the GW test results suggest no evidence of a significant
difference in forecasting performances among them in most cases considered. The hybrid model
outperforms both ARIMA and ANNs in only six out of 20 cities. This is perhaps because the one-year
out-of-sample period is not long enough for the GW tests to find significant
differences among the competing models.
Table 5. The Giacomini-White (GW) test results for forecasting performances of three
competing models
Legend
V = the forecasting outperformance of the model over the other model(s);
― = no evidence of forecasting outperformance over the other model(s)
City
Short-term horizon
Medium-term horizon
Long-term horizon
ARIMA
ANN
Hybrid
ARIMA
ANN
Hybrid
ARIMA
ANN
Hybrid
NAT
V
V
ATL
V
BAL
V
V
BHM
V
BOS
V
V
V
V
CHI
V
V
V
CIN
V
V
CLE
V
V
DAL
V
V
DEN
V
V
V
DET
V
V
V
KCT
V
V
V
LAX
V
V
MIN
V
V
V
NOL
V
NYC
V
V
V
PHL
V
V
V
V
PIT
SFC
V
V
V
SEA
V
V
V
STL
V
V
Note: Entries are based on Giacomini-White (G-W) tests under the null hypothesis that there is no statistically significant
difference in the mean-squared errors (MSEs) of out-of-sample forecasts of two forecasting models under comparison. V
represents the case of forecasting outperformance over the other model(s) under the given forecasting environment, as the
null hypothesis of the G-W test is rejected at the 5% significance level. ‘―’ represents no evidence of forecasting
outperformance over the other model(s) under the given forecasting environment.
The dominance of the hybrid model, however, increases with the forecasting horizon. In the medium-term
horizon, the hybrid model outperforms the other two models in forecasting the national CCI and
thirteen out of 20 city-level CCIs. The hybrid model performs best in the long-term forecasting
horizon, outperforming the two individual models in almost all the cases
considered for the 5-year forecasting horizon.
Regardless of the forecasting horizons, the forecasting performance of the hybrid model is more
accurate than (or at least comparable to) that of individual models in all cases considered in the
current research. The hybrid model was validated as a more accurate model than the individual
ARIMA or ANN, providing higher accuracy and robustness in forecasting the CCIs under different
scenarios. This reflects that ARIMA or ANN, when individually used, cannot capture dynamic
patterns in the CCI series, which may contain both linear and nonlinear components. This explains
why combining two individual models by utilizing the features of each model can be effective in
enhancing the accuracy of CCI forecasts. When the two individual models are compared with
each other, ANN outperforms ARIMA for the CCI forecasts in the short horizon, but
ARIMA performs better than ANN in the long horizon. Overall, however, there seems to be no clear dominance
between the two individual models, in line with the findings of previous studies in the literature.
The outperformance of the hybrid model is further illustrated in Figure 2. Figure 2 compares the
performances of the hybrid model and individual models (i.e., ARIMA, ANN, and Random Forest)
for forecasting national CCIs over three different forecasting horizons.
Figure 2. Out-of-sample forecasting performances of four models for the national CCI for:
(a) short-term forecasting of 12 months; (b) mid-term forecasting of 36 months; and (c) long-term
forecasting of 60 months.
The Random Forest model, a nonparametric and nonlinear machine learning method, has successfully
forecasted the future values of time series using binary splits and bootstrapped data (Bhattacharyya
et al. 2021; Yoon 2021; Mukherjee et al. 2018; Rudžianskaitė–Kvaraciejienė et al. 2015; Vitorino
et al. 2014) and is regarded as one of the most efficient data mining forecasting methods
(Sekhar and Mahdu 2016). Random Forest is therefore additionally considered for comparison. Table 6 summarizes the RMSE values of
the hybrid model and individual ARIMA, ANN, and Random Forest models for forecasting
national CCI over short-term, mid-term, and long-term horizons. Since the hybrid model provides
the lowest RMSEs in all forecasting cases considered in Table 6, the hybrid model is found
to outperform all individual models in forecasting the national CCI under three different
forecasting horizons.
Table 6. The RMSE values of hybrid and individual models for forecasting the national CCI
Forecasting horizons | Short-term | Medium-term | Long-term
Hybrid ARIMA-ANN | 24.37 | 57.70 | 31.21
ARIMA | 38.80 | 150.68 | 232.80
ANN | 39.39 | 262.27 | 387.48
Random Forest | 81.86 | 146.84 | 212.06
The results displayed in Figure 2 are very similar to those outlined above, clearly demonstrating
the better performance of the hybrid model over individual models for various forecasting
horizons. The top panel (a) of Figure 2 displays the short-term out-of-sample forecasts of the
national CCI from the four competing models along with the actual values (in black solid line).
For the entire forecasting period of Jan-Dec 2019, the forecasts of the hybrid model (in red dot-dashed
line) are closer to the actual national CCIs than those of the individual ARIMA, ANN, and
Random Forest models, in line with the results from the GW test reported in Table 5. The middle
panel (b) of Figure 2 exhibits the same exercise for the medium-term horizon with the testing
period of Jan 2017 to Dec 2019. Again, the hybrid model appears to achieve more accurate forecast
results for predicting the national CCI, compared to the three individual models. A similar
conclusion is drawn from the bottom panel (c) of Figure 2, which exhibits the long-term out-of-sample
forecasting performances for the forecasting period of Jan 2015 to Dec 2019. The better
performance of the hybrid model in all forecasting horizons is likely attributable to treating both
the linear and nonlinear components of the time series sequentially.
The forecasting results in the current study consistently point to the outperformance of the hybrid
model over individual ARIMA and ANNs in forecasting the CCIs. The dominance of the hybrid
model over individual models is stronger for long-term forecasting horizons. This is consistent
with the findings by Babu and Reddy (2014) and Rathnayaka et al. (2015) that the hybrid model
works better than a single ARIMA and ANN in forecasting time series with high volatility.
The better performance of the hybrid model can be attributed to the comparative advantage of the
hybrid model in capturing dynamic movements of CCIs that are partly linear and partly nonlinear.
Since construction projects typically run over multiple years, their costs are likely to experience
volatile movements with some structural changes occurring during the entire construction period
(Shrestha et al. 2013). Besides, since the CCI series is a weighted average of multiple
subcomponents like material prices and labor costs that may follow different dynamic patterns, the
hybrid model can better capture the heterogeneous dynamics of the subcomponents.
With that said, the hybrid model adopted here assumes that the relationship between linear and
nonlinear components is additive. Hence, the empirical results obtained here may change if the
linear and nonlinear components are not additively associated. Additionally, the hybrid model
decomposes data into linear and nonlinear components by assuming that linear components can be
captured by ARIMA and nonlinear components from the residuals of the ARIMA model can be
approximated by ANN. As noted by Büyükşahin and Ertekin (2019), however, the forecasting
performances of the hybrid model may be affected if such assumptions are violated in real-world
applications. Nevertheless, the conclusion drawn here regarding the superiority of the hybrid
model over individual ARIMA or ANNs is likely unaltered because alternative methods of
decomposition of linear and nonlinear components in the hybrid model are reported to still
dominate ARIMA and ANNs in other studies.
GUIDANCE FOR MODEL SELECTION AND DEVELOPMENT
This study proposed the univariate hybrid ARIMA-ANN models for forecasting CCIs at the U.S.
national and 20-city levels. Two practical examples are provided to demonstrate a walk-through
application of an ANN and the hybrid forecasting model. New York city-CCI and Dallas city-CCI
are selected for forecasting because they represent the construction costs with high growth rates
and low growth rates over past decades in the U.S., respectively. Construction costs in New
York City, where construction spending was around $62 billion in 2018, are among the most expensive
in the entire U.S., with a rapid growth rate of 3.74% (Hall 2019). The construction costs in
Dallas are also growing, but the annual growth rate of the Dallas city-CCI from 1995 to 2019 is
relatively lower (1.9%) than the average annual growth rate of twenty major U.S. cities (2.99%).
Example 1 presents a walk-through implementation of an ANN in forecasting the New York city-
level CCI.
Example 1. Suppose that a contractor in New York City needs to estimate a construction cost for
a 3-year local construction project starting in 2017. The contractor decided to refer to ENR city-
CCI for estimating construction cost movements from 2017 to 2019. The contractor could
implement an ANN model to approximate the growth rate data of New York CCI from 2017 to
2019.
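Because the examples model growth rates rather than index levels, a conversion step is needed in both directions. The sketch below assumes the growth rate is the month-over-month percentage change, which is one common definition and is not stated explicitly in the text; the function names and index values are hypothetical.

```python
def growth_rates(levels):
    """Month-over-month percentage change of an index series."""
    return [100.0 * (curr - prev) / prev
            for prev, curr in zip(levels, levels[1:])]


def levels_from_growth(last_level, rates):
    """Rebuild index levels from a starting level and forecast growth rates."""
    levels = []
    level = last_level
    for rate in rates:
        level *= 1.0 + rate / 100.0
        levels.append(level)
    return levels


# Hypothetical index values (not ENR data).
cci = [100.0, 101.0, 103.02]
rates = growth_rates(cci)
rebuilt = levels_from_growth(cci[0], rates)
print([round(r, 4) for r in rates], [round(v, 4) for v in rebuilt])
```

Forecast growth rates from any of the models can be passed to `levels_from_growth` together with the last observed index value to express the forecasts as CCI levels.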
Three three-layer ANNs, ANN (36,24,1), ANN (24,16,1), and ANN (12,8,1), were developed using the
historical values of New York city-CCI from 1995 to 2016. The ANN (36,24,1) has thirty-six
previous monthly values as input nodes, twenty-four hidden nodes (two-thirds of
the number of input nodes; Karsoliya 2012), and one output node. The ANN (24,16,1) has twenty-four
previous monthly values as input nodes, sixteen hidden nodes, and one output node. The ANN
(12,8,1) has twelve previous monthly values as input nodes, eight hidden nodes, and one output
node.
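The input layout of these networks can be illustrated as follows. The sketch only prepares the lagged training samples and applies the two-thirds rule for the hidden layer; it does not train a network. The helper names are hypothetical, and the placeholder series simply stands in for 264 monthly observations (1995 to 2016).

```python
def hidden_nodes(n_inputs):
    """Two-thirds rule of thumb for the number of hidden nodes."""
    return round(2 * n_inputs / 3)


def make_lagged_samples(series, n_lags):
    """Pair each window of n_lags consecutive values with the next value."""
    if n_lags < 1 or len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    n_samples = len(series) - n_lags
    inputs = [series[i:i + n_lags] for i in range(n_samples)]
    targets = [series[i + n_lags] for i in range(n_samples)]
    return inputs, targets


# Placeholder for 264 monthly values; real CCI data would go here.
series = list(range(264))

for n_lags in (36, 24, 12):
    inputs, targets = make_lagged_samples(series, n_lags)
    print((n_lags, hidden_nodes(n_lags), 1), len(inputs))
```

Each printed tuple corresponds to one of the three network shapes, followed by the number of training samples that the chosen lag length leaves available.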
Table 7 presents the in-sample and out-of-sample forecasting errors. The ANN (36, 24, 1) has the
smallest forecasting errors for both in-sample and out-of-sample data.
Table 7. In-sample and out-of-sample forecasting errors of ANNs for the 3-year-ahead
forecasts of New York city-CCI
Models | In-sample MAPE | In-sample RMSE | Out-of-sample MAPE | Out-of-sample RMSE
ANN (36,24,1) | 0.27 | 44.49 | 8.86 | 1842.84
ANN (24,16,1) | 0.4 | 71.96 | 12.04 | 2447.97
ANN (12,8,1) | 0.55 | 120.25 | 11.36 | 2274.97
Figure 3 illustrates the 3-year-ahead forecasts for New York city-CCI using the ANNs.
Figure 3. Out-of-sample forecasting performances of the ANNs for the 3-year-ahead
forecasts of New York City-CCI.
Example 2 explains a walk-through application of a hybrid model in forecasting the Dallas city-
level CCI.
Example 2. Suppose that a contractor in Dallas, Texas, considers participating in a bidding process
for a 5-year local construction project starting in 2015. The contractor referred to historical ENR
city-CCIs to forecast and estimate construction cost movements from 2015 to 2019. Since the
growth rates of Dallas CCI can be represented as a combination of linear components and nonlinear
components by Equation (13), the contractor could create the hybrid model to represent the growth
rate data of Dallas CCI over the 20-year time span from 1995 to 2014.
∆DALt = Lt + Nt (13)
In the first step, the linear component is estimated using a linear ARIMA (1,0,1) model in Equation
(14). The ARIMA (1,0,1) model indicates the ARIMA model with one autoregressive term (p) and
one moving average term (q). The ARIMA model was selected by the BIC rule.
$$\hat{L}_t = c + \phi_1 \Delta DAL_{t-1} + \theta_1 \varepsilon_{t-1}$$ (14)
where $\hat{L}_t$ is the estimated linear component of the Dallas CCI growth rate at time t (t=January 2015
to December 2019), $\Delta DAL_{t-1}$ is the past value of the Dallas CCI growth rate at time t-1, which
indicates the growth rate one month previous to the forecast month (t), $\varepsilon_t$ is the error term at
time t, and $c$, $\phi_1$, and $\theta_1$ are the estimated coefficients. Then, the residual ($e_t$) of the ARIMA model is obtained from Equation (15).
$$e_t = \Delta DAL_t - \hat{L}_t$$ (15)
where $\Delta DAL_t$ is the observed monthly Dallas CCI growth rate at time t from 1995 to 2014 and $\hat{L}_t$
is the estimated linear component of the Dallas CCI growth rate at time t using the ARIMA model
in Equation (14).
In the second step, the residuals (et) from 1995 to 2014 generated by Equation (15) are
approximated by the feedforward ANN (12,8,1) in Figure 4. The ANN (12,8,1) denotes an ANN
model with twelve input nodes, eight hidden nodes, and one output node. The network is trained
recursively to forecast the next residual from the previous twelve residuals. The sigmoid function is used as the transfer
function between layers. The ANN (12,8,1) in Figure 4 is used to forecast the 5-year-ahead
residuals of Dallas city-CCIs in the hybrid model.
Figure 4. ANN for approximating and forecasting the residuals of Dallas-CCIs.
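Multi-step forecasting from a one-step-ahead model can be sketched as a recursive loop in which each forecast is fed back as an input for the next step. The sketch assumes this recursive scheme; the trained ANN (12,8,1) is replaced by a simple window-mean stand-in (`window_mean`), and the residual values are hypothetical.

```python
def recursive_forecast(history, predict_next, n_lags, steps):
    """Forecast `steps` values ahead, reusing each forecast as a new input."""
    if len(history) < n_lags:
        raise ValueError("history must contain at least n_lags values")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_next(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new forecast.
        window = window[1:] + [next_value]
    return forecasts


def window_mean(window):
    """Stand-in for a trained one-step-ahead model (not an ANN)."""
    return sum(window) / len(window)


# Hypothetical last twelve ARIMA residuals.
residuals = [0.1, -0.2, 0.05, 0.0, 0.15, -0.1,
             0.2, -0.05, 0.1, -0.15, 0.0, 0.05]

# Sixty monthly steps correspond to the 5-year horizon.
future_residuals = recursive_forecast(residuals, window_mean, 12, 60)
print(len(future_residuals))
```

Any callable that maps the last twelve values to the next one, including a trained network's prediction function, can be passed in place of `window_mean`.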
The hybrid model forecasts the 5-year-ahead future growth rates of the Dallas city-CCI by
additively combining the linear and nonlinear components approximated by ARIMA (1,0,1) and ANN
(12,8,1). In this forecasting example, the hybrid model outperforms the individual models in forecasting the
5-year-ahead Dallas city-CCIs, with a mean absolute percentage error (MAPE) of 0.32, while the
individual ARIMA and ANN provide MAPEs of 0.7 and 5.8, respectively.
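The additive combination and the MAPE comparison can be illustrated with a short sketch. The forecast values are made up for demonstration, and the function names are hypothetical; the linear and residual forecasts would in practice come from the fitted ARIMA and ANN models.

```python
def hybrid_forecast(linear_forecasts, residual_forecasts):
    """Add the linear forecast and the forecast of its residual."""
    if len(linear_forecasts) != len(residual_forecasts):
        raise ValueError("forecast series must have equal length")
    return [lin + res
            for lin, res in zip(linear_forecasts, residual_forecasts)]


def mape(forecasts, actuals):
    """Mean absolute percentage error in percent (actuals must be nonzero)."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs((a - f) / a) for f, a in zip(forecasts, actuals)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical values (not results from the paper).
actual = [100.0, 200.0, 400.0]
linear = [98.0, 196.0, 392.0]      # stand-in for the ARIMA forecasts
residual = [1.0, 2.0, 4.0]         # stand-in for the ANN residual forecasts

hybrid = hybrid_forecast(linear, residual)
print(round(mape(linear, actual), 2), round(mape(hybrid, actual), 2))
```

In this toy case the residual forecasts recover part of what the linear model misses, so the hybrid MAPE is lower than the linear-only MAPE.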
The empirical results of CCI forecasting performances using hybrid models offer several
practical implications for cost engineers and capital planners in construction projects. First, the hybrid model is
recommended for forecasting the national as well as city-level CCIs because the hybrid model
displayed stable and robust predictability in forecasting CCIs under different scenarios regardless
of construction location (city) and duration (forecasting horizon). In all cases considered in this
study, the hybrid model provided more accurate forecasts than the individual models, including
ARIMA, ANN, and Random Forest. Second, the forecasting performance of the hybrid model is
more accurate, especially when the forecasting horizon is longer. Therefore, it is recommended to
implement the hybrid model for forecasting CCIs, especially in long-term construction projects.
The 5-year-ahead forecasts of the national CCI, New York city-CCI, and Dallas city-CCI using
hybrid models are illustrated in Figure 5. The significant gaps between individual city-CCIs and
national CCI suggest that using the national CCI for local projects can cause large forecasting
errors in cost estimation and adjustment.
Figure 5. 5-year-ahead forecasts of the national CCI, New York City-CCI, and Dallas-CCI
using hybrid models.
Third, the hybrid model outperforms an individual ARIMA or ANN, especially when the city-
CCIs show higher growth rates or more frequent discrete changes. For example, the hybrid model
provides more accurate CCI forecasts for the cities with high annual growth rates of CCIs such as
BOS (3.36%), CHI (4.1%), NYC (3.74%), STL (2.99%), SFC (2.72%), and SEA (2.98%) that are
primarily located in eastern and western parts of the U.S. compared to the cities with lower annual
growth rates of CCIs such as DAL (1.9%) and NOL (2.0%) in southern states.
One of the major reasons for cross-city heterogeneities in city-CCI movements is different
economic and market situations across cities, such as construction labor market situations. Since
the CCIs are mainly composed of labor costs, construction labor wages and market situations in
each city affect the CCI movements. The construction labor market is less flexible compared to
the construction material market due to several reasons such as labor retraining and relocation
costs, annual legislative contracts, union wage premiums, and minimum wage standards (Alaloul
et al. 2021, Bobeica et al. 2019, Finkel 2015, Hwang 2011, Phillips 1982). Thus, construction labor
costs increase discretely over a year with heterogeneous growth rates across cities. City-CCIs are
also prone to discrete fluctuations with different growth rates based on each city’s economic
situation, such as the local construction labor market. The hybrid models show better forecasting
performances over individual models for CCIs with higher growth rates and more dynamic
fluctuations. Lastly, proper decomposition of time series into linear and nonlinear components is
critical for the forecasting performances of hybrid models. For example, if linear and nonlinear
components are combined multiplicatively, a multiplicative hybrid model is recommended (Babu
and Reddy 2014, Büyükşahin and Ertekin 2019). The hybrid model used in this study assumes the
additive combination of linear and nonlinear components that can be approximated by ARIMA
and ANN, respectively. If the linear and nonlinear components in CCI are not combined additively
(e.g., multiplicative combination), the forecasting performances of the additive hybrid model
may not be as accurate as in this study.
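The difference between the two decompositions can be made concrete with a small sketch. It assumes a multiplicative structure of the form y = L × N, which is one possible non-additive combination; the function names and numbers are hypothetical.

```python
import math


def additive_residuals(actuals, linear_fit):
    """Residuals under the additive assumption y = L + N."""
    return [y - lin for y, lin in zip(actuals, linear_fit)]


def multiplicative_residuals(actuals, linear_fit):
    """Residuals under the multiplicative assumption y = L * N."""
    return [y / lin for y, lin in zip(actuals, linear_fit)]


# Hypothetical series built multiplicatively from two components.
linear_part = [100.0, 110.0, 121.0]
nonlinear_part = [1.02, 0.97, 1.05]
series = [lin * non for lin, non in zip(linear_part, nonlinear_part)]

ratio_resid = multiplicative_residuals(series, linear_part)
diff_resid = additive_residuals(series, linear_part)

# Taking logs turns the multiplicative structure into an additive one.
log_resid = [math.log(y) - math.log(lin)
             for y, lin in zip(series, linear_part)]
print([round(r, 4) for r in ratio_resid], [round(r, 4) for r in diff_resid])
```

Under this assumed structure the ratio residuals recover the nonlinear factor directly, whereas the additive residuals scale with the level of the linear component, which is why an additive hybrid may be less accurate for such data.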
CONCLUSIONS
Cost estimation in budget planning and contract bidding is important for the success of
construction projects because both underestimation and overestimation of project cost lead to
undesirable consequences. Inaccurate cost estimation can increase the probabilities of project
failures and social costs through resource misallocations. Given that construction projects typically
run over multiple years during which construction costs vary nontrivially, it is challenging to
forecast the movements of project costs in the stage of budgeting and bidding.
Researchers and practitioners have applied popular forecasting models, such as linear ARIMA
models or nonlinear ANNs to construction cost measures such as CCIs for improving the accuracy
of construction cost estimation. While linear models such as ARIMA produce better forecasting
accuracy for linear dynamics of a time series, nonlinear methods such as ANN are more
appropriate for capturing its nonlinear dynamics. Unfortunately, there exists no consensus on the
underlying dynamics of CCIs, which are constructed from multiple subcomponents. It is suggested
in the literature that using a combination of linear and nonlinear models may enhance the
forecasting accuracy by exploiting distinctive features of each model, compared to implementing
one of them individually. Despite the potential benefit of this hybrid model approach, little
attention has been devoted to employing such hybrid models for forecasting CCIs.
Via an extensive out-of-sample forecasting exercise for various forecasting horizons, this study
examined whether the hybrid model outperforms individual ARIMA and ANN models in forecasting
both national and city-level CCIs. The results of this research favor the
hybrid models over individual ARIMA or ANN models, especially for longer forecasting horizons.
The hybrid models consistently provide better forecasting performances than the individual
ARIMA and ANN models in all cases considered. In other words, the hybrid models show stable
and robust predictability in forecasting CCIs under different scenarios including construction
location (city) and duration (forecasting horizon). This is probably because the hybrid model is
designed for capturing both linear and nonlinear dynamics embedded in the subcomponents of the
CCI series.
Even though different forecasting models have been used for forecasting CCIs, no individual
forecasting model shows superior performance dominantly across different conditions such as
construction locations and forecasting horizons (Cao et al. 2015; Lam and Oshodi 2016b; Oshodi
et al. 2017; Shahandashti and Ashuri 2013; Shahandashti 2014; Shiha et al. 2020; Wang and Ashuri
2016). This is perhaps because those individual forecasting models do not fully regard and
represent the nature of CCI fluctuations. This research hypothesizes that the CCI fluctuations
consist of both linear and nonlinear components. Based on the hypothesis of an additive
combination of linear and nonlinear components in CCI fluctuations, the researchers were inspired
to investigate the performance of hybrid ARIMA-ANN models in CCI forecasting.
The current study is the first attempt to develop and implement the univariate hybrid ARIMA-
ANN model to forecast CCIs at the U.S. national and city levels. The findings of this study indicate
stable and robust outperformance of hybrid models based on the empirical results that forecasting
performances of individual models can be improved by combining them. The hybrid model
provided more accurate forecasts than other individual models such as ARIMA, ANN, and
Random Forest regardless of city and forecasting horizon. Also, this study found that the
fluctuations in the CCI time series have both linear and nonlinear dynamics over time. The
outperformance of the hybrid model over linear ARIMA and nonlinear ANN models in CCI
forecasting indicates that the fluctuations in CCIs consist of both linearities and nonlinearities.
Therefore, it is misleading to assume that the CCIs have a solely linear or nonlinear data structure.
The findings of this study suggest an additional caveat for using the hybrid model to forecast
heterogeneous CCI movements across twenty different U.S. cities. The hybrid model outperforms
an individual ARIMA or ANN in CCI forecasting, especially when the CCI shows higher annual
growth rates and more frequent discrete changes. In other words, when the economic situations
such as the construction labor market are frequently changing, the predictability of the hybrid
model becomes even more accurate than that of an individual model.
Even though the findings of this research provide a practical recommendation to implement the
hybrid model in construction cost estimation, this research is subject to several limitations. First,
this study focused on univariate forecasting models using the historical observations of time series
data. In the cities where additional explanatory variables, such as leading indicators of construction
costs (e.g., Choi et al. 2021, Mahdavian et al. 2021), are available, it would be interesting to
investigate whether the conclusions of this study still hold. Second, among a wide variety of
nonlinear models, this research solely focused on ANNs. Although ANNs have been popularly
adopted for forecasting, it would be instructive to examine whether the use of alternative nonlinear
models such as recurrent neural networks (RNN) or long short-term memory (LSTM) can enhance
the performance of the hybrid model (Jang et al. 2020, Kim et al. 2021b). Third, the hybrid model
employed in this research assumes a specific relation and decomposition of linear and nonlinear
models. A related fruitful line of research would be to probe the sources of linear and nonlinear
dynamics based on the CCI subcomponent analysis.
Although statistical forecasting models including the hybrid model produced reasonably accurate
forecasts for construction costs in the current research, these models have not been examined for
forecasting construction costs in extremely abnormal situations. In the literature, forecasting
models such as nonlinear ANNs have produced reasonably accurate forecasts a short period after the
onset of extremely abnormal events (Ahmad et al. 2021; Fahiman et al. 2019; Malefors et al. 2021).
For future work, it is therefore recommended that the robustness and feasibility of the proposed
statistical models for forecasting construction costs be examined during abnormal situations. Also,
it is of interest to examine if macroeconomic or construction market leading indicators can improve
the accuracy in forecasting construction costs.
DATA AVAILABILITY
Some or all data, models, or codes used during the study were provided by a third party. Direct
requests for these materials may be made to the provider as indicated in the Acknowledgements.
ACKNOWLEDGEMENT
This research uses monthly construction cost indices (CCIs) from January 1995 to December 2019,
which are published by Engineering News-Record (ENR) for 20 major cities in the United States.
REFERENCES
Abd Rahman, N. H., Lee, M. H., and Latif, M. T. (2015). Artificial neural networks and fuzzy time
series forecasting: an application to air quality. Quality and Quantity, 49(6), 2633-2647.
Adebiyi, A. A., Adewumi, A. O., and Ayo, C. K. (2014). Comparison of ARIMA and artificial
neural networks models for stock price prediction. Journal of Applied Mathematics, 2014.
Ahmad, M., Khan, Y. A., Jiang, C., Kazmi, S. J. H., and Abbas, S. Z. (2021). The impact of COVID-19
on unemployment rate: An intelligent based unemployment rate prediction in selected
countries of Europe. International Journal of Finance & Economics.
Ahmadi, M. H., Mohseni-Gharyehsafa, B., Farzaneh-Gord, M., Jilte, R. D., Kumar, R., and Chau,
K. W. (2019). Applicability of connectionist methods to predict dynamic viscosity of
silver/water nanofluid by using ANN-MLP, MARS and MPR algorithms. Engineering
Applications of Computational Fluid Mechanics, 13(1), 220-228.
Alaloul, W. S., Musarat, M. A., Liew, M. S., Qureshi, A. H., and Maqsoom, A. (2021).
Investigating the impact of inflation on labour wages in Construction Industry of
Malaysia. Ain Shams Engineering Journal, 12(2), 1575-1582.
Ashuri, B., and Lu, J. (2010). Time series analysis of ENR construction cost index. Journal of
Construction Engineering and Management, 136(11), 1227-1237.
Ashuri, B., Shahandashti, S. M., and Lu, J. (2012). Empirical tests for identifying leading
indicators of ENR construction cost index. Construction Management and Economics,
30(11), 917-927.
Assaad, R., and El-adaway, I. H. (2021). Impact of dynamic workforce and workplace variables
on the productivity of the construction industry: New gross construction productivity
indicator. Journal of Management in Engineering, 37(1), 04020092.
Babu, C. N., and Reddy, B. E. (2014). A moving-average filter based hybrid ARIMA-ANN for
forecasting time series data. Applied Soft Computing, 23, 27-38.
Bhattacharyya, A., Yoon, S., Weidner, T. J., and Hastak, M. (2021). Purdue Index for Construction
Analytics: Prediction and Forecasting Model Development. Journal of Management in
Engineering, 37(5), 04021052.
Bian, Z., Zhang, Z., Liu, X., and Qin, X. (2019). Unobserved component model for predicting
monthly traffic volume. Journal of Transportation Engineering, Part A: Systems, 145(12),
04019052.
Bobeica, E., Ciccarelli, M., and Vansteenkiste, I. (2019). The link between labor cost and price
inflation in the euro area.
Büyükşahin, Ü. Ç., and Ertekin, Ş. (2019). Improving forecasting accuracy of time series data
using a new hybrid ARIMA-ANN method and empirical mode decomposition.
Neurocomputing, 361, 151-163.
Cantarelli, C. C., and Flyvbjerg, B. (2013). Mega-projects’ cost performance and lock-in: problems
and solutions. In International handbook on mega-projects. Edward Elgar Publishing.
Cao, M. T., Cheng, M. Y., and Wu, Y. W. (2015). Hybrid computational model for forecasting
Taiwan construction cost index. Journal of Construction Engineering and
Management, 141(4), 04014089.
Cao, Y., and Ashuri, B. (2020). Predicting the volatility of highway construction cost index using
long short-term memory. Journal of Management in Engineering, 36(4), 04020020.
Choi, C-Y., Ryu, K. R., and Shahandashti, M. (2021). Predicting City-Level Construction Cost
Index Using Linear Forecasting Models. Journal of Construction Engineering and
Management, 147(2), 04020158.
Ciaburro, G., and Venkateswaran, B. (2017). Neural Networks with R: Smart models using CNN,
RNN, deep learning, and artificial intelligence principles. Packt Publishing Ltd.
Ciulla, G., D’Amico, A., Di Dio, V., and Brano, V. L. (2019). Modelling and analysis of
real-world wind turbine power curves: Assessing deviations from nominal curve by neural
networks. Renewable Energy, 140, 477-492.
Congressional Budget Office (CBO). (2018). Public Spending on Transportation and Water
Infrastructure, 1956 to 2017. Congressional Budget Office.
Cook, T. R., and Doh, T. (2019). Assessing Macroeconomic Tail Risks in a Data-Rich
Environment. Federal Reserve Bank of Kansas City Working Paper RWP, 19-12.
De, P., Sahu, D., Pandey, A., Gulati, B. K., Chandhiok, N., Shukla, A. K., Mohan, P., and Mitra,
R. G. (2016). Post millennium development goals prospect on child mortality in India: an
analysis using autoregressive integrated moving averages (ARIMA) model. Health, 8(15),
1845.
ENR. (2021). Historical Indices. Engineering News-Record. Retrieved from
https://www.enr.com/economics/historical_indices
Fahiman, F., Erfani, S. M., and Leckie, C. (2019, July). Robust and accurate short-term load
forecasting: A cluster oriented ensemble learning approach. In 2019 International Joint
Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.
Fan, R. Y., Ng, S. T., and Wong, J. M. (2010). Reliability of the Box-Jenkins model for forecasting
construction demand covering times of economic austerity. Construction Management and
Economics, 28(3), 241-254.
Fard, A. K., and Akbari-Zadeh, M. R. (2014). A hybrid method based on wavelet, ANN and
ARIMA model for short-term load forecasting. Journal of Experimental and Theoretical
Artificial Intelligence, 26(2), 167-182.
Fattah, J., Ezzine, L., Aman, Z., El Moussami, H., and Lachhab, A. (2018). Forecasting of demand
using ARIMA model. International Journal of Engineering Business Management, 10,
1847979018808673.
Finkel, G. (2015). The economics of the construction industry. Routledge.
Giacomini, R., and White, H. (2006). Tests of conditional forecasting
ability. Econometrica, 74(6), 1545-1578.
Giraitis, L., Kapetanios, G., and Yates, T. (2018). Inference on multivariate heteroscedastic time
varying random coefficient models. Journal of Time Series Analysis, 39(2), 129-149.
Golafshani, E. M., Behnood, A., and Arashpour, M. (2020). Forecasting the compressive strength
of normal and High-Performance Concretes using ANN and ANFIS hybridized with Grey
Wolf Optimizer. Construction and Building Materials, 232, 117266.
Hall, M. (2019, February 6). Mounting Costs Push Construction Industry Harder Toward Tech
Solutions. Building on Our Heritage. Retrieved from
https://marxrealty.com/press/mounting-costs-push-construction-industry-harder-toward-tech-solutions/
Han, M., Zhang, R., Qiu, T., Xu, M., and Ren, W. (2017). Multivariate chaotic time series
prediction based on improved grey relational analysis. IEEE Transactions on Systems,
Man, and Cybernetics: Systems, 49(10), 2144-2154.
Han, S., Ko, Y., Kim, J., and Hong, T. (2018). Housing market trend forecasts through statistical
comparisons based on big data analytic methods. Journal of Management in
Engineering, 34(2), 04017054.
Hwang, S. (2011). Time series models for forecasting construction costs using time series
indexes. Journal of Construction Engineering and Management, 137(9), 656-662.
Jang, Y., Jeong, I., and Cho, Y. K. (2020). Business failure prediction of construction contractors
using a LSTM RNN with accounting, construction market, and macroeconomic
variables. Journal of Management in Engineering, 36(2), 04019039.
Janzamin, M., Sedghi, H., and Anandkumar, A. (2015). Beating the perils of non-convexity:
Guaranteed training of neural networks using tensor methods. arXiv preprint
arXiv:1506.08473.
Kamruzzaman, M., Makino, Y., and Oshita, S. (2016). Parsimonious model development for
real-time monitoring of moisture in red meat using hyperspectral imaging. Food
Chemistry, 196, 1084-1091.
Karaca, I., Gransberg, D. D., and Jeong, H. D. (2020). Improving the accuracy of early cost
estimates on transportation infrastructure projects. Journal of Management in
Engineering, 36(5), 04020063.
Karsoliya, S. (2012). Approximating number of hidden layer neurons in multiple hidden layer
BPNN architecture. International Journal of Engineering Trends and Technology, 3(6),
714-717.
Khashei, M., and Bijari, M. (2011). A novel hybridization of artificial neural networks and
ARIMA models for time series forecasting. Applied Soft Computing, 11(2), 2664-2675.
Kim, S., Abediniangerabi, B., and Shahandashti, M. (2020). Forecasting Pipeline Construction
Costs Using Time Series Methods. In Pipelines 2020 (pp. 198-209). Reston, VA:
American Society of Civil Engineers.
Kim, S., Abediniangerabi, B., and Shahandashti, M. (2021a). Pipeline Construction Cost
Forecasting Using Multivariate Time Series Methods. Journal of Pipeline Systems
Engineering and Practice, 12(3), 04021026.
Kim, S., Abediniangerabi, B., and Shahandashti, M. (2021b). Forecasting Pipeline Construction
Costs Using Recurrent Neural Networks. In Pipelines 2021 (pp. 325-335).
Kim, Y., Son, H. G., and Kim, S. (2019). Short term electricity load forecasting for institutional
buildings. Energy Reports, 5, 1270-1280.
Lam, K. C., and Oshodi, O. S. (2016a). Using univariate models for construction output
forecasting: Comparing artificial intelligence and econometric techniques. Journal of
Management in Engineering, 32(6), 04016021.
Lam, K. C., and Oshodi, O. S. (2016b). Forecasting construction output: a comparison of artificial
neural network and Box-Jenkins model. Engineering, Construction and Architectural
Management.
Liu, H., Tian, H. Q., and Li, Y. F. (2012). Comparison of two new ARIMA-ANN and ARIMA-
Kalman hybrid methods for wind speed prediction. Applied Energy, 98, 415-424.
Mahdavian, A., Shojaei, A., Salem, M., Yuan, J. S., and Oloufa, A. A. (2021). Data-Driven
Predictive Modelling of Highway Construction Cost Items. Journal of Construction
Engineering and Management, 147(3), 04020180.
Malefors, C., Secondi, L., Marchetti, S., and Eriksson, M. (2021). Food waste reduction and
economic savings in times of crisis: The potential of machine learning methods to plan
guest attendance in Swedish public catering during the Covid-19 pandemic.
Socio-Economic Planning Sciences, 101041.
May, R., Dandy, G., and Maier, H. (2011). Review of input variable selection methods for artificial
neural networks. Artificial neural networks-methodological advances and biomedical
applications, 10, 16004.
McNichol, E. (2016). It’s time for states to invest in infrastructure. Center on Budget and Policy
Priorities, February. Available: www.cbpp.org/research/state-budget-and-tax/its-time-for-states-to-invest-in-infrastructure
[Accessed 12 April 2017].
Mir, M., Kabir, H. D., Nasirzadeh, F., and Khosravi, A. (2021). Neural network-based interval
forecasting of construction material prices. Journal of Building Engineering, 39, 102288.
Moon, S., Chi, S., and Kim, D. Y. (2018). Predicting construction cost index using the
autoregressive fractionally integrated moving average model. Journal of Management in
Engineering, 34(2), 04017063.
Mukherjee, S., Nateghi, R., and Hastak, M. (2018). A multi-hazard approach to assess severe
weather-induced major power outage risks in the US. Reliability Engineering and System
Safety, 175, 283-305.
Oshodi, O. S., Ejohwomu, O. A., Famakin, I. O., and Cortez, P. (2017). Comparing univariate
techniques for tender price index forecasting: Box-Jenkins and neural network
model. Construction Economics and Building, 17(3), 109-123.
Phillips, B. A. (1982). The impact of inflation upon US highway maintenance and construction
costs. Transportation Research Part A: General, 16(1), 1-11.
Rathnayaka, R. K. T., Seneviratna, D. M. K. N., Jianguo, W., and Arumawadu, H. I. (2015,
October). A hybrid statistical approach for stock market forecasting based on artificial
neural network and ARIMA time series models. In 2015 International Conference on
Behavioral, Economic and Socio-cultural Computing (BESC) (pp. 54-60). IEEE.
Rudžianskaitė–Kvaraciejienė, R., Apanavičienė, R., and Gelžinis, A. (2015). Modelling the
effectiveness of PPP road infrastructure projects by applying random forests. Journal of
Civil Engineering and Management, 21(3), 290-299.
Sekhar, C. R., and Madhu, E. (2016). Mode choice analysis using random forrest decision
trees. Transportation Research Procedia, 17, 644-652.
Shahandashti, S. M. (2014). Analysis of construction cost variations using macroeconomic, energy
and construction market variables (Doctoral dissertation, Georgia Institute of
Technology).
Shahandashti, S. M., and Ashuri, B. (2013). Forecasting engineering news-record construction cost
index using multivariate time series models. Journal of Construction Engineering and
Management, 139(9), 1237-1243.
Shahandashti, S. M., and Ashuri, B. (2016). Highway construction cost forecasting using vector
error correction models. Journal of Management in Engineering, 32(2), 04015040.
Shiha, A., Dorra, E. M., and Nassar, K. (2020). Neural Networks Model for Forecasting of
Construction Material Prices in Egypt Using Macroeconomic Indicators. Journal of
Construction Engineering and Management, 146(3), 04020010.
Shrestha, P. P., Burns, L. A., and Shields, D. R. (2013). Magnitude of construction cost and
schedule overruns in public work projects. Journal of Construction Engineering, 2013(1),
1-9.
Steyerberg, E. W. (2019). Assumptions in regression models: additivity and linearity. In Clinical
Prediction Models (pp. 227-245). Springer, Cham.
Tealab, A., Hefny, H., and Badr, A. (2017). Forecasting of nonlinear time series using
ANN. Future Computing and Informatics Journal, 2(1), 39-47.
Tijanić, K., Car-Pušić, D., and Šperac, M. (2019). Cost estimation in road construction using
artificial neural network. Neural Computing and Applications, 1-13.
Valipour, M., Banihabib, M. E., and Behbahani, S. M. R. (2013). Comparison of the ARMA,
ARIMA, and the autoregressive artificial neural network models in forecasting the monthly
inflow of Dez dam reservoir. Journal of Hydrology, 476, 433-441.
Vitorino, D., Coelho, S. T., Santos, P., Sheets, S., Jurkovac, B., and Amado, C. (2014). A random
forest algorithm applied to condition-based wastewater deterioration modeling and
forecasting. Procedia Engineering, 89, 401-410.
Wang, J., and Ashuri, B. (2017). Predicting ENR construction cost index using machine-learning
algorithms. International Journal of Construction Education and Research, 13(1), 47-63.
Wang, L., Zou, H., Su, J., Li, L., and Chaudhry, S. (2013). An ARIMA‐ANN hybrid model for
time series forecasting. Systems Research and Behavioral Science, 30(3), 244-259.
Xu, J. W., and Moon, S. (2013). Stochastic forecast of construction cost index using a cointegrated
vector autoregression model. Journal of Management in Engineering, 29(1), 10-18.
Yip, H. L., Fan, H., and Chiang, Y. H. (2014). Forecasting the maintenance cost of construction
equipment: Comparison between general regression neural network and Box-Jenkins time
series models. Automation in Construction, 38, 30-38.
Yoon, J. (2021). Forecasting of real GDP growth using machine learning models: Gradient
boosting and random forest approach. Computational Economics, 57(1), 247-265.
Yu, X., Ye, C., and Xiang, L. (2016). Application of artificial neural network in the diagnostic
system of osteoporosis. Neurocomputing, 214, 376-381.
Zevin, A. (2020). Lumber Prices Drop in 2019. ENR: Engineering News-Record, 284(6), 36.
Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network
model. Neurocomputing, 50, 159-175.
Zhang, X., Xue, T., and Stanley, H. E. (2018). Comparison of econometric models and artificial
neural networks algorithms for the forecasting of Baltic Dry Index. IEEE Access, 7,
1647-1657.
Zhao, L., Mbachu, J., and Zhang, H. (2019). Forecasting residential building costs in New Zealand
using a univariate approach. International Journal of Engineering Business
Management, 11, 1847979019880061.
... Hence, past lagged values of each of the target variables can contain useful information that helps forecast future values of that target variable. In their attempts to forecast the PPIs of several construction materials or composite cost indexes, researchers often use lagged values of the target variable (Ashuri and Lu 2010;Ilbeigi et al. 2016;Joukar and Nahmens 2015;Kim et al. 2021b) or lagged values of national macroeconomic indicators (Shahandashti and Ashuri 2013;Xu and Moon 2011;Shahandashti and Ashuri 2015;Baek and Ashuri 2019;Swei 2019;Choi et al. 2020;Kim et al. 2021a) as predictors of the target variable. Forecasting models that utilize one input, past lagged values of one variable to forecast future values of that variable, are referred to as univariate models (Ashuri and Lu 2010;Ilbeigi et al. 2017). ...
... Time series methods have gained increased popularity among researchers due to their capacity to produce accurate out-of-sample forecasts (Ashuri and Lu 2010;Abediniangerabi et al. 2017;El-adaway et al. 2019;Kim et al. 2021b), and their ability to capture the internal structures of data sets (Joukar and Nahmens 2015). Although many researchers documented the capabilities of time series methods when compared to traditional econometric methods, the strengths, and weaknesses of univariate or multivariate time series methods are discussed without a clear consensus on the better approach (Kim et al. 2021a). ...
... Further, despite their capabilities to produce accurate outof-sample forecasts, multivariate time series models are susceptible to eventual underperformance if the temporal associations between the leading indicators and the target variable are not thoroughly studied (Swei 2019;Choi et al. 2020). Multivariate timeseries approaches that focus on forecasting construction material prices rely on either vector autoregression (VAR) to produce short-term forecasts (Hwang 2011;Ashuri et al. 2012;Xu and Moon 2011;Kim et al. 2021b), or its restricted form vector error correction (VEC) to produce short-and long-term forecasts (Wong and Ng 2010;Ashuri 2013, 2015;Jiang et al. 2013;Faghih and Kashani 2018;Choi et al. 2020;Kim et al. 2021b). The incorporation of leading indicators, mainly macroeconomic indicators, as additional inputs in VAR and VEC models, is typically validated by comparing the multivariate forecasting models against univariate ones. ...
Article
Supply chain instabilities and inflated material prices have had a disruptive impact on cost estimating of construction projects. While several research efforts used national macroeconomic indicators to forecast the prices of domestically produced construction materials, none of the existing studies investigated whether the lagged macroeconomic indicators of the main trading partners could enhance the predictability of the prices of cement, steel, and lumber in the US construction sector. This paper fills this knowledge gap. The authors adopted a multi-step methodology that included: (1) collecting data on the target variables and the candidate leading indicators; (2) identifying the structural breaks in the collected data sets; (3) conducting causality tests to identify short-term associations and cointegration tests to examine long-term relationships; (4) developing vector error correction (VEC) models to forecast the prices in the short and long terms; and (5) evaluating the performance of the proposed models against existing forecasting models in the literature. Results of the Granger test and Johansen test indicate that Canada’s overall producer price index (PPI) is a consistent leading indicator of the prices of cement, and Mexico’s overall PPI is a consistent leading indicator of the prices of steel. Findings indicate no statistical evidence to suggest that neither Canada’s PPI nor Mexico’s PPI can be leading indicators of lumber prices. Over an 18-month ahead of sample horizon, the presented VEC models of cement and steel prices outperformed existing models, particularly beyond the 1-year-ahead forecasts. Utilization of the proposed forecasting models can significantly enhance the accuracy of cost estimates and feasibility studies of construction projects. This provides proactive financial planning for construction contractors and project owners through improved short- and long-term forecasting of the prices of main construction materials.
... A Support Vector Machine (SVM), a typical nonlinear modeling tool, has a highly complex computational function and its computational performance in solving multiclassification problems is minimal [5]. The Artificial Neural Network (ANN), the most commonly used nonlinear modeling tool, possesses excellent nonlinear modeling ability but has several disadvantages related to learning, local minimum and slow convergence speed [6]. ...
... Kim et al. [6] proposed a prediction model of construction cost, based on the Regression Comprehensive Moving Average Model (RCMAM) and ANN. The complexity and prediction workload of this hybrid model were larger than those of RCMAM or ANN. ...
Article
Full-text available
Predicting construction costs often involves disadvantages, such as low prediction accuracy, poor promotion value and unfavorable efficiency, owing to the complex composition of construction projects, a large number of personnel, long working periods and high levels of uncertainty. To address these concerns, a prediction index system and a prediction model were developed. First, the factors influencing construction cost were first identified, a prediction index system including 14 secondary indexes was constructed and the methods of obtaining data were presented elaborately. A prediction model based on the Random Forest (RF) algorithm was then constructed. Bird Swarm Algorithm (BSA) was used to optimize RF parameters and thereby avoid the effect of the random selection of RF parameters on prediction accuracy. Finally, the engineering data of a construction company in Xinyu, China were selected as a case study. The case study showed that the maximum relative error of the proposed model was only 1.24%, which met the requirements of engineering practice. For the selected cases, the minimum prediction index system that met the requirement of prediction accuracy included 11 secondary indexes. Compared with classical metaheuristic optimization algorithms (Particle Swarm Optimization, Genetic Algorithms, Tabu Search, Simulated Annealing, Ant Colony Optimization, Differential Evolution and Artificial Fish School), BSA could more quickly determine the optimal combination of calculation parameters, on average. Compared with the classical and latest forecasting methods (Back Propagation Neural Network, Support Vector Machines, Stacked Auto-Encoders and Extreme Learning Machine), the proposed model exhibited higher forecasting accuracy and efficiency. The prediction model proposed in this study could better support the prediction of construction cost, and the prediction results provided a basis for optimizing the cost management of construction projects.
... Kim et al. [14] in his study combines the ARIMA [18] and ANN [27] models to predict the budget required for construction projects of twenty major cities in the United States, published by Engineering News-Record (ENR). In this study, three types of prediction tests are carried out, two with the ARIMA and ANN models separately and one with the hybrid ARIMA-ANN model. ...
Article
Full-text available
Understanding the projected costs of projects within various sectors of a country is crucial for resource allocation and timely delivery. In Colombia, comprehensive government project data is accessible through the National Government’s open data platform. Utilizing these datasets from the National Planning Department, we construct a predictive model leveraging regression analysis to estimate the expenses associated with governmental initiatives. This work evaluates several regression models, using diverse evaluation error metrics, to determine the most effective model for deployment. A key component of our approach is to combine textual attributes into a single variable, and subsequently apply text mining techniques, in order to obtain insights from free text fields in the data sets. Ultimately, the Adaboost model combined with TF-IDF emerged as the most precise combination of models, exhibiting a mean average precision error (MAPE) of 17.6%, closely followed by the Random Forest model combined with TF-IDF with a MAPE of 17.9%.
... In previous studies, linear models such as the autoregressive moving average (ARIMA) [1] model have been utilised due to their simple structure and robustness to data size and noise levels. However, the ARIMA model is unsuitable for capturing the nonlinearities of time series in engineering costs [2], which negatively affects prediction accuracy. To solve this problem, the support vector machine (SVM) [3][4][5], backpropagation (BP) neural network [6][7][8], and other machine learning models have been applied to cost prediction. ...
Article
Full-text available
In construction project management, accurate cost forecasting is critical for ensuring informed decision making. In this article, a construction cost prediction method based on an improved bidirectional long- and short-term memory (BiLSTM) network is proposed to address the high interactivity among construction cost data and difficulty in feature extraction. Firstly, the correlation between cost-influencing factors and the unilateral cost is calculated via grey correlation analysis to select the characteristic index. Secondly, a BiLSTM network is used to capture the temporal interactions in the cost data at a deep level, and the hybrid attention mechanism is incorporated to enhance the model’s feature extraction capability to comprehensively capture the interactions among the features in the cost data. Finally, a hyperparameter optimisation method based on the improved particle swarm optimisation algorithm is proposed using the prediction accuracy as the fitness function of the algorithm. The MAE, RMSE, MPE, MAPE, and coefficient of determination of the simulated prediction results of the proposed method on the dataset are 7.487, 8.936, 0.236, 0.393, and 0.996%, respectively, where MPE is a positive coefficient. This avoids the serious consequences of underestimating the cost. Compared with the unimproved BiLSTM, the MAE, RMSE, and MAPE are reduced by 15.271, 18.193, and 0.784%, respectively, which reflects the superiority and effectiveness of the method and can provide technical support for project cost estimation in the construction field.
... ENR has been used by Kim et al. [33] to develop time series models for the average pipe and labor costs of 20 cities from 1995 to 2016. Further, Kim et al. [34] used ENR data to improve accuracy in predicting city-level CCIs through a combination of ANNs and ARIMA. Similarly, Shrestha [35] used ENR to adjust the operational costs of highway rest area facilities. ...
Article
Full-text available
This study presents a novel approach for forecasting the construction cost index (CCI) of building materials in developing countries. Such estimations are challenging due to the need for a longer time, the influence of inflation, and fluctuating project prices in developing countries. This study used three techniques—a modified Artificial Neural Network (ANN), time series, and linear regression—to predict and forecast the local building material CCI in Pakistan. The predicted CCI is based on materials, including bricks, steel, cement, sand, and gravel. In addition, the swish activation function was introduced to increase the accuracy of the associated algorithms. The results suggest that the ANN model has superior prediction results, with the lowest Mean Error (ME), Mean Absolute Error (MAE), and Theil’s U statistic (U-Stat) values of 0.04, 28.3, and 0.62, respectively. The time series and regression models have ME values of 0.22 and 0.3, MAE values of 30.07 and 28.3, and U-Stat values of 0.65 and 0.64, respectively. The proposed models can assist contractors, project managers, and owners through an accurately estimated cost index. Such accurate CCIs help correctly estimate project budgets based on building material prices to mitigate project risks, delays, and failures.
... Hao et al. (2018) reviewed some of the commonly used drought forecasting methods, including data-driven models (Xu et al., 2020), regression models (Sun et al., 2012), conditional probability models (Hao et al., 2016), machine learning algorithms (Xu et al., 2022), and others, each of which has advantages and disadvantages. For example, among the data-driven models, the autoregressive integrated moving average (ARIMA) model has better performance in long-term drought forecasting (Belayneh et al., 2014), but it has poor performance in capturing nonlinear features (Kim et al., 2022;Wang, 2022). Machine learning is now an important part of data-driven research in the earth sciences (Ham et al., 2019;Reichstein et al., 2019) involving the atmosphere (Retsch et al., 2022;Shen et al., 2018), land surface Zhang et al., 2022b;Zheng et al., 2022), and oceans (Asgarimehr et al., 2022;Chen et al., 2019), and it has evolved rapidly over the past decade. ...
Article
Full-text available
Machine learning, a key thruster of Construction 4.0, has seen exponential publication growth in the last ten years. Many studies have identified ML as the future, but few have critically examined the applications and limitations of various algorithms in construction management. Therefore, this article comprehensively reviewed the top 100 articles from 2018 to 2023 about ML algorithms applied in construction risk management, provided their strengths and limitations, and identified areas for improvement. The study found that integrating various data sources, including historical project data, environmental factors, and stakeholder information, has become a common trend in construction risk. However, the challenges associated with the need for extensive and high-quality datasets, models’ interpretability, and construction projects’ dynamic nature pose significant barriers. The recommendations presented in this paper can facilitate interdisciplinary collaboration between traditional construction and machine learning, thereby enhancing the development of specialized algorithms for real-world projects.
Chapter
Due to the depletion of fossil fuel resources and environmental concerns caused by traditional fuel systems in recent years, the share of renewable energy sources in current energy production has been increasing. Among these energy sources, wind and solar energy stand out compared to other sources. Wind energy is a clean, sustainable and low-cost energy source. Wind and solar energies vary considerably according to the stochastic environment of meteorological conditions. Solar and wind energy variability and uncontrollability lead to power quality, generation-consumption balance and reliability problems of solar and wind energy systems. For this reason, it is important to know and predict the wind speed and solar radiation characteristics of the regions where the systems are installed. In this study, meteorological data of Antalya Serik Region were analyzed using statistical methods and wavelet transform. Thus, the potentials of wind and solar energies in the study area and large and small-scale events affecting these potentials were determined. In addition, a short-term estimation study was made for wind intensity and solar radiation using the time series of meteorological data. Besides SARMA, SARMAX and NAR models, Wavelet-NARX, SARMAX-NAR and NAR-SARMAX hybrid models are employed. Hybrid models are successfully produced better results than component forecasts.KeywordsANNWind speed forecastingHybrid forecastingNARNARXSARMASARMAXWavelet
Article
Full-text available
Food waste is a significant problem within public catering establishments in any normal situation. During spring 2020 the Covid-19 pandemic placed the public catering system under greater pressure, revealing weaknesses within the system and generation of food waste due to rapidly changing consumption patterns. In times of crisis, it is especially important to conserve resources and allocate existing resources to areas where they can be of most use, but this poses significant challenges. This study evaluated the potential of a forecasting model to predict guest attendance during the start and throughout the pandemic. This was done by collecting data on guest attendance in Swedish school and preschool catering establishments before and during the pandemic, and using a machine learning approach to predict future guest attendance based on historical data. Comparison of various learning methods revealed that random forest produced more accurate forecasts than a simple artificial neural network, with conditional mean absolute prediction error of <0.15 for the trained dataset. Economic savings were obtained by forecasting compared with a no-plan scenario, supporting selection of the random forest approach for effective forecasting of meal planning. Overall, the results obtained using forecasting models for meal planning in times of crisis confirmed their usefulness. Continuous use can improve estimates for the test period, due to the agile and flexible nature of these models. This is particularly important when guest attendance is unpredictable, so that production planning can be optimized to reduce food waste and contribute to a more sustainable and resilient food system.
Article
Full-text available
Labours in construction are one of the main pillars in the construction industry of Malaysia for projects execution. Construction labours not only contributes to the development of the construction industry but also impacts the Malaysian economy. Consideration of labour wages is made in the initial phase of the project budget, however, wages are getting changed over time. The inflation rate is one of the key factors which affect labours wages. Regrettably, the inflation rate is being ignored while computing labour wages for projects budget development, resulting in cost overrun of construction projects. In this regard, the correlation coefficient test was used to determine the impact of the inflation rate on labour wages gathered from the year 2013 to 2019. The results showed that a significant acceptable relationship exists among the inflation rate and several categories of labour wages. Most of the labour wages showed a negative relationship with the inflation rate, indicating the deviation in the wages, thus, result in cost overrun. To steer the cost overrun effect, it is recommended to adopt automation system and introduce the Industrial Revolution (IR) 4.0 in construction projects as a replacer of labours.
Article
Full-text available
Unemployment remains a major cause for both developed and developing nations, due to which they lose their financial and economic impact as a whole. Unemployment rate prediction achieved researcher attention from a fast few years. The intention of doing our research is to examine the impact of the coronavirus on the unemployment rate. Accurately predicting the unemployment rate is a stimulating job for policymakers, which plays an imperative role in a country's financial and financial development planning. Classical time series models such as ARIMA models and advanced non‐linear time series methods be previously hired for unemployment rate prediction. It is known to us that mostly these data sets are non‐linear as well as non‐stationary. Consequently, a random error can be produced by a distinct time series prediction model. Our research considers hybrid prediction approaches supported by linear and non‐linear models to preserve forecast the unemployment rates much precisely. These hybrid approaches of the unemployment rate can advance their estimates by reproducing the unemployment ratio irregularity. These models' appliance is exposed to six unemployment rate statistics sets from Europe's selected countries, specifically France, Spain, Belgium, Turkey, Italy and Germany. Among these hybrid models, the hybrid ARIMA‐ARNN forecasting model performed well for France, Belgium, Turkey and Germany, whereas hybrid ARIMA‐SVM performed outclass for Spain and Italy. Furthermore, these models are used for the best future prediction. Results show that the unemployment rate will be higher in the coming years, which is the consequence of the coronavirus, and it will take at least 5 years to overcome the impact of COVID‐19 in these countries.
Article
Pipe material and labor costs constitute about seventy percent of pipeline construction costs. Pipe and labor costs are subject to considerable fluctuations over time. These fluctuations are problematic for cost estimation and bid preparation in pipeline projects, which are mostly large and long-term projects. The accurate prediction of pipe and labor costs is invaluable for cost estimators to prepare accurate bids and manage cost contingencies. However, existing literature does not take advantage of the leading indicators of pipeline construction cost time series to accurately forecast cost fluctuations in pipeline projects. The objective of this research is to identify the leading indicators of pipeline construction costs and develop multivariate time series models for forecasting cost fluctuations in pipeline projects. Nineteen potential leading indicators of pipe and labor costs were initially selected based on a comprehensive review of construction cost forecasting literature. The leading indicators were identified from this pool of potential leading indicators based on unit root tests and Granger causality tests. Multivariate time series models were developed based on the results of cointegration tests. Vector Error Correction (VEC) models were developed for the cointegrated variables, while Vector Autoregressive (VAR) models were developed for the non-cointegrated variables. Since multivariate time series models include information from the identified leading indicators, they are often expected to deliver more accurate forecasts than univariate time series models. The forecasting accuracies of multivariate time series models were compared with those of univariate time series models based on three common error measures: mean absolute percentage error (MAPE), root mean squared error (RMSE), and mean absolute error (MAE).
The results show that multivariate time series models outperform univariate models for forecasting cost fluctuations in pipeline projects. The findings of this research contribute to the state of knowledge by identifying leading indicators of pipe and labor costs and developing multivariate time series models to forecast them. The multivariate time series models with leading indicators are more accurate than univariate models for forecasting cost fluctuations in pipeline projects. It is expected that the proposed multivariate time series forecasting models contribute to the enhancement of the theory and practice of pipeline construction cost forecasting and help cost engineers and investment planners to prepare more accurate bids, cost estimates, and budgets for pipeline projects.
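The three error measures named in this abstract follow standard definitions. They can be sketched as below using only the Python standard library. The function names and the sample numbers are illustrative assumptions, not values from the study.

```python
import math


def mae(actual, forecast):
    """Mean absolute error: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error: square root of the mean squared error."""
    return math.sqrt(
        sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)
    )


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so such input is rejected.
    """
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)


# Hypothetical out-of-sample values for two competing models (NOT study data).
actual = [100.0, 200.0]
model_a = [110.0, 180.0]   # e.g. a univariate forecast
model_b = [105.0, 195.0]   # e.g. a multivariate forecast

for name, forecast in [("model_a", model_a), ("model_b", model_b)]:
    print(name, mae(actual, forecast), rmse(actual, forecast),
          mape(actual, forecast))
```

The model with the lower value on a given measure is the more accurate one on that measure. RMSE penalizes large errors more heavily than MAE, and MAPE is scale-free, so the three measures can rank models differently.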
Article
The Purdue Index for Construction (Pi-C) was developed to gauge the health of the construction industry. It is a composite index consisting of five dimensions: economic, stability, social, development, and quality. This research conducts a data-driven analysis to provide prediction and time-series forecasting models for Pi-C to (1) monitor and (2) provide guidance on how to improve the future health trajectory of the U.S. construction industry. The Seasonal Autoregressive Integrated Moving Average (SARIMA) technique is applied for future trend analysis; Multiple Linear Regression (MLR) and Random Forests (RF) are applied as prediction models for Pi-C data analytics. It is expected that the proposed prediction and time-series forecasting models will help decision makers, including policy developers and construction practitioners, to take necessary action in a timely manner, and will open the discourse on advanced applications of analytics and data-driven decision making in the construction industry.
Article
Accurate prediction of material costs is essential for proper management and budgeting of construction projects. Material price fluctuation is one of the most important contributors to deviations from the initial estimated cost in construction projects. Traditional machine learning techniques often fail to generate accurate predictive estimates due to high uncertainties associated with material prices. To address this issue, this research proposes an artificial neural network (ANN)-based method to quantify uncertainties through the generation of forecasting intervals. The optimal lower upper bound estimation (optimal LUBE) method is adopted to train the ANN to generate intervals directly. The proposed method is used to predict construction material prices in the US for asphalt and steel. It is shown that traditional regression analysis and ANN-based single-point estimates are of limited value for the prediction of material prices. In contrast, prediction intervals provide reliable estimates for material prices and reduce the possibility of project failure due to the inaccuracy of initial estimated costs. The results obtained from three other cost functions are compared with those of the proposed optimal LUBE cost function to verify the accuracy of the model. The results show that the proposed optimal LUBE cost function produces the most accurate prediction intervals. This study employs a stacking procedure for monitoring and controlling the training process and for validation purposes. The proposed interval forecasting method presents a new direction for cost prediction studies and will provide project managers with extra information to manage risks associated with project costs.
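The quality of prediction intervals such as those described in this abstract is commonly judged by how often the observed value falls inside the interval and by how wide the intervals are. The sketch below computes these two generic quantities. It is an assumption-based illustration and not the study's optimal LUBE cost function. The function names and numbers are hypothetical.

```python
def interval_coverage(actual, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return hits / len(actual)


def mean_interval_width(lower, upper):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical material prices and interval bounds (NOT data from the study).
actual = [10.0, 12.0, 15.0, 20.0]
lower = [9.0, 11.0, 16.0, 18.0]
upper = [11.0, 13.0, 18.0, 22.0]

coverage = interval_coverage(actual, lower, upper)
width = mean_interval_width(lower, upper)
print(f"coverage = {coverage:.2f}, mean width = {width:.2f}")
```

The two quantities pull in opposite directions: very wide intervals reach high coverage trivially, so an interval method is useful only when it achieves the target coverage with intervals that stay reasonably narrow.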
Article
The highway network is an economically necessary form of transportation that has a significant impact on the quality of life of the citizens who use it. Cost overruns in highway projects have been a universal occurrence that jeopardizes the development, maintenance, and expansion of this vital infrastructure. Incorrect cost estimations can drive decision makers to pass ineffective policies that have played a large role in the cost overruns of transportation construction projects. The existing prediction models in the literature are limited in one or multiple areas of modeling approach, inputs, and model development robustness. In this research, a model was developed to accurately predict the total construction cost of highway projects by utilizing machine learning algorithms. This study developed a modeling pipeline to automate much of the cost forecasting process, reducing the amount of manual work and dependence on skilled data scientists. This study used the Florida Department of Transportation's (FDOT's) critical highway construction cost items between 2001 and 2017 to test the model. The highways of Florida were selected for testing due to the state's population growth, high immigrant population, logistics, and hurricane frequency. This study used a pool of five categories of independent variables (69 variables total), including the construction market, energy market, socioeconomics, US economy, and temporal variables, which were compiled from relevant sources and existing literature. The results revealed that our linear model exhibits superiority in generalization and prediction of cost items over nonlinear models and is capable of accurately forecasting highway construction costs. Our suggested approach in this study also provides more accurate forecasts for the detailed cost estimation by considering the monthly historical information for the average 92.6% of the six highway construction types mentioned with a 92.51% prediction accuracy.
By employing our developed model, local governments, network operators, contractors, and logistics sectors could predict highway construction costs more precisely.