
Improving Accuracy in Predicting City-level Construction Cost Indices by Combining Linear ARIMA and Nonlinear ANNs

November 2021
Journal of Management in Engineering


DOI:10.1061/(ASCE)ME.1943-5479.0001008

Authors:

Sooin Kim

University of Texas at Arlington

C.Y. Choi

University of Texas at Arlington




Sooin Kim, S.M.ASCE1, Chi-Young Choi2, Mohsen Shahandashti, M.ASCE3, and Kyeong Rok Ryu, A.M.ASCE4

1Graduate Research Assistant, Department of Civil Engineering, The University of Texas at Arlington, 416 S. Yates St., Arlington, TX 76010. E-mail: sooin.kim@uta.edu

2Professor, Department of Economics, The University of Texas at Arlington, 701 S. West St., Arlington, TX 76019. E-mail: cychoi@uta.edu

3Associate Professor, Department of Civil Engineering, The University of Texas at Arlington, 416 S. Yates St., Arlington, TX 76010. E-mail: mohsen@uta.edu

4Assistant Professor, Department of Civil Engineering, The University of Texas at Arlington, 416 S. Yates St., Arlington, TX 76010. E-mail: kyeongrok.ryu@uta.edu

ABSTRACT

Accurate cost forecasting in budget planning and contract bidding is crucial for the success of construction projects. Linear models such as the autoregressive integrated moving average (ARIMA) and nonlinear models such as the artificial neural network (ANN) have been adopted in the literature for forecasting construction costs. However, both linear and nonlinear models are subject to limitations that derive from their modeling structures and assumptions. This study proposes a hybrid ARIMA-ANN model for forecasting construction costs and explores whether it can provide more accurate forecasts than an individual ARIMA or ANN model. The national and city-level construction cost indices (CCIs) are forecasted over three forecasting horizons (short-term, mid-term, and long-term) using three forecasting models: (1) the linear autoregressive integrated moving average (ARIMA) model, (2) nonlinear artificial neural networks (ANNs), and (3) the hybrid ARIMA-ANN model. An out-of-sample forecasting exercise reveals that the hybrid model, which combines the distinctive features of ARIMA and ANNs, performs better than the individual models in most forecasting cases, especially over longer forecasting horizons. The findings can help project planners, cost engineers, and decision-makers prepare more accurate budgets and bids for diverse construction projects in different locations.

Keywords: Hybrid model, Out-of-sample forecasting, City-level ENR construction cost index (CCI), U.S. cities, ARIMA model, ANN.

INTRODUCTION

Cost estimation is an essential part of budgeting and bidding in the early stages of construction projects. Inaccurate cost estimation leads not only to direct losses for stakeholders but also to socially undesirable consequences (Choi et al. 2021, Kim et al. 2020). Cost overestimation can cause a bidder to lose a contract by submitting an uncompetitive bid relative to competing bidders. Moreover, if the cost of a large public infrastructure project is overestimated, the government, as the project owner, must allocate or finance a larger budget for the project. This ultimately creates an opportunity loss for other potential projects and increases the government's implicit or explicit financial burden. According to the Congressional Budget Office (CBO), combined federal, state, and local spending on infrastructure projects was $441 billion in 2017, about 2.3 percent of U.S. GDP (CBO 2018). State and local governments also spend approximately 85 percent of their capital budgets on key public infrastructure (McNichol 2016). Cost underestimation is equally problematic because it causes project cost overruns, which in turn increase the risks of project delay, abandonment, financial losses, and even contractor insolvency (Cantarelli and Flyvbjerg 2013). As such, accurate cost estimation is crucial not only for the success of a construction project but also for the efficient allocation of limited resources (Kim et al. 2021a). Nevertheless, accurate cost estimation and forecasting are challenging because construction costs, often measured by construction cost indices (CCIs), fluctuate widely over time.

This challenge has prompted researchers to pursue two directions for improving the accuracy of CCI forecasts. On the one hand, a great deal of effort has been directed at developing better-performing forecasting models. On the other hand, a more recent study by Choi et al. (2021) shows that using city-level CCIs can improve the forecasting accuracy of local construction costs, which vary widely across project locations. The current study sits at the intersection of these two strands of the literature. Specifically, it aims to show how the forecasting accuracy of construction costs in different locations can be improved by creating a hybrid forecasting model that is superior to the individual models.

LITERATURE REVIEW

Most construction industry variables, including construction costs, are subject to volatile fluctuations, which can increase contingency costs and lead to failures in construction project processes such as bidding, cost estimating, and investing. Thus, various forecasting models have been implemented to obtain more accurate forecasts of construction industry variables.

Multivariate forecasting models use macroeconomic and social leading indicators as independent variables for forecasting construction industry variables. Bhattacharyya et al. (2021) included macroeconomic and social variables such as construction spending, the construction backlog indicator, and employment in multiple linear regression and random forest models for forecasting the Purdue Index for Construction (Pi-C). Assaad and El-adaway (2021) investigated the impacts of dynamic workforce and workplace variables, such as total construction employment and average weekly hours, on construction productivity. Shahandashti and Ashuri (2016) identified macroeconomic leading indicators, such as the crude oil price, for forecasting the national highway construction cost index (NHCCI). Xu and Moon (2013) forecasted the CCI using a cointegrated vector autoregressive (VAR) model based on the interactive relationship between the CCI and the consumer price index (CPI).

Although identifying macroeconomic and social indicators and investigating their impacts on construction industry variables is useful for forecasting, parsimonious univariate models have several advantages, especially when statistical data on leading indicators are limited or unavailable (Choi et al. 2021, Lam and Oshodi 2016a, Hwang 2011). For example, Han et al. (2018) compared the forecasting accuracy of multiple linear regression and univariate ARIMA models and concluded that the univariate ARIMA model provided more accurate forecasts of near-future trends in the home sales index (HSI).

Multivariate models can, to some extent, provide more accurate forecasts by accounting for the relationships between the CCI and macroeconomic indicators. However, multivariate models require more parameters to be estimated, which can lead to larger specification and forecasting errors than those of parsimonious univariate models (Cook and Doh 2019; Steyerberg 2018; Giraitis et al. 2018; Han et al. 2017). Parsimonious univariate models can avoid overfitting problems and outperform overparameterized models (Cook and Doh 2019). They are also preferable because their comparative simplicity makes them convenient for industrial applications (Nobis et al. 2019; Kamruzzaman et al. 2016).

For these reasons, this research focuses primarily on univariate time series forecasting models for CCIs. Future research is recommended to investigate the relationships between CCIs and macroeconomic indicators and to examine whether multivariate models can improve forecasting accuracy over univariate hybrid models. The hybrid model could also be extended to include multivariate time series models in future research; overfitting problems and model specification errors should be investigated carefully and thoroughly in such extended hybrid models.

Two univariate time series forecasting models have been widely adopted for this purpose: linear autoregressive integrated moving average (ARIMA) models and nonlinear artificial neural networks (ANNs). Moon et al. (2018) forecasted the CCI using univariate linear ARIMA and ARFIMA (autoregressive fractionally integrated moving average) models. Lam and Oshodi (2016a) applied ARIMA, ANN, and support vector machine models to predict gross values in the construction industry. As summarized in Table 1, however, no clear consensus exists on which model performs better, because neither consistently gives the best results across situations.

Table 1. Summary of previous studies on forecasting construction costs

| Study | Methodology | Main findings |
|---|---|---|
| Ashuri and Lu (2010) | ARIMA model | Seasonal ARIMA models outperform other univariate time series models in forecasting the ENR national CCI. |
| Choi et al. (2021) | ARIMA and vector error correction (VEC) models | Recommends a parsimonious ARIMA model for forecasting the city-level construction cost index (CCI) in the absence of leading indicators. |
| Fan et al. (2010) | ARIMA model | ARIMA models cannot capture sudden changes with turning points in the construction market. |
| Zhao et al. (2019) | ARIMA model | No single dominant model was found in forecasting residential building costs in New Zealand. The dominance of the ARIMA model in out-of-sample forecasting performance varies with data characteristics. |
| Mir et al. (2021) | ANN | The overfitting problems of ANN were alleviated with the optimal lower and upper bound estimation method in forecasting construction material prices. |
| Shiha et al. (2020) | ANN | ANNs perform better than linear models in forecasting construction material price movements. |
| Tijanić et al. (2019) | ANN | The forecasting accuracy of ANN depends on the quality and quantity of input data. |
| Cao and Ashuri (2020) | ARIMA and ANNs | ANNs outperform seasonal ARIMA models in forecasting highway construction costs. |
| Mahdavian et al. (2021) | Linear regression and neural networks | Linear regression models outperform nonlinear neural networks in forecasting highway construction costs. |
| Oshodi et al. (2017) | ARIMA and ANNs | ANNs outperform ARIMA models for forecasting the tender price index. |
| Yip et al. (2014) | ARIMA and ANNs | Neither ARIMA nor ANNs dominate in forecasting construction equipment maintenance costs. |

This lack of consensus arises because each model has different strengths and weaknesses, as summarized in Table 2.

Table 2. Strengths and weaknesses of ARIMA and ANNs

ARIMA

Strengths:
• Simple and relatively easy to implement (Fattah et al. 2018).
• Less sensitive to data size and noise level (Fard and Akbari-Zadeh 2014).
• More robust to overfitting problems (Valipour et al. 2013).

Weaknesses:
• The linearity assumption may not apply to many real-world time series (Khashei and Bijari 2011).
• Difficult to capture structural changes in time series (De et al. 2016, Fan et al. 2010).
• May ignore higher-order autocorrelations.

ANN

Strengths:
• Suitable for capturing nonlinear dynamics (Oshodi et al. 2017).
• More flexible approximation using multiple functions.
• Accommodates heterogeneous dynamics in data movements (Büyükşahin and Ertekin 2019).

Weaknesses:
• The network fitting mechanism does not always work (Janzamin et al. 2015).
• Sensitive to data size and noise level (Zhang et al. 2018).
• Long and costly trial-and-error process for approximation.

As a linear parametric time series model, the ARIMA model is simple, easy to implement, and robust to data size and noise level (Fard and Akbari-Zadeh 2014, Fattah et al. 2018). However, because it assumes a linear relationship between past and future observations, the ARIMA model is not suited to capturing the nonlinear dynamics of time series (Oshodi et al. 2017, Wang and Ashuri 2017). By contrast, the ANN has an advantage in estimating the nonlinear and volatile components of a time series because it extracts information from the observed data without assuming a specific relationship between past and future observations. Despite these attractive features of flexibility and nonlinearity, ANNs are not necessarily preferred over linear models in terms of out-of-sample forecasting performance because of the so-called overfitting problem.

To overcome these limitations of individual ARIMA and ANN models, which underlie the lack of consensus on a better forecasting model, hybrid models have been proposed and have shown superior predictive performance for various time series, such as the sunspot dataset, the Canadian lynx dataset, foreign exchange rates, and stock market price indices (Büyükşahin and Ertekin 2019, Khashei and Bijari 2011, Rathnayaka et al. 2015, Wang et al. 2013, Zhang 2003). We hypothesize that the proposed hybrid approach for forecasting CCIs can overcome the limitations of individual ARIMA and ANN models and provide more accurate forecasts than the individual models by estimating the linear and nonlinear components of CCIs separately and then combining them. The basic idea of hybrid models is to use the distinctive strengths of ARIMA and ANNs to alleviate the limitations of each model. By incorporating the advantages of each model into a combined model, the hybrid model can achieve more accurate forecasts by reducing the risk of adopting an inappropriate method. For instance, the ARIMA model in the hybrid model can mitigate the overfitting problems of the ANN, while the ANN in the hybrid model can capture the dynamics that the linear function fails to approximate (Tealab et al. 2017). A common practice in hybrid modeling is to decompose time series data into linear and nonlinear components and then apply an appropriate type of model to each component separately. The hybrid ARIMA-ANN model considered herein first approximates the linear component of a time series using an ARIMA model. An ANN is then applied to approximate the nonlinearities in the ARIMA residuals that the linear ARIMA model does not capture. Consequently, the hybrid model can reduce the forecasting errors stemming from the overfitting problems of the ANN while capturing nonlinear fluctuations in the ARIMA residuals with the more flexible ANN.

This paper is the first attempt to develop and apply a hybrid ARIMA-ANN model to CCI forecasting in order to overcome the limitations of individual ARIMA and ANN models, which underlie the lack of consensus on a better forecasting model in the CCI forecasting literature. Previous studies have forecasted the national CCI using linear and nonlinear models individually. The present study creates a hybrid ARIMA-ANN model (hereafter, the hybrid model) to examine whether, and by how much, it achieves more accurate forecasts of both national and city-level CCIs than individual ARIMA or ANN models over forecasting horizons that are typically valuable to practitioners.

The hybrid model is particularly suited to forecasting CCIs, which combine multiple sub-indices, such as labor costs and material costs, that are likely to follow different dynamic patterns across locations. Because the main advantage of the hybrid model comes from approximating the linear and nonlinear components of a time series in different ways, it can effectively capture the different dynamics of the subcomponents and thus yield more accurate CCI forecasts. Moreover, because the dynamics of CCI movements may differ across locations, particularly owing to the heterogeneous dynamics of the subcomponents, an individual model that works well in one location may not work well in others. As a result, ARIMA or ANN alone may not be sufficient to capture the heterogeneous dynamics of CCIs in different locations. This makes the hybrid model a promising technique for forecasting city-level CCI movements.

DATA

This research uses monthly construction cost indices (CCIs) from January 1995 to December 2019, published by Engineering News-Record (ENR) for 20 major cities in the United States. The CCI in each city is a weighted average of subcomponents, such as common labor costs and material costs, whose weights are fixed across cities. The city CCI consists of 81% common labor costs, 13% steel prices, 5% lumber prices, and 1% cement prices (Zevin 2020). Common labor costs are measured as 200 hours of common labor at the city's average common labor rate. Material prices are measured as the prices of 25 cwt of standard structural steel, 1.128 tons of Portland cement, and 1,088 board-ft of 2×4 lumber (ENR 2021). The national CCI is then constructed as the simple average of the CCIs of the 20 cities. As a popular cost index for many construction projects, the ENR CCI has been widely used for cost estimation, bid preparation, and project budgeting (Ashuri et al. 2012). Following Choi et al. (2021), the current research collects the ENR CCIs for both the national level (NAT) and twenty cities: Atlanta (ATL), Baltimore (BAL), Birmingham (BHM), Boston (BOS), Chicago (CHI), Cincinnati (CIN), Cleveland (CLE), Dallas (DAL), Denver (DEN), Detroit (DET), Kansas City (KCT), Los Angeles (LAX), Minneapolis (MIN), New Orleans (NOL), New York City (NYC), Philadelphia (PHL), Pittsburgh (PIT), San Francisco (SFC), Seattle (SEA), and St. Louis (STL). The dataset therefore consists of 21 CCI series (national plus twenty cities) over 25 years (January 1995 to December 2019), yielding 300 monthly observations for each series (N = 21, T = 300). For the empirical analysis below, each CCI series is seasonally adjusted using the U.S. Census Bureau's X-13ARIMA-SEATS seasonal adjustment program.
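To make the index construction concrete, the Python sketch below prices the fixed basket quantities listed above and averages city indices into a national figure. It is an illustration under stated assumptions, not ENR's published procedure: scaling the basket cost by a base-period basket cost to an index value of 100 is an assumed normalization, the function names are hypothetical, and any prices fed to it here are made up.

```python
# Basket quantities as listed in the text (ENR 2021). Prices are supplied by the caller;
# the normalization in city_cci() is an assumption, not ENR's documented method.
BASKET = {
    "common_labor_hours": 200.0,
    "structural_steel_cwt": 25.0,
    "portland_cement_tons": 1.128,
    "lumber_2x4_board_ft": 1088.0,
}


def basket_cost(unit_prices):
    """Cost of the fixed basket, given a price per unit of each item ($/hour, $/cwt, ...)."""
    return sum(qty * unit_prices[item] for item, qty in BASKET.items())


def cost_shares(unit_prices):
    """Share of each item in total basket cost (the text cites roughly 81/13/5/1 percent)."""
    total = basket_cost(unit_prices)
    return {item: qty * unit_prices[item] / total for item, qty in BASKET.items()}


def city_cci(unit_prices, base_cost, base_value=100.0):
    """Assumed normalization: current basket cost relative to a base-period basket cost."""
    return base_value * basket_cost(unit_prices) / base_cost


def national_cci(city_indices):
    """Simple average of the city CCIs, as the text describes for the national index."""
    return sum(city_indices) / len(city_indices)
```

Because the quantities are fixed, differences between city indices come only from local prices, which is why the labor share can vary from city to city even though the basket does not.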

METHODOLOGIES

Researchers in the CCI literature have sought better forecasting models over the years. Among the numerous forecasting models available in the literature, this research focuses on three for comparison: (1) linear ARIMA, (2) nonlinear ANNs, and (3) hybrid ARIMA-ANNs. While linear ARIMA models and nonlinear ANNs have been widely employed in the literature, the hybrid model has not yet been used to forecast CCIs. It is therefore worth investigating the forecasting performance of the hybrid ARIMA-ANN approach in comparison with those of ARIMA and ANNs.

Autoregressive Integrated Moving Average (ARIMA) Model

Built on the combination of autoregressive (AR) and moving average (MA) models, ARIMA models assume that future values of a series depend linearly on its current and past observations and on random errors.

A typical ARIMA(p,0,q) model can be represented by Equation (1).

$$y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \tag{1}$$

where $y_t$ denotes the current observation of the time series of interest; $y_{t-i}$ ($i = 1, 2, \ldots, p$) represent its past observations; $\varepsilon_{t-j}$ ($j = 0, 1, 2, \ldots, q$) are random errors with zero mean and finite variance; $c$ is a constant; and $\phi_i$ and $\theta_j$ are the autoregressive and moving average coefficients, respectively. $p$ and $q$ denote the orders of the autoregressive (AR) and moving average (MA) terms, respectively, and are selected by the Bayesian information criterion (BIC).
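The BIC-based order selection described above can be sketched as follows. This is a simplified, pure-autoregressive version written with NumPy: it fits AR(p) models with an intercept by ordinary least squares on a common estimation sample and keeps the order with the lowest BIC, so the MA order q is not searched. A full ARIMA(p,0,q) search would instead use maximum likelihood, for example the `ARIMA` class in `statsmodels.tsa.arima.model`, whose fitted results expose a `bic` attribute. The function names and the maximum order `pmax` are illustrative choices, not the authors' settings.

```python
import numpy as np


def fit_ar_ols(y, p, pmax):
    """Fit y_t = c + sum_{i=1}^{p} phi_i * y_{t-i} + e_t by OLS.

    Every order uses the same sample (t = pmax, ..., T-1) so that BICs are comparable.
    Returns (coefficients [c, phi_1, ..., phi_p], BIC).
    """
    y = np.asarray(y, dtype=float)
    n = len(y) - pmax
    target = y[pmax:]
    # Column i holds the lag-i values y_{t-i} aligned with the targets y_t.
    columns = [np.ones(n)] + [y[pmax - i: len(y) - i] for i in range(1, p + 1)]
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rss = float(np.sum((target - X @ coef) ** 2))
    k = p + 1  # intercept plus p autoregressive coefficients
    bic = n * np.log(rss / n) + k * np.log(n)  # Gaussian BIC up to an additive constant
    return coef, bic


def select_ar_order(y, pmax=6):
    """Return the AR order in 0..pmax with the smallest BIC."""
    bics = {p: fit_ar_ols(y, p, pmax)[1] for p in range(pmax + 1)}
    return min(bics, key=bics.get)
```

Fixing the estimation sample at `t >= pmax` matters: if each order used its own longest sample, the residual sums of squares would be computed over different observations and the BIC comparison would be biased.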

ARIMA models require a stationary time series, so an appropriate data transformation is needed before an ARIMA model is estimated. Since all the CCIs in levels turn out to be nonstationary in this research, they are transformed into growth rates by taking first differences of their logarithms before the ARIMA models are estimated. An ARIMA(p,0,q) model of these growth rates is therefore equivalent to an ARIMA(p,1,q) model of the log CCI levels.
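A minimal sketch of this transformation, assuming NumPy and natural logarithms. The second helper, which maps forecast growth rates back to index levels by cumulating them from the last observed level, is an illustrative addition rather than a step described in the text; both function names are hypothetical.

```python
import numpy as np


def log_growth(levels):
    """Growth rates g_t = ln(CCI_t) - ln(CCI_{t-1}); the output is one element shorter."""
    return np.diff(np.log(np.asarray(levels, dtype=float)))


def levels_from_growth(last_level, growth_forecasts):
    """Rebuild level forecasts: CCI_{T+k} = CCI_T * exp(g_{T+1} + ... + g_{T+k})."""
    return last_level * np.exp(np.cumsum(np.asarray(growth_forecasts, dtype=float)))
```

Applied to a 300-month level series, `log_growth` returns 299 growth rates, since the first month has no predecessor.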

Owing to their simplicity, ARIMA models have been widely adopted for forecasting CCIs. Studies generally find that ARIMA models have decent predictive power for linear and stationary time series processes. However, because of the underlying linearity assumption, ARIMA models show limited accuracy in forecasting the nonlinear dynamic patterns observed in many real-world time series (Zhang 2003). To cope with this problem, several types of nonlinear models have been considered in the literature. A difficulty that arises in this regard is correctly specifying the form of nonlinearity. Among the wealth of nonlinear models, the artificial neural network (ANN) has been considered extensively in the literature.

Artificial Neural Network (ANN)

An ANN is a nonlinear, nonparametric, data-driven machine learning method that mimics the central nervous system of the human brain (Ciaburro and Venkateswaran 2017). An ANN comprises several layers: one input layer, one or more hidden layers, and one output layer (Abd Rahman et al. 2015). Each layer contains a set of nodes, or artificial neurons. The artificial neurons in each layer, with their associated weights and thresholds, are connected to the neurons in the following layer. An artificial neuron whose output exceeds its specified threshold value is activated and sends information to the next layer of the network; when a neuron is not activated, no information is passed to the next layer (Yu et al. 2016). Built from artificial neurons arranged as multilayer perceptrons (MLPs), an ANN processes and transfers information through nonlinear activation functions without imposing prior assumptions on the functional form of the data (Ahmadi et al. 2019; May et al. 2011).

The ANN with a three-layer network used for the current study is represented by Equation (2).

$$y_t = \alpha_0 + \sum_{j=1}^{r} \alpha_j\, g\!\left(\beta_{0j} + \sum_{i=1}^{s} \beta_{ij}\, y_{t-i}\right) + \varepsilon_t \tag{2}$$

where $y_t$ denotes a time series of interest at time t and $y_{t-i}$ ($i = 1, 2, \ldots, s$) denotes its past observations, which are fed into the nodes of the input layer. $g(\cdot)$ is the transfer function of the hidden layer. $\beta_{ij}$ ($i = 0, 1, 2, \ldots, s$; $j = 1, 2, \ldots, r$) are the weights from the input layer to the hidden layer, with $\beta_{0j}$ acting as the bias of hidden node j. $\alpha_j$ ($j = 1, 2, \ldots, r$) are the weights from the hidden layer to the output layer, and $\alpha_0$ is the output bias. s and r denote the numbers of nodes in the input and hidden layers, respectively, and $\varepsilon_t$ is the random error at time t.

Note that the nonlinearity of ANNs stems mainly from the nonlinearity of this transfer function. In the current study, the sigmoid function in Equation (3) is used as the transfer function of the hidden layer:

$$g(x) = \frac{1}{1 + e^{-x}} \tag{3}$$

which maps each hidden node's weighted input, a combination of past observations, into the interval (0, 1). As in Equation (2), $\beta_{ij}$ and $\alpha_j$ represent the input-to-hidden and hidden-to-output weights, respectively, and $\varepsilon_t$ is the error term.

The estimation process begins by feeding the past observations of $y_t$ into the nodes of the input layer. These inputs are passed to the hidden layer, and the network weights are fitted to the data using the back-propagation training algorithm. The hidden-layer outputs, produced by the nonlinear transfer function, are then transferred to the output layer to produce the final output. Because the forecasting accuracy of ANNs varies with the number of nodes in the hidden layer, forecasts are obtained from the best-performing trained network after varying the number of hidden-layer nodes.

In the current study, a total of sixty-three ANN models (21 series × 3 horizons) were fitted to the growth rates of the twenty city-level CCIs and the national CCI over the short-term, mid-term, and long-term forecasting horizons. Three-layer ANNs were developed because a three-layer ANN can approximate any nonlinear data movement if the numbers of nodes are determined properly (Liu et al. 2012). The number of input nodes (s) was selected from twelve, twenty-four, and thirty-six months based on the smallest in-sample error. The best-fitting ANN models were obtained when the number of hidden nodes was set according to the number of input nodes; as a rule of thumb, the number of hidden nodes is two-thirds of the number of input nodes (Karsoliya 2012). The ANNs were trained recursively using the back-propagation training algorithm.
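The configuration steps above (lagged inputs, the two-thirds rule for hidden nodes, back-propagation training, and choosing s among 12, 24, and 36 months by in-sample error) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: it assumes a single hidden layer of sigmoid units with a linear output as in Equation (2), uses full-batch gradient descent on the mean squared error as the back-propagation step, and the learning rate, epoch count, initialization scale, and seed are hypothetical. The "recursive" training scheme mentioned in the text is not reproduced.

```python
import numpy as np


def make_lagged(y, s):
    """Input rows [y_{t-1}, ..., y_{t-s}] and targets y_t for t = s, ..., T-1."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([y[s - i: len(y) - i] for i in range(1, s + 1)])
    return X, y[s:]


def hidden_nodes(s):
    """Rule of thumb cited in the text: about two-thirds of the input nodes."""
    return max(1, round(2 * s / 3))


def train_ann(X, target, r, lr=0.05, epochs=500, seed=0):
    """Full-batch back-propagation for Equation (2): sigmoid hidden layer, linear output.

    Returns the fitted weights and the in-sample MSE recorded before each update.
    """
    rng = np.random.default_rng(seed)  # random initialization: results vary with the seed
    beta = rng.normal(scale=0.1, size=(X.shape[1], r))  # input-to-hidden weights
    beta0 = np.zeros(r)                                 # hidden-node biases
    alpha = rng.normal(scale=0.1, size=r)               # hidden-to-output weights
    alpha0 = 0.0                                        # output bias
    n = len(target)
    losses = []
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ beta + beta0)))  # hidden activations, shape (n, r)
        err = alpha0 + H @ alpha - target
        losses.append(float(np.mean(err ** 2)))
        g = 2.0 * err / n                              # dMSE / d(output)
        g_hidden = np.outer(g, alpha) * H * (1.0 - H)  # chain rule through the sigmoid
        alpha, alpha0 = alpha - lr * (H.T @ g), alpha0 - lr * g.sum()
        beta, beta0 = beta - lr * (X.T @ g_hidden), beta0 - lr * g_hidden.sum(axis=0)
    return (alpha0, alpha, beta0, beta), losses


def choose_input_nodes(y, candidates=(12, 24, 36)):
    """Pick s with the smallest final in-sample MSE, using the rule-of-thumb hidden size."""
    scores = {}
    for s in candidates:
        X, target = make_lagged(y, s)
        _, losses = train_ann(X, target, hidden_nodes(s))
        scores[s] = losses[-1]
    return min(scores, key=scores.get)
```

Under this rule of thumb, 12, 24, and 36 input nodes map to 8, 16, and 24 hidden nodes. Because the weights are initialized randomly, rerunning with a different `seed` can change the fitted network, which is the run-to-run variability noted later in the text.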

As data-driven, self-adaptive methods with few prior assumptions, ANNs have several attractive features relative to the linear ARIMA model. First, the ANN is suited to capturing the nonlinear dynamics of time series because its modeling structure relies on nonlinear network algorithms. ANNs have therefore solved nonlinear real-world problems effectively, in contrast to conventional forecasting techniques such as linear ARIMA models (Kim et al. 2021b, Ciulla et al. 2019). Second, because ANNs are trained on the features of the data (a data-driven approach) without strict modeling assumptions such as linearity and stationarity, they are capable of approximating any measurable function between input and output values through the data filtering process in the hidden layer. Consequently, nonlinear models are often reported to improve upon linear models in characterizing the in-sample properties of a time series. At the same time, however, this flexibility of ANNs is known to lead to overfitting in out-of-sample forecasting (Golafshani et al. 2020, Zhang et al. 2018). This is why ANNs do not necessarily outperform ARIMA models in out-of-sample forecasting. In addition, the numbers of nodes and layers in an ANN must be chosen in practice, which can introduce additional forecast uncertainty.

Hybrid ARIMA-ANN model

The hybrid ARIMA-ANN model was proposed to alleviate the limitations of each model by exploiting the strengths of both ARIMA and ANNs (Büyükşahin and Ertekin 2019, Khashei and Bijari 2011, Rathnayaka et al. 2015, Wang et al. 2013, Zhang 2003). Previous studies in other disciplines show that combining linear and nonlinear models can effectively improve forecasting performance, especially in the absence of a single dominant forecasting model (Büyükşahin and Ertekin 2019, Zhang 2003). Zhang (2003) shows that the hybrid model can achieve better forecasting accuracy than individual ARIMA or ANN models by exploiting the unique strengths of ARIMA and ANNs in capturing linear and nonlinear dynamics, respectively. Büyükşahin and Ertekin (2019) proposed a hybrid ARIMA-ANN model with a more general structure based on empirical mode decomposition. Yet little is known about whether such a combination of ARIMA and ANNs can improve on the accuracy of individual linear or nonlinear models in forecasting CCIs. This study fills this gap by comparing the forecasting performance of the hybrid ARIMA-ANN model with those of individual ARIMA and ANN models. The hybrid model is particularly relevant for CCI data, which are constructed from multiple subcomponents, such as material costs and labor costs, that are likely to follow different dynamics.

The hybrid ARIMA-ANN model employed here consists of two steps. The time series of interest ($y_t$) is assumed to be the sum of a linear component ($L_t$) and a nonlinear component ($N_t$), as represented by Equation (4).

$$y_t = L_t + N_t \tag{4}$$

In the first step, the linear component is estimated using an ARIMA(p,0,q) model in Equation (5).

$$\hat{L}_t = \hat{c} + \sum_{i=1}^{p} \hat{\phi}_i\, y_{t-i} + \sum_{j=1}^{q} \hat{\theta}_j\, \varepsilon_{t-j} \tag{5}$$

where $\hat{L}_t$ is the estimated linear component of the time series at time t, $y_{t-i}$ ($i = 1, 2, \ldots, p$) are past values of the time series, $\hat{c}$ is the estimated constant, $\hat{\phi}_i$ ($i = 1, 2, \ldots, p$) are the estimated autoregressive parameters, $\hat{\theta}_j$ ($j = 1, 2, \ldots, q$) are the estimated moving average parameters, and $\varepsilon_{t-j}$ ($j = 1, 2, \ldots, q$) is the error term at time t − j. The lag lengths p and q are selected by the BIC. The residual ($e_t$) of the ARIMA model is then obtained from Equation (6).

$$e_t = y_t - \hat{L}_t \tag{6}$$

where $y_t$ is the observed time series of interest at time t and $\hat{L}_t$ is the linear component of $y_t$ estimated by the ARIMA model in Equation (5).

In the second step, an ANN is applied to the residuals ($e_t$) to capture the nonlinear component ($N_t$), as represented by Equation (7).

$$N_t = g(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t \tag{7}$$

where $g(\cdot)$ denotes the nonlinear function approximated by the ANN, whose hidden nodes use the sigmoid transfer function, n is the number of nodes in the input layer of the ANN, $e_{t-i}$ ($i = 1, 2, \ldots, n$) is the residual of the ARIMA model at time t − i, and $\varepsilon_t$ is the error term.

Combining the linear and nonlinear components of the time series approximated by Equations (5) and (7), respectively, the hybrid model forecasts future values of $y_t$ as $\hat{y}_t = \hat{L}_t + \hat{N}_t$. In the forecasting exercise, $\hat{L}_t$ and $\hat{N}_t$ denote the predicted values of the linear component from the ARIMA model and of the nonlinear component from the ANN, respectively.
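The two-step procedure in Equations (4) to (7) can be sketched in NumPy as below. The assumptions are deliberate simplifications and are labeled in the code: the linear step is a pure AR(p) fitted by OLS, standing in for ARIMA(p,0,q) with q = 0; the nonlinear step is a one-hidden-layer sigmoid network trained by full-batch gradient descent; and only a one-step-ahead forecast is produced. The lag orders, hidden-node count, learning rate, and epoch count are hypothetical, and how the study extends forecasts to 12-, 36-, and 60-month horizons is not reproduced here.

```python
import numpy as np


def lag_matrix(x, k):
    """Rows [x_{t-1}, ..., x_{t-k}] for t = k, ..., len(x)-1, with targets x_t."""
    x = np.asarray(x, dtype=float)
    X = np.column_stack([x[k - i: len(x) - i] for i in range(1, k + 1)])
    return X, x[k:]


def fit_linear_step(y, p):
    """Step 1 (Equations 5-6): AR(p) with intercept by OLS, a stand-in for ARIMA(p,0,q)."""
    X, target = lag_matrix(y, p)
    X1 = np.column_stack([np.ones(len(target)), X])
    coef, *_ = np.linalg.lstsq(X1, target, rcond=None)
    residuals = target - X1 @ coef  # e_t = y_t - L_hat_t
    return coef, residuals


def fit_nonlinear_step(e, n, r, lr=0.1, epochs=2000, seed=0):
    """Step 2 (Equation 7): sigmoid ANN mapping (e_{t-1}, ..., e_{t-n}) to e_t."""
    rng = np.random.default_rng(seed)
    X, target = lag_matrix(e, n)
    B = rng.normal(scale=0.1, size=(n, r))  # input-to-hidden weights
    b0 = np.zeros(r)                        # hidden-node biases
    a = rng.normal(scale=0.1, size=r)       # hidden-to-output weights
    a0 = 0.0                                # output bias
    m = len(target)
    for _ in range(epochs):
        H = 1.0 / (1.0 + np.exp(-(X @ B + b0)))
        g = 2.0 * (a0 + H @ a - target) / m        # dMSE / d(prediction)
        g_hidden = np.outer(g, a) * H * (1.0 - H)  # back-propagated to hidden inputs
        a, a0 = a - lr * (H.T @ g), a0 - lr * g.sum()
        B, b0 = B - lr * (X.T @ g_hidden), b0 - lr * g_hidden.sum(axis=0)
    return a0, a, b0, B


def hybrid_one_step(y, p=2, n=3, r=2):
    """One-step-ahead hybrid forecast y_hat = L_hat + N_hat for the period after y[-1]."""
    y = np.asarray(y, dtype=float)
    coef, e = fit_linear_step(y, p)
    a0, a, b0, B = fit_nonlinear_step(e, n, r)
    L_hat = coef[0] + coef[1:] @ y[::-1][:p]            # uses y_T, ..., y_{T-p+1}
    h = 1.0 / (1.0 + np.exp(-(e[::-1][:n] @ B + b0)))  # uses e_T, ..., e_{T-n+1}
    N_hat = a0 + h @ a
    return L_hat + N_hat, L_hat, N_hat
```

The design mirrors Equation (4): the network never sees $y_t$ directly, only the AR residuals, so whatever structure it learns is added on top of the linear forecast rather than replacing it.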

FORECASTING PERFORMANCES

Linear ARIMA, nonlinear ANN, and hybrid models were developed separately for each city and for the nation using the corresponding CCI data. In total, 189 models (21 CCI series × 3 models × 3 forecasting horizons) were specified to forecast the twenty city-level CCIs and the national CCI over the short-term, mid-term, and long-term forecasting horizons.

The relative forecasting accuracy of the three competing models is evaluated based on their out-of-sample performance in predicting both national and city-level CCIs. Three forecasting horizons are considered: short-term (12-month-ahead forecast), medium-term (36-month-ahead forecast), and long-term (60-month-ahead forecast), comparable to the timelines of actual construction projects. The training and testing periods of the out-of-sample forecasting exercises are selected accordingly, as shown in Table 3.

Table 3. Training and testing periods for out-of-sample forecasting by forecasting horizon

| Forecasting horizon | Training period | Testing period |
|---|---|---|
| Short-term (12 months) | January 1995 to December 2018 | January 2019 to December 2019 |
| Medium-term (36 months) | January 1995 to December 2016 | January 2017 to December 2019 |
| Long-term (60 months) | January 1995 to December 2014 | January 2015 to December 2019 |

The effective estimation sample runs from January 1995 until December 2018 (288 monthly

observations) for the short-term horizon, from January 1995 until December 2016 (264 monthly

observations) for the medium-term horizon, and from January 1995 until December 2014 (240

monthly observations) for the long-term horizon. For the short-run forecasting horizon, for

example, the CCI data from January 1995 to December 2018 are used as a training dataset for

model estimation, and the remaining data from January 2019 to December 2019 are used as a

testing dataset to evaluate the out-of-sample forecasting performance. ANNs are randomly initialized and can therefore produce different results on each run (e.g., Büyükşahin and Ertekin 2019). Because there is no established method for configuring ANNs, an ANN was built for each combination of network parameters, and the configuration that performed best on the testing dataset was selected.
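The splits in Table 3 amount to holding out the last h months of the full monthly sample. This is a minimal sketch, assuming the CCI series is a chronologically ordered list covering January 1995 to December 2019 (the period named in the Acknowledgement). The helper names and the placeholder values are hypothetical. The sketch confirms the 288, 264, and 240 training observations stated above.

```python
from datetime import date

def months_between(start, end):
    """Number of monthly observations from start to end, both months inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

def split_train_test(series, horizon):
    """Hold out the last `horizon` monthly observations as the testing set."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]

# Full sample: January 1995 to December 2019 (placeholder values, not ENR data)
n_obs = months_between(date(1995, 1, 1), date(2019, 12, 1))
series = [float(i) for i in range(n_obs)]

for label, h in [("short-term", 12), ("medium-term", 36), ("long-term", 60)]:
    train, test = split_train_test(series, h)
    print(label, len(train), len(test))  # training and testing sizes per horizon
```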

Root Mean-Squared Errors (RMSE)

Forecasting performance is commonly evaluated using the root-mean-squared error (RMSE); models with smaller RMSEs are considered more accurate. The RMSE of a forecasting model for forecasting horizon h is given by Equation (8).

$$\mathrm{RMSE}_h = \sqrt{\frac{1}{h}\sum_{t=T+1}^{T+h}\left(\hat{y}_t - y_t\right)^2} \quad (8)$$

where $\hat{y}_t$ and $y_t$ respectively represent the forecast value and actual value of a variable y at time t, h denotes the forecasting horizon, and T denotes the sample size of the training period. Equations (9) and (10) then give the ratios of the RMSEs of the ARIMA and ANN models, respectively, to the RMSE of the hybrid model (hereafter, the RMSE ratio).

The RMSE ratio between ARIMA and the hybrid model = $\mathrm{RMSE}_{\mathrm{ARIMA}} / \mathrm{RMSE}_{\mathrm{Hybrid}}$ (9)

The RMSE ratio between ANN and the hybrid model = $\mathrm{RMSE}_{\mathrm{ANN}} / \mathrm{RMSE}_{\mathrm{Hybrid}}$ (10)

The RMSE ratio indicates the out-of-sample performance of an individual model relative to the hybrid model. A ratio greater than unity (i.e., $\mathrm{RMSE}_{\mathrm{ARIMA}} > \mathrm{RMSE}_{\mathrm{Hybrid}}$ or $\mathrm{RMSE}_{\mathrm{ANN}} > \mathrm{RMSE}_{\mathrm{Hybrid}}$) indicates that the hybrid model is more accurate, and a ratio below unity indicates the opposite. A ratio equal to unity means that the individual model (ARIMA or ANN) and the hybrid model have the same RMSE, i.e., equivalent forecast accuracy. Table 4 reports the RMSE ratios for the national CCI and the average over the 20 cities for the three forecasting horizons.
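Equations (8) to (10) can be computed directly from a window of out-of-sample forecasts. This is a minimal sketch; the function names and the three-month example window are hypothetical and are not values from this study.

```python
import math

def rmse(forecasts, actuals):
    """Root-mean-squared error over the h out-of-sample months (Equation 8)."""
    if not actuals or len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must be non-empty and equal in length")
    h = len(actuals)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / h)

def rmse_ratio(individual, hybrid, actuals):
    """Equations (9)-(10): values above 1 mean the hybrid model has the smaller RMSE."""
    return rmse(individual, actuals) / rmse(hybrid, actuals)

# Hypothetical three-month test window (not values from this study)
actual = [100.0, 102.0, 104.0]
arima = [101.0, 100.0, 107.0]
hybrid = [100.5, 101.5, 104.5]
print(round(rmse_ratio(arima, hybrid, actual), 2))  # 4.32
```

Because both RMSEs in a ratio are computed over the same testing window, the ratio is unit-free and can be compared across cities whose CCI levels differ.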

Table 4. The RMSE ratios of out-of-sample forecasts

| RMSE ratio | Short-term ARIMA | Short-term ANN | Medium-term ARIMA | Medium-term ANN | Long-term ARIMA | Long-term ANN |
|---|---|---|---|---|---|---|
| National | 1.57 | 1.29 | 7.54 | 3.48 | 7.25 | 11.43 |
| 20-city average | 1.89 | 1.67 | 2.85 | 3.32 | 3.15 | 5.29 |

As shown in Table 4, the RMSE ratio is larger than unity in all cases considered, indicating that the hybrid model outperforms the individual models. Notably, the outperformance of the hybrid model tends to be stronger at longer forecasting horizons, where the ratios are generally larger. Figure 1 displays the RMSE ratios of the individual models to the hybrid model for the national and city-level CCIs over the three forecasting horizons. The unity line in Figure 1 marks the point at which an individual model (ARIMA or ANN) and the hybrid model have the same RMSE, i.e., equivalent out-of-sample accuracy. A column above the unity line indicates that the hybrid model is more accurate, and the higher the column, the larger the individual model's underperformance relative to the hybrid model; a column below the line indicates the opposite.

Several observations can be made from Figure 1. First, the hybrid model performs better than the individual models in predicting the national CCI: its RMSE ratio is consistently above the unity line regardless of the forecasting horizon, and the outperformance is stronger when the horizon is longer. Second, for the city-level CCIs, the hybrid model outperforms in some cities but not in others, especially at shorter horizons. Over the long-term horizon, however, the hybrid model dominates both ARIMA and ANNs for all city CCIs. Third, neither individual model (ARIMA or ANN) clearly dominates the other.

Figure 1 indicates that the hybrid model outperforms in many of the forecasting cases under consideration. While this visual evidence is informative, a more formal analysis provides stronger evidence. To this end, the test proposed by Giacomini and White (2006) is used to determine the best-performing model by assessing whether the out-of-sample forecasting performances of the competing models differ significantly from each other.

Figure 1. Ratio of RMSE to Hybrid Model for national and city-level CCIs for: (a) short-term forecasting of 12 months; (b) medium-term forecasting of 36 months; and (c) long-term forecasting of 60 months.

The Giacomini-White Test

Giacomini and White (2006) propose a formal test (henceforth, the GW test) that can be used to examine whether the mean squared errors (MSEs) of two competing models are significantly different from each other. The null hypothesis of the GW test is given by

$$H_0:\; E\left[\mathrm{MSE}^{A}_{t+h} - \mathrm{MSE}^{B}_{t+h} \mid \mathcal{F}_t\right] = 0 \quad (11)$$

where $\mathrm{MSE}^{A}_{t+h}$ represents the MSE of the h-step-ahead forecast of Model A, $\mathrm{MSE}^{B}_{t+h}$ is the corresponding MSE of Model B, and $\mathcal{F}_t$ denotes the information set at time t. Under the null hypothesis that the forecast performances of the two models under comparison do not differ significantly, the GW test statistic is given by

$$GW_h = \frac{\overline{\Delta\mathrm{MSE}}_h}{\hat{\sigma}_h/\sqrt{n}}, \qquad \overline{\Delta\mathrm{MSE}}_h = \frac{1}{n}\sum_{t}\left(\mathrm{MSE}^{A}_{t+h} - \mathrm{MSE}^{B}_{t+h}\right) \quad (12)$$

where $\mathrm{MSE}^{A}_{t+h}$ and $\mathrm{MSE}^{B}_{t+h}$ respectively denote the MSEs of the two models under comparison, n is the number of out-of-sample forecasts, and $\hat{\sigma}_h$ is a consistent estimate of the standard deviation of the MSE differential. Because the GW test statistic is asymptotically standard normal, a two-sided test is used, and inference is made from the signs and p-values of the test statistics. The GW test compares only two competing forecasting models at a time. Hence, it is applied to three pairs of forecasting models in this research: (i) ARIMA versus ANN, (ii) ANN versus hybrid, and (iii) ARIMA versus hybrid. Table 5 contains the GW test results for the three forecasting horizons.
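With a constant test function (the unconditional case), the GW statistic reduces to a t-type statistic on the mean loss differential, which can be sketched in a few lines. This is a minimal sketch, assuming squared-error loss and a Newey-West (Bartlett-kernel) long-run variance with a user-chosen number of lags. The function name and the lag choice are assumptions, not the settings used in this study. Because the loss differential is defined as Model A minus Model B, a significantly positive statistic favors Model B.

```python
import math
from statistics import fmean

def gw_test(errors_a, errors_b, lags=0):
    """Unconditional Giacomini-White test on squared-error loss.

    Returns (statistic, two-sided p-value). The loss differential is
    d_t = e_A,t^2 - e_B,t^2, so a positive statistic means Model A has the larger MSE.
    """
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length error series with at least 2 points")
    d = [ea ** 2 - eb ** 2 for ea, eb in zip(errors_a, errors_b)]
    n = len(d)
    d_bar = fmean(d)
    dc = [x - d_bar for x in d]
    # Newey-West long-run variance of d_t with Bartlett weights
    lrv = sum(x * x for x in dc) / n
    for k in range(1, lags + 1):
        gamma_k = sum(dc[t] * dc[t - k] for t in range(k, n)) / n
        lrv += 2.0 * (1.0 - k / (lags + 1)) * gamma_k
    if lrv <= 0:
        raise ValueError("loss differential has zero estimated variance")
    stat = d_bar / math.sqrt(lrv / n)
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value
```

For h-step-ahead forecasts, the errors are typically serially correlated, which is why a lag window (for example, h - 1 lags) is commonly used for the variance rather than the plain sample variance.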

Table 5 shows that the three models do not differ much over the short-term forecasting horizon: in most cases considered, the GW test finds no evidence of a significant difference in forecasting performance among them. The hybrid model outperforms both ARIMA and ANNs in only six out of 20 cities. This is perhaps because the one-year out-of-sample period is not long enough for the GW test to detect significant differences among the competing models.

Table 5. The Giacomini-White (GW) test results for forecasting performances of three

competing models

Legend

V = the forecasting outperformance of the model over the other model(s);

― = no evidence of forecasting outperformance over the other model(s)

City

Short-term horizon

Medium-term horizon

Long-term horizon

ARIMA

ANN

Hybrid

ARIMA

ANN

Hybrid

ARIMA

ANN

Hybrid

NAT

―

ATL

―

BAL

―

BHM

―

BOS

―

CHI

―

CIN

―

CLE

―

DAL

―

DEN

―

DET

―

KCT

―

LAX

―

MIN

―

NOL

―

NYC

―

PHL

―

PIT

―

SFC

―

SEA

―

STL

―

Note: Entries are based on Giacomini-White (GW) tests under the null hypothesis that there is no statistically significant difference in the mean-squared errors (MSEs) of out-of-sample forecasts of the two forecasting models under comparison. ‘V’ represents forecasting outperformance over the other model(s) under the given forecasting environment, as the null hypothesis of the GW test is rejected at the 5% significance level. ‘―’ represents no evidence of forecasting outperformance over the other model(s) under the given forecasting environment.

The dominance of the hybrid model, however, increases with the forecasting horizon. Over the medium-term horizon, the hybrid model outperforms the other two models in forecasting the national CCI and 13 out of 20 city-level CCIs. The hybrid model performs best over the long-term horizon, outperforming the two individual models in almost all cases considered for the 5-year horizon.

Regardless of the forecasting horizon, the forecasting performance of the hybrid model is better than (or at least comparable to) that of the individual models in all cases considered in the current research. The hybrid model was thus validated as more accurate than the individual ARIMA or ANN, providing higher accuracy and robustness in forecasting the CCIs under different scenarios. This suggests that ARIMA or ANN, when used individually, cannot fully capture the dynamic patterns in the CCI series, which may contain both linear and nonlinear components, and it would explain why combining the two models to exploit the features of each can enhance the accuracy of CCI forecasts. Comparing the two individual models, ANN outperforms ARIMA for CCI forecasts over the short horizon, whereas ARIMA performs better than ANN over the long horizon. Overall, however, there appears to be no clear dominance between the two individual models, in line with the findings of previous studies in the literature.

The outperformance of the hybrid model is further illustrated in Figure 2. Figure 2 compares the

performances of the hybrid model and individual models (i.e., ARIMA, ANN, and Random Forest)

for forecasting national CCIs over three different forecasting horizons.

Figure 2. Out-of-sample forecasting performances of four models for the national CCI for: (a) short-term forecasting of 12 months; (b) mid-term forecasting of 36 months; and (c) long-term forecasting of 60 months.

Random Forest, a nonparametric and nonlinear machine learning method, has been used successfully to forecast the future values of time series using binary splits and bootstrapped data (Bhattacharyya et al. 2021; Yoon 2021; Mukherjee et al. 2018; Rudžianskaitė–Kvaraciejienė et al. 2015; Vitorino et al. 2014), and it is considered one of the most efficient data mining forecasting methods (Sekhar and Mahdu 2016). Random Forest is therefore included as an additional benchmark. Table 6 summarizes the RMSE values of the hybrid model and the individual ARIMA, ANN, and Random Forest models for forecasting the national CCI over the short-term, medium-term, and long-term horizons. Because the hybrid model yields the lowest RMSE in every case in Table 6, it outperforms all individual models in forecasting the national CCI over the three forecasting horizons.

Table 6. The RMSE values of hybrid and individual models for forecasting the national CCI

| Model | Short-term | Medium-term | Long-term |
|---|---|---|---|
| Hybrid ARIMA-ANN | 24.37 | 57.70 | 31.21 |
| ARIMA | 38.80 | 150.68 | 232.80 |
| ANN | 39.39 | 262.27 | 387.48 |
| Random Forest | 81.86 | 146.84 | 212.06 |

The results displayed in Figure 2 are very similar to those outlined above, demonstrating the better performance of the hybrid model over the individual models across forecasting horizons. The top panel (a) of Figure 2 displays the short-term out-of-sample forecasts of the national CCI from the four competing models along with the actual values (black solid line). Over the entire forecasting period of January to December 2019, the forecasts of the hybrid model (red dot-dashed line) are closer to the actual national CCIs than those of the individual ARIMA, ANN, and Random Forest models, in line with the GW test results reported in Table 5. The middle panel (b) of Figure 2 shows the same exercise for the medium-term horizon, with a testing period of January 2017 to December 2019. Again, the hybrid model appears to forecast the national CCI more accurately than the three individual models. A similar conclusion can be drawn from the bottom panel (c), which shows the long-term out-of-sample forecasting performances for the period of January 2015 to December 2019. The better performance of the hybrid model across all forecasting horizons likely stems from its sequential treatment of the linear and nonlinear components of the time series.

The forecasting results in the current study consistently point to the outperformance of the hybrid

model over individual ARIMA and ANNs in forecasting the CCIs. The dominance of the hybrid

model over individual models is stronger for long-term forecasting horizons. This is consistent

with the findings by Babu and Reddy (2014) and Rathnayaka et al. (2015) that the hybrid model

works better than a single ARIMA and ANN in forecasting time series with high volatility.

The better performance of the hybrid model can be attributed to the comparative advantage of the

hybrid model in capturing dynamic movements of CCIs that are partly linear and partly nonlinear.

Since construction projects typically run over multiple years, their costs are likely to experience

volatile movements with some structural changes occurring during the entire construction period

(Shrestha et al. 2013). In addition, because the CCI series is a weighted average of multiple subcomponents, such as material prices and labor costs, that may follow different dynamic patterns, the hybrid model may better capture the heterogeneous dynamics of these subcomponents.

With that said, the hybrid model adopted here assumes that the relationship between linear and

nonlinear components is additive. Hence, the empirical results obtained here may change if the

linear and nonlinear components are not additively associated. Additionally, the hybrid model

decomposes data into linear and nonlinear components by assuming that linear components can be

captured by ARIMA and nonlinear components from the residuals of the ARIMA model can be

approximated by ANN. As noted by Büyükşahin and Ertekin (2019), however, the forecasting

performances of the hybrid model may be affected if such assumptions are violated in real-world

applications. Nevertheless, the conclusion drawn here regarding the superiority of the hybrid

model over individual ARIMA or ANNs is likely unaltered because alternative methods of

decomposition of linear and nonlinear components in the hybrid model are reported to still

dominate ARIMA and ANNs in other studies.

GUIDANCE FOR MODEL SELECTION AND DEVELOPMENT

This study proposed univariate hybrid ARIMA-ANN models for forecasting CCIs at the U.S. national and 20-city levels. Two practical examples walk through the application of an ANN and of the hybrid forecasting model. The New York city-CCI and the Dallas city-CCI are selected because they represent construction costs with high and low growth rates, respectively, over the past decades in the U.S. Construction costs in New York City, where construction spending was around $62 billion in 2018, are among the highest in the U.S. and have grown rapidly, at 3.74% per year (Hall 2019). Construction costs in Dallas are also growing, but the average annual growth rate of the Dallas city-CCI from 1995 to 2019 (1.9%) is lower than the average annual growth rate of the 20 major U.S. cities (2.99%).
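This section does not state how the average annual growth rates were computed. One common choice is the compound annual growth rate between two index values, sketched below under that assumption. The function name and the index values in the usage line are hypothetical and are not ENR data.

```python
def average_annual_growth(start_value, end_value, years):
    """Compound annual growth rate between two index values `years` apart."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("index values and the number of years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Hypothetical index values, not ENR data
g = average_annual_growth(5000.0, 8000.0, 24)
print(f"{g:.2%}")
```

An arithmetic mean of year-over-year growth rates would give a slightly different figure, so the method should be stated whenever such rates are compared across cities.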

Example 1 presents a walk-through implementation of an ANN in forecasting the New York city-

level CCI.

Example 1. Suppose that a contractor in New York City needs to estimate the construction cost of a 3-year local construction project starting in 2017. The contractor decides to refer to the ENR city-CCI to estimate construction cost movements from 2017 to 2019 and could implement an ANN model to forecast the growth rates of the New York CCI from 2017 to 2019.

Three three-layer ANNs, ANN (36,24,1), ANN (24,16,1), and ANN (12,8,1), were developed using the historical values of the New York city-CCI from 1995 to 2016. ANN (36,24,1) has 36 previous monthly values as input nodes, 24 hidden nodes (two-thirds of the number of input nodes; Karsoliya 2012), and one output node. ANN (24,16,1) has 24 previous monthly values as input nodes, 16 hidden nodes, and one output node. ANN (12,8,1) has 12 previous monthly values as input nodes, 8 hidden nodes, and one output node.
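The (inputs, hidden, output) naming maps onto a sliding-window dataset: each training example is the previous p monthly values, and the target is the next value. This is a minimal sketch, assuming the two-thirds rule is applied by rounding. The function names are hypothetical, and training the network itself is not shown.

```python
def hidden_nodes(n_inputs):
    """Two-thirds rule of thumb for the hidden-layer size (Karsoliya 2012)."""
    return round(2 * n_inputs / 3)

def make_windows(series, n_inputs):
    """Turn a series into (inputs, target) pairs: n_inputs lagged values -> next value."""
    if not 0 < n_inputs < len(series):
        raise ValueError("n_inputs must be between 1 and len(series) - 1")
    inputs = [series[i:i + n_inputs] for i in range(len(series) - n_inputs)]
    targets = [series[i + n_inputs] for i in range(len(series) - n_inputs)]
    return inputs, targets

for p in (36, 24, 12):
    print(f"ANN ({p},{hidden_nodes(p)},1)")
```

With 264 monthly training values (1995 to 2016), ANN (36,24,1) has 228 training windows, whereas ANN (12,8,1) has 252. Longer input windows therefore leave fewer training examples.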

Table 7 presents the in-sample and out-of-sample forecasting errors. ANN (36,24,1) has the smallest forecasting errors for both the in-sample and out-of-sample data.

Table 7. In-sample and out-of-sample forecasting errors of ANNs for the 3-year-ahead forecasts of New York city-CCI

| Model | In-sample MAPE | In-sample RMSE | Out-of-sample MAPE | Out-of-sample RMSE |
|---|---|---|---|---|
| ANN (36,24,1) | 0.27 | 44.49 | 8.86 | 1842.84 |
| ANN (24,16,1) | 0.4 | 71.96 | 12.04 | 2447.97 |
| ANN (12,8,1) | 0.55 | 120.25 | 11.36 | 2274.97 |

Figure 3 illustrates the 3-year-ahead forecasts for New York city-CCI using the ANNs.

Figure 3. Out-of-sample forecasting performances of the ANNs for the 3-year-ahead

forecasts of New York City-CCI.

Example 2 explains a walk-through application of a hybrid model in forecasting the Dallas city-

level CCI.

Example 2. Suppose that a contractor in Dallas, Texas, is considering participating in a bidding process for a 5-year local construction project starting in 2015. The contractor refers to historical ENR city-CCIs to forecast construction cost movements from 2015 to 2019. Since the growth rates of the Dallas CCI can be represented as a combination of linear and nonlinear components, as in Equation (13), the contractor could build the hybrid model using the historical growth rates of the Dallas CCI from 1995 to 2014.

$$\Delta DAL_t = L_t + N_t \quad (13)$$

In the first step, the linear component is estimated using a linear ARIMA (1,0,1) model, given in Equation (14). ARIMA (1,0,1) denotes an ARIMA model with one autoregressive term (p = 1), no differencing (d = 0), and one moving average term (q = 1). The model order was selected using the Bayesian information criterion (BIC).



   (14)

where 

 is the estimated linear component of the Dallas CCI growth rate at time t (t=January 2015

to December 2019),  is the past values of the Dallas CCI growth rate at time t-1 which

indicates the growth rate one month previous to the forecast month (t), and  is the error term at

time t. Then, the residual (et) of the ARIMA model is obtained from Equation (15).

$$e_t = \Delta DAL_t - \hat{L}_t \quad (15)$$

where $\Delta DAL_t$ is the observed monthly Dallas CCI growth rate at time t from 1995 to 2014 and $\hat{L}_t$ is the estimated linear component of the Dallas CCI growth rate at time t from the ARIMA model in Equation (14).
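The first step of the hybrid model (fit a linear model, then take residuals as in Equation (15)) can be sketched without external libraries. The study uses ARIMA (1,0,1) selected by BIC, which is normally estimated by maximum likelihood with a statistics package. To stay dependency-free, this sketch substitutes an AR(1) model fitted by ordinary least squares. That is a simplification, not the paper's estimator, and the function names are hypothetical.

```python
def fit_ar1(y):
    """OLS fit of y_t = c + phi * y_{t-1} + error (a simplified stand-in for ARIMA(1,0,1))."""
    x, z = y[:-1], y[1:]
    n = len(z)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("lagged series is constant; phi is not identified")
    phi = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z)) / sxx
    c = mz - phi * mx
    return c, phi

def linear_fit_and_residuals(y):
    """Return fitted linear components L_hat_t and residuals e_t = y_t - L_hat_t (Eq. 15)."""
    c, phi = fit_ar1(y)
    fitted = [c + phi * prev for prev in y[:-1]]  # aligned with y[1:]
    residuals = [obs - fit for obs, fit in zip(y[1:], fitted)]
    return fitted, residuals
```

The residual series returned here plays the role of $e_t$: it is what the ANN in the second step is trained on.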

In the second step, the residuals ($e_t$) from 1995 to 2014 generated by Equation (15) are approximated by the feedforward ANN (12,8,1) shown in Figure 4, which has twelve input nodes, eight hidden nodes, and one output node. The network uses the previous twelve residuals to forecast the next residual, applied recursively, and the sigmoid function is used as the transfer function between layers. The ANN (12,8,1) in Figure 4 is used to forecast the 5-year-ahead residuals of the Dallas city-CCI in the hybrid model.

Figure 4. ANN for approximating and forecasting the residuals of Dallas-CCIs.
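The structure of ANN (12,8,1) and the recursive 60-month residual forecast can be sketched as follows. This is a minimal sketch under stated assumptions: the weights are random placeholders (training by backpropagation is omitted), the hidden layer uses the sigmoid, and the output node is assumed to be linear, which is a common choice for regression but is not specified in the text. The class and function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class FeedforwardANN:
    """Minimal ANN (n_in, n_hidden, 1): sigmoid hidden layer, linear output node.
    Weights are random placeholders; training is omitted."""

    def __init__(self, n_in=12, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.n_in = n_in
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b_out = rng.uniform(-0.5, 0.5)

    def predict(self, window):
        if len(window) != self.n_in:
            raise ValueError(f"expected {self.n_in} inputs, got {len(window)}")
        hidden = [sigmoid(sum(w * x for w, x in zip(ws, window)) + b)
                  for ws, b in zip(self.w_hidden, self.b_hidden)]
        return sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out

def recursive_forecast(model, history, steps):
    """Forecast `steps` values, feeding each prediction back as the newest input."""
    window = list(history[-model.n_in:])
    forecasts = []
    for _ in range(steps):
        nxt = model.predict(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # drop the oldest lag, append the new forecast
    return forecasts
```

Because each step reuses earlier forecasts as inputs, errors can accumulate over a 60-month horizon. This is one reason the linear ARIMA component is kept separate rather than asking the ANN to model the whole series.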

The hybrid model forecasts the 5-year-ahead growth rates of the Dallas city-CCI by additively combining the linear and nonlinear components approximated by ARIMA (1,0,1) and ANN (12,8,1), respectively. In this example, the hybrid model outperforms the individual models in forecasting the 5-year-ahead Dallas city-CCIs, with a mean absolute percentage error (MAPE) of 0.32, while the individual ARIMA and ANN yield MAPEs of 0.7 and 5.8, respectively.
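The final combination step and the MAPE used to score it are short enough to sketch directly. This section does not state whether the reported MAPEs are computed on growth rates or on index levels. The sketch reports MAPE in percent and is best applied to index levels, because growth rates near zero make percentage errors unstable. The function names and values are hypothetical.

```python
def hybrid_combine(linear_forecasts, nonlinear_forecasts):
    """Additive hybrid forecast: y_hat_t = L_hat_t + N_hat_t."""
    if len(linear_forecasts) != len(nonlinear_forecasts):
        raise ValueError("component forecasts must cover the same horizon")
    return [l + n for l, n in zip(linear_forecasts, nonlinear_forecasts)]

def mape(forecasts, actuals):
    """Mean absolute percentage error, in percent."""
    if not actuals or len(forecasts) != len(actuals):
        raise ValueError("forecasts and actuals must be non-empty and equal in length")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - f) / a) for f, a in zip(forecasts, actuals)) / len(actuals)

# Hypothetical component forecasts and actual index values
combined = hybrid_combine([101.0, 203.0], [-1.0, -3.0])
print(mape(combined, [100.0, 200.0]))
```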

The empirical results of the CCI forecasting exercise using hybrid models offer several implications for cost engineers and capital planners in construction projects. First, the hybrid model is recommended for forecasting both national and city-level CCIs because it displayed stable and robust predictability under different scenarios, regardless of construction location (city) and duration (forecasting horizon). In all cases considered in this study, the hybrid model provided forecasts more accurate than, or at least comparable to, those of the individual models, including ARIMA, ANN, and (for the national CCI) Random Forest. Second, the accuracy advantage of the hybrid model is greater when the forecasting horizon is longer. It is therefore recommended for forecasting CCIs, especially in long-term construction projects.

The 5-year-ahead forecasts of the national CCI, New York city-CCI, and Dallas city-CCI using

hybrid models are illustrated in Figure 5. The significant gaps between individual city-CCIs and

national CCI suggest that using the national CCI for local projects can cause large forecasting

errors in cost estimation and adjustment.

Figure 5. 5-year-ahead forecasts of the national CCI, New York City-CCI, and Dallas-CCI

using hybrid models.

Third, the hybrid model outperforms an individual ARIMA or ANN, especially when the city-

CCIs show higher growth rates or more frequent discrete changes. For example, the hybrid model

provides more accurate CCI forecasts for the cities with high annual growth rates of CCIs such as

BOS (3.36%), CHI (4.1%), NYC (3.74%), STL (2.99%), SFC (2.72%), and SEA (2.98%) that are

primarily located in eastern and western parts of the U.S. compared to the cities with lower annual

growth rates of CCIs such as DAL (1.9%) and NOL (2.0%) in southern states.

One of the major reasons for cross-city heterogeneities in city-CCI movements is different

economic and market situations across cities, such as construction labor market situations. Since

the CCIs are mainly composed of labor costs, construction labor wages and market situations in

each city affect the CCI movements. The construction labor market is less flexible compared to

the construction material market due to several reasons such as labor retraining and relocation

costs, annual legislative contracts, union wage premiums, and minimum wage standards (Alaloul

et al. 2021, Bobeica et al. 2019, Finkel 2015, Hwang 2011, Phillips 1982). Thus, construction labor costs tend to increase in discrete steps over the year, with heterogeneous growth rates across cities. City-CCIs are therefore also prone to discrete fluctuations, with growth rates that differ according to each city’s economic situation, such as the local construction labor market. The hybrid models show better forecasting performance than the individual models for CCIs with higher growth rates and more dynamic fluctuations. Lastly, proper decomposition of the time series into linear and nonlinear components is critical for the forecasting performance of hybrid models. For example, if the linear and nonlinear components are combined multiplicatively, a multiplicative hybrid model is recommended (Babu and Reddy 2014, Büyükşahin and Ertekin 2019). The hybrid model used in this study assumes an additive combination of linear and nonlinear components that can be approximated by ARIMA and ANN, respectively. If the linear and nonlinear components in the CCI are not combined additively (e.g., they are combined multiplicatively), the additive hybrid model may not be as accurate as it is in this study.
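The difference between the two decompositions lies in what the nonlinear model is trained on and how the components are recombined. The sketch below is only a schematic of the additive and multiplicative forms named above. It is not the specific hybrid method of either cited study, and the function names are hypothetical. Note that the multiplicative form divides by the linear fit, so it suits strictly positive series such as index levels better than growth rates, which can be zero or negative.

```python
def nonlinear_target(actuals, linear_fitted, form="additive"):
    """Series the nonlinear model is trained on, under each decomposition."""
    if form == "additive":        # y_t = L_t + N_t  ->  N_t = y_t - L_t
        return [y - l for y, l in zip(actuals, linear_fitted)]
    if form == "multiplicative":  # y_t = L_t * N_t  ->  N_t = y_t / L_t
        if any(l == 0 for l in linear_fitted):
            raise ValueError("multiplicative form needs nonzero linear fits")
        return [y / l for y, l in zip(actuals, linear_fitted)]
    raise ValueError(f"unknown form: {form!r}")

def recombine(linear_part, nonlinear_part, form="additive"):
    """Rebuild the hybrid forecast from its two component forecasts."""
    if form == "additive":
        return [l + n for l, n in zip(linear_part, nonlinear_part)]
    if form == "multiplicative":
        return [l * n for l, n in zip(linear_part, nonlinear_part)]
    raise ValueError(f"unknown form: {form!r}")
```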

CONCLUSIONS

Cost estimation in budget planning and contract bidding is important for the success of

construction projects because both underestimation and overestimation of project cost lead to

undesirable consequences. Inaccurate cost estimation can increase the probabilities of project

failures and social costs through resource misallocations. Given that construction projects typically

run over multiple years during which construction costs vary nontrivially, it is challenging to

forecast the movements of project costs in the stage of budgeting and bidding.

Researchers and practitioners have applied popular forecasting models, such as linear ARIMA

models or nonlinear ANNs to construction cost measures such as CCIs for improving the accuracy

of construction cost estimation. While linear models such as ARIMA produce better forecasting

accuracy for linear dynamics of a time series, nonlinear methods such as ANN are more

appropriate for capturing its nonlinear dynamics. Unfortunately, there exists no consensus on the

underlying dynamics of CCIs, which are constructed from multiple subcomponents. It is suggested

in the literature that using a combination of linear and nonlinear models may enhance the

forecasting accuracy by exploiting distinctive features of each model, compared to implementing

one of them individually. Despite the potential benefit of this hybrid model approach, little

attention has been devoted to employing such hybrid models for forecasting CCIs.

Via an extensive out-of-sample forecasting exercise for various forecasting horizons, this study

examined the outperformance of the hybrid model in forecasting both national and city-level CCIs,

relative to those of individual ARIMA and ANN models. The results of this research favor the

hybrid models over individual ARIMA or ANN models, especially for longer forecasting horizons.

The hybrid models provide forecasting performance better than, or at least comparable to, that of the individual ARIMA and ANN models in all cases considered. In other words, the hybrid models show stable

and robust predictability in forecasting CCIs under different scenarios including construction

location (city) and duration (forecasting horizon). This is probably because the hybrid model is

designed for capturing both linear and nonlinear dynamics embedded in the subcomponents of the

CCI series.

Even though different forecasting models have been used for forecasting CCIs, no individual

forecasting model shows superior performance dominantly across different conditions such as

construction locations and forecasting horizons (Cao et al. 2015; Lam and Oshodi 2016b; Oshodi

et al. 2017; Shahandashti and Ashuri 2013; Shahandashti 2014; Shiha et al. 2020; Wang and Ashuri

2016). This is perhaps because those individual forecasting models do not fully account for the nature of CCI fluctuations. This research hypothesizes that the CCI fluctuations

consist of both linear and nonlinear components. Based on the hypothesis of an additive

combination of linear and nonlinear components in CCI fluctuations, the researchers were inspired

to investigate the performance of hybrid ARIMA-ANN models in CCI forecasting.

The current study is the first attempt to develop and implement the univariate hybrid ARIMA-

ANN model to forecast CCIs at the U.S. national and city levels. The findings of this study indicate

stable and robust outperformance of hybrid models based on the empirical results that forecasting

performances of individual models can be improved by combining them. The hybrid model

provided more accurate forecasts than other individual models such as ARIMA, ANN, and

Random Forest regardless of city and forecasting horizon. Also, this study found that the

fluctuations in the CCI time series have both linear and nonlinear dynamics over time. The

outperformance of the hybrid model over linear ARIMA and nonlinear ANN models in CCI forecasting suggests that the fluctuations in CCIs consist of both linearities and nonlinearities. Therefore, it may be misleading to assume that the CCIs have a purely linear or purely nonlinear data structure.

The findings of this study suggest an additional caveat for using the hybrid model to forecast heterogeneous CCI movements across 20 different U.S. cities. The hybrid model outperforms an individual ARIMA or ANN in CCI forecasting especially when the CCI shows higher annual growth rates and more frequent discrete changes. In other words, when economic conditions such as the construction labor market change frequently, the accuracy advantage of the hybrid model over an individual model becomes even larger.

Even though the findings of this research provide a practical recommendation to implement the

hybrid model in construction cost estimation, this research is subject to several limitations. First,

this study focused on univariate forecasting models using the historical observations of time series

data. In the cities where additional explanatory variables, such as leading indicators of construction

costs (e.g., Choi et al. 2021, Mahdavian et al. 2021), are available, it would be interesting to

investigate whether the conclusions of this study still hold. Second, among a wide variety of

nonlinear models, this research solely focused on ANNs. Although ANNs have been popularly

adopted for forecasting, it would be instructive to examine whether the use of alternative nonlinear

models such as recurrent neural networks (RNN) or long short-term memory (LSTM) networks can enhance

the performance of the hybrid model (Jang et al. 2020, Kim et al. 2021b). Third, the hybrid model

employed in this research assumes a specific relation between, and decomposition into, linear and nonlinear components. A related fruitful line of research would be to probe the sources of linear and nonlinear

dynamics based on the CCI subcomponent analysis.

Although statistical forecasting models, including the hybrid model, produced reasonably accurate forecasts of construction costs in the current research, these models have not been examined for forecasting construction costs in extreme, abnormal situations. In the literature, forecasting models such as nonlinear ANNs have produced reasonably accurate forecasts shortly after the onset of extreme, abnormal events (Ahmad et al. 2021; Fahiman et al. 2019; Malefors et al. 2021). For future work, however, it is recommended that the robustness and feasibility of the proposed statistical models for forecasting construction costs be examined during abnormal situations. It is also of interest to examine whether macroeconomic or construction market leading indicators can improve the accuracy of construction cost forecasts.

DATA AVAILABILITY

Some or all data, models, or codes used during the study were provided by a third party. Direct

requests for these materials may be made to the provider as indicated in the Acknowledgements.

ACKNOWLEDGEMENT

This research uses monthly construction cost indices (CCIs) from January 1995 to December 2019,

which are published by Engineering News-Record (ENR) for 20 major cities in the United States.

REFERENCES

Abd Rahman, N. H., Lee, M. H., and Latif, M. T. (2015). Artificial neural networks and fuzzy time

series forecasting: an application to air quality. Quality and Quantity, 49(6), 2633-2647.

Adebiyi, A. A., Adewumi, A. O., and Ayo, C. K. (2014). Comparison of ARIMA and artificial

neural networks models for stock price prediction. Journal of Applied Mathematics, 2014.

Ahmad, M., Khan, Y. A., Jiang, C., Kazmi, S. J. H., and Abbas, S. Z. (2021). The impact of COVID-19 on unemployment rate: An intelligent based unemployment rate prediction in selected countries of Europe. International Journal of Finance & Economics.

Ahmadi, M. H., Mohseni-Gharyehsafa, B., Farzaneh-Gord, M., Jilte, R. D., Kumar, R., and Chau,

K. W. (2019). Applicability of connectionist methods to predict dynamic viscosity of

silver/water nanofluid by using ANN-MLP, MARS and MPR algorithms. Engineering

Applications of Computational Fluid Mechanics, 13(1), 220-228.

Alaloul, W. S., Musarat, M. A., Liew, M. S., Qureshi, A. H., and Maqsoom, A. (2021).

Investigating the impact of inflation on labour wages in Construction Industry of

Malaysia. Ain Shams Engineering Journal, 12(2), 1575-1582.

Ashuri, B., and Lu, J. (2010). Time series analysis of ENR construction cost index. Journal of

Construction Engineering and Management, 136(11), 1227-1237.

Ashuri, B., Shahandashti, S. M., and Lu, J. (2012). Empirical tests for identifying leading

indicators of ENR construction cost index. Construction Management and Economics,

30(11), 917-927.

Assaad, R., and El-adaway, I. H. (2021). Impact of dynamic workforce and workplace variables

on the productivity of the construction industry: New gross construction productivity

indicator. Journal of Management in Engineering, 37(1), 04020092.

Babu, C. N., and Reddy, B. E. (2014). A moving-average filter based hybrid ARIMA–ANN for

forecasting time series data. Applied Soft Computing, 23, 27-38.

Bhattacharyya, A., Yoon, S., Weidner, T. J., and Hastak, M. (2021). Purdue Index for Construction

Analytics: Prediction and Forecasting Model Development. Journal of Management in

Engineering, 37(5), 04021052.

Bian, Z., Zhang, Z., Liu, X., and Qin, X. (2019). Unobserved component model for predicting

monthly traffic volume. Journal of Transportation Engineering, Part A: Systems, 145(12),

04019052.

Bobeica, E., Ciccarelli, M., and Vansteenkiste, I. (2019). The link between labor cost and price

inflation in the euro area.

Büyükşahin, Ü. Ç., and Ertekin, Ş. (2019). Improving forecasting accuracy of time series data

using a new hybrid ARIMA-ANN method and empirical mode decomposition.

Neurocomputing, 361, 151-163.

Cantarelli, C. C., and Flyvbjerg, B. (2013). Mega-projects’ cost performance and lock-in: problems

and solutions. In International handbook on mega-projects. Edward Elgar Publishing.

Cao, M. T., Cheng, M. Y., and Wu, Y. W. (2015). Hybrid computational model for forecasting

Taiwan construction cost index. Journal of Construction Engineering and

Management, 141(4), 04014089.

Cao, Y., and Ashuri, B. (2020). Predicting the volatility of highway construction cost index using

long short-term memory. Journal of Management in Engineering, 36(4), 04020020.

Choi, C-Y., Ryu, K. R., and Shahandashti, M. (2021). Predicting City-Level Construction Cost

Index Using Linear Forecasting Models. Journal of Construction Engineering and

Management, 147(2), 04020158.

Ciaburro, G., and Venkateswaran, B. (2017). Neural Networks with R: Smart models using CNN,

RNN, deep learning, and artificial intelligence principles. Packt Publishing Ltd.

Ciulla, G., D’Amico, A., Di Dio, V., and Brano, V. L. (2019). Modelling and analysis of real-

world wind turbine power curves: Assessing deviations from nominal curve by neural

networks. Renewable Energy, 140, 477-492.

Congressional Budget Office (CBO). (2018). Public Spending on Transportation and Water

Infrastructure, 1956 to 2017. Congressional Budget Office.

Cook, T. R., and Doh, T. (2019). Assessing Macroeconomic Tail Risks in a Data-Rich

Environment. Federal Reserve Bank of Kansas City Working Paper RWP, 19-12.

De, P., Sahu, D., Pandey, A., Gulati, B. K., Chandhiok, N., Shukla, A. K., Mohan, P., and Mitra,

R. G. (2016). Post millennium development goals prospect on child mortality in India: an

analysis using autoregressive integrated moving averages (ARIMA) model. Health, 8(15),

1845.

ENR. (2021). Historical Indices. Engineering News-Record. Retrieved from

https://www.enr.com/economics/historical_indices

Fahiman, F., Erfani, S. M., and Leckie, C. (2019, July). Robust and accurate short-term load

forecasting: A cluster oriented ensemble learning approach. In 2019 International Joint

Conference on Neural Networks (IJCNN) (pp. 1-8). IEEE.

Fan, R. Y., Ng, S. T., and Wong, J. M. (2010). Reliability of the Box–Jenkins model for forecasting

construction demand covering times of economic austerity. Construction Management and

Economics, 28(3), 241-254.

Fard, A. K., and Akbari-Zadeh, M. R. (2014). A hybrid method based on wavelet, ANN and

ARIMA model for short-term load forecasting. Journal of Experimental and Theoretical

Artificial Intelligence, 26(2), 167-182.

Fattah, J., Ezzine, L., Aman, Z., El Moussami, H., and Lachhab, A. (2018). Forecasting of demand

using ARIMA model. International Journal of Engineering Business Management, 10,

1847979018808673.

Finkel, G. (2015). The economics of the construction industry. Routledge.

Giacomini, R., and White, H. (2006). Tests of conditional forecasting

ability. Econometrica, 74(6), 1545-1578.

Giraitis, L., Kapetanios, G., and Yates, T. (2018). Inference on multivariate heteroscedastic time

varying random coefficient models. Journal of Time Series Analysis, 39(2), 129-149.

Golafshani, E. M., Behnood, A., and Arashpour, M. (2020). Forecasting the compressive strength

of normal and High-Performance Concretes using ANN and ANFIS hybridized with Grey

Wolf Optimizer. Construction and Building Materials, 232, 117266.

Hall, M. (2019, February 6). Mounting Costs Push Construction Industry Harder Toward Tech

Solutions. Building on Our Heritage. Retrieved from

https://marxrealty.com/press/mounting-costs-push-construction-industry-harder-toward-tech-solutions/

Han, M., Zhang, R., Qiu, T., Xu, M., and Ren, W. (2017). Multivariate chaotic time series

prediction based on improved grey relational analysis. IEEE Transactions on Systems,

Man, and Cybernetics: Systems, 49(10), 2144-2154.

Han, S., Ko, Y., Kim, J., and Hong, T. (2018). Housing market trend forecasts through statistical

comparisons based on big data analytic methods. Journal of Management in

Engineering, 34(2), 04017054.

Hwang, S. (2011). Time series models for forecasting construction costs using time series

indexes. Journal of Construction Engineering and Management, 137(9), 656-662.

Jang, Y., Jeong, I., and Cho, Y. K. (2020). Business failure prediction of construction contractors

using a LSTM RNN with accounting, construction market, and macroeconomic

variables. Journal of Management in Engineering, 36(2), 04019039.

Janzamin, M., Sedghi, H., and Anandkumar, A. (2015). Beating the perils of non-convexity:

Guaranteed training of neural networks using tensor methods. arXiv preprint

arXiv:1506.08473.

Kamruzzaman, M., Makino, Y., and Oshita, S. (2016). Parsimonious model development for real-time

monitoring of moisture in red meat using hyperspectral imaging. Food Chemistry, 196, 1084-1091.

Karaca, I., Gransberg, D. D., and Jeong, H. D. (2020). Improving the accuracy of early cost

estimates on transportation infrastructure projects. Journal of Management in

Engineering, 36(5), 04020063.

Karsoliya, S. (2012). Approximating number of hidden layer neurons in multiple hidden layer

BPNN architecture. International Journal of Engineering Trends and Technology, 3(6),

714-717.

Khashei, M., and Bijari, M. (2011). A novel hybridization of artificial neural networks and

ARIMA models for time series forecasting. Applied Soft Computing, 11(2), 2664-2675.

Kim, S., Abediniangerabi, B., and Shahandashti, M. (2020). Forecasting Pipeline Construction

Costs Using Time Series Methods. In Pipelines 2020 (pp. 198-209). Reston, VA:

American Society of Civil Engineers.

Kim, S., Abediniangerabi, B., and Shahandashti, M. (2021a). Pipeline Construction Cost

Forecasting Using Multivariate Time Series Methods. Journal of Pipeline Systems

Engineering and Practice, 12(3), 04021026.

Kim, S., Abediniangerabi, B., and Shahandashti, M. (2021b). Forecasting Pipeline Construction

Costs Using Recurrent Neural Networks. In Pipelines 2021 (pp. 325-335).

Kim, Y., Son, H. G., and Kim, S. (2019). Short term electricity load forecasting for institutional

buildings. Energy Reports, 5, 1270-1280.

Lam, K. C., and Oshodi, O. S. (2016a). Using univariate models for construction output

forecasting: Comparing artificial intelligence and econometric techniques. Journal of

Management in Engineering, 32(6), 04016021.

Lam, K. C., and Oshodi, O. S. (2016b). Forecasting construction output: a comparison of artificial

neural network and Box-Jenkins model. Engineering, Construction and Architectural

Management.

Liu, H., Tian, H. Q., and Li, Y. F. (2012). Comparison of two new ARIMA-ANN and ARIMA-

Kalman hybrid methods for wind speed prediction. Applied Energy, 98, 415-424.

Mahdavian, A., Shojaei, A., Salem, M., Yuan, J. S., and Oloufa, A. A. (2021). Data-Driven

Predictive Modelling of Highway Construction Cost Items. Journal of Construction

Engineering and Management, 147(3), 04020180.

Malefors, C., Secondi, L., Marchetti, S., and Eriksson, M. (2021). Food waste reduction and

economic savings in times of crisis: The potential of machine learning methods to plan

guest attendance in Swedish public catering during the Covid-19 pandemic. Socio-

Economic Planning Sciences, 101041.

May, R., Dandy, G., and Maier, H. (2011). Review of input variable selection methods for artificial

neural networks. Artificial neural networks-methodological advances and biomedical

applications, 10, 16004.

McNichol, E. (2016). It’s time for states to invest in infrastructure. Center on Budget and Policy

Priorities, February. Available: www.cbpp.org/research/state-budget-and-tax/its-time-for-states-to-invest-in-infrastructure [Accessed 12 April 2017].

Mir, M., Kabir, H. D., Nasirzadeh, F., and Khosravi, A. (2021). Neural network-based interval

forecasting of construction material prices. Journal of Building Engineering, 39, 102288.

Moon, S., Chi, S., and Kim, D. Y. (2018). Predicting construction cost index using the

autoregressive fractionally integrated moving average model. Journal of Management in

Engineering, 34(2), 04017063.

Mukherjee, S., Nateghi, R., and Hastak, M. (2018). A multi-hazard approach to assess severe

weather-induced major power outage risks in the US. Reliability Engineering and System

Safety, 175, 283-305.

Oshodi, O. S., Ejohwomu, O. A., Famakin, I. O., and Cortez, P. (2017). Comparing univariate

techniques for tender price index forecasting: Box-Jenkins and neural network

model. Construction Economics and Building, 17(3), 109-123.

Phillips, B. A. (1982). The impact of inflation upon US highway maintenance and construction

costs. Transportation Research Part A: General, 16(1), 1-11.

Rathnayaka, R. K. T., Seneviratna, D. M. K. N., Jianguo, W., and Arumawadu, H. I. (2015,

October). A hybrid statistical approach for stock market forecasting based on artificial

neural network and ARIMA time series models. In 2015 International Conference on

Behavioral, Economic and Socio-cultural Computing (BESC) (pp. 54-60). IEEE.

Rudžianskaitė–Kvaraciejienė, R., Apanavičienė, R., and Gelžinis, A. (2015). Modelling the

effectiveness of PPP road infrastructure projects by applying random forests. Journal of

Civil Engineering and Management, 21(3), 290-299.

Sekhar, C. R., and Madhu, E. (2016). Mode choice analysis using random forrest decision

trees. Transportation Research Procedia, 17, 644-652.

Shahandashti, S. M. (2014). Analysis of construction cost variations using macroeconomic, energy

and construction market variables (Doctoral dissertation, Georgia Institute of

Technology).

Shahandashti, S. M., and Ashuri, B. (2013). Forecasting engineering news-record construction cost

index using multivariate time series models. Journal of Construction Engineering and

Management, 139(9), 1237-1243.

Shahandashti, S. M., and Ashuri, B. (2016). Highway construction cost forecasting using vector

error correction models. Journal of Management in Engineering, 32(2), 04015040.

Shiha, A., Dorra, E. M., and Nassar, K. (2020). Neural Networks Model for Forecasting of

Construction Material Prices in Egypt Using Macroeconomic Indicators. Journal of

Construction Engineering and Management, 146(3), 04020010.

Shrestha, P. P., Burns, L. A., and Shields, D. R. (2013). Magnitude of construction cost and

schedule overruns in public work projects. Journal of Construction Engineering, 2013(1),

1-9.

Steyerberg, E. W. (2019). Assumptions in regression models: additivity and linearity. In Clinical

Prediction Models (pp. 227-245). Springer, Cham.

Tealab, A., Hefny, H., and Badr, A. (2017). Forecasting of nonlinear time series using

ANN. Future Computing and Informatics Journal, 2(1), 39-47.

Tijanić, K., Car-Pušić, D., and Šperac, M. (2019). Cost estimation in road construction using

artificial neural network. Neural Computing and Applications, 1-13.

Valipour, M., Banihabib, M. E., and Behbahani, S. M. R. (2013). Comparison of the ARMA,

ARIMA, and the autoregressive artificial neural network models in forecasting the monthly

inflow of Dez dam reservoir. Journal of Hydrology, 476, 433-441.

Vitorino, D., Coelho, S. T., Santos, P., Sheets, S., Jurkovac, B., and Amado, C. (2014). A random

forest algorithm applied to condition-based wastewater deterioration modeling and

forecasting. Procedia Engineering, 89, 401-410.

Wang, J., and Ashuri, B. (2017). Predicting ENR construction cost index using machine-learning

algorithms. International Journal of Construction Education and Research, 13(1), 47-63.

Wang, L., Zou, H., Su, J., Li, L., and Chaudhry, S. (2013). An ARIMA‐ANN hybrid model for

time series forecasting. Systems Research and Behavioral Science, 30(3), 244-259.

Xu, J. W., and Moon, S. (2013). Stochastic forecast of construction cost index using a cointegrated

vector autoregression model. Journal of Management in Engineering, 29(1), 10-18.

Yip, H. L., Fan, H., and Chiang, Y. H. (2014). Forecasting the maintenance cost of construction

equipment: Comparison between general regression neural network and Box–Jenkins time

series models. Automation in Construction, 38, 30-38.

Yoon, J. (2021). Forecasting of real GDP growth using machine learning models: Gradient

boosting and random forest approach. Computational Economics, 57(1), 247-265.

Yu, X., Ye, C., and Xiang, L. (2016). Application of artificial neural network in the diagnostic

system of osteoporosis. Neurocomputing, 214, 376-381.

Zevin, A. (2020). Lumber Prices Drop in 2019. ENR: Engineering News-Record, 284(6), 36.

Zhang, G. P. (2003). Time series forecasting using a hybrid ARIMA and neural network

model. Neurocomputing, 50, 159-175.

Zhang, X., Xue, T., and Stanley, H. E. (2018). Comparison of econometric models and artificial

neural networks algorithms for the forecasting of Baltic Dry Index. IEEE Access, 7, 1647-1657.

Zhao, L., Mbachu, J., and Zhang, H. (2019). Forecasting residential building costs in New Zealand

using a univariate approach. International Journal of Engineering Business

Management, 11, 1847979019880061.

Forecasting Construction Material Prices Using Macroeconomic Indicators of Trading Partners

Article

Jun 2024
J MANAGE ENG

Supply chain instabilities and inflated material prices have had a disruptive impact on cost estimating of construction projects. While several research efforts used national macroeconomic indicators to forecast the prices of domestically produced construction materials, none of the existing studies investigated whether the lagged macroeconomic indicators of the main trading partners could enhance the predictability of the prices of cement, steel, and lumber in the US construction sector. This paper fills this knowledge gap. The authors adopted a multi-step methodology that included: (1) collecting data on the target variables and the candidate leading indicators; (2) identifying the structural breaks in the collected data sets; (3) conducting causality tests to identify short-term associations and cointegration tests to examine long-term relationships; (4) developing vector error correction (VEC) models to forecast the prices in the short and long terms; and (5) evaluating the performance of the proposed models against existing forecasting models in the literature. Results of the Granger test and Johansen test indicate that Canada’s overall producer price index (PPI) is a consistent leading indicator of the prices of cement, and Mexico’s overall PPI is a consistent leading indicator of the prices of steel. Findings indicate no statistical evidence to suggest that neither Canada’s PPI nor Mexico’s PPI can be leading indicators of lumber prices. Over an 18-month ahead of sample horizon, the presented VEC models of cement and steel prices outperformed existing models, particularly beyond the 1-year-ahead forecasts. Utilization of the proposed forecasting models can significantly enhance the accuracy of cost estimates and feasibility studies of construction projects. This provides proactive financial planning for construction contractors and project owners through improved short- and long-term forecasting of the prices of main construction materials.

Construction cost prediction system based on Random Forest optimized by the Bird Swarm Algorithm

Article

Full-text available

Jul 2023
MATH BIOSCI ENG

Predicting construction costs often involves disadvantages, such as low prediction accuracy, poor promotion value and unfavorable efficiency, owing to the complex composition of construction projects, a large number of personnel, long working periods and high levels of uncertainty. To address these concerns, a prediction index system and a prediction model were developed. First, the factors influencing construction cost were first identified, a prediction index system including 14 secondary indexes was constructed and the methods of obtaining data were presented elaborately. A prediction model based on the Random Forest (RF) algorithm was then constructed. Bird Swarm Algorithm (BSA) was used to optimize RF parameters and thereby avoid the effect of the random selection of RF parameters on prediction accuracy. Finally, the engineering data of a construction company in Xinyu, China were selected as a case study. The case study showed that the maximum relative error of the proposed model was only 1.24%, which met the requirements of engineering practice. For the selected cases, the minimum prediction index system that met the requirement of prediction accuracy included 11 secondary indexes. Compared with classical metaheuristic optimization algorithms (Particle Swarm Optimization, Genetic Algorithms, Tabu Search, Simulated Annealing, Ant Colony Optimization, Differential Evolution and Artificial Fish School), BSA could more quickly determine the optimal combination of calculation parameters, on average. Compared with the classical and latest forecasting methods (Back Propagation Neural Network, Support Vector Machines, Stacked Auto-Encoders and Extreme Learning Machine), the proposed model exhibited higher forecasting accuracy and efficiency. The prediction model proposed in this study could better support the prediction of construction cost, and the prediction results provided a basis for optimizing the cost management of construction projects.

Forecasting Government Project Costs in Colombia: Combining Regression-Based and Text-Mining Approaches for Predictive Analysis

Article

Full-text available

Apr 2024

Understanding the projected costs of projects within various sectors of a country is crucial for resource allocation and timely delivery. In Colombia, comprehensive government project data is accessible through the National Government’s open data platform. Utilizing these datasets from the National Planning Department, we construct a predictive model leveraging regression analysis to estimate the expenses associated with governmental initiatives. This work evaluates several regression models, using diverse evaluation error metrics, to determine the most effective model for deployment. A key component of our approach is to combine textual attributes into a single variable, and subsequently apply text mining techniques, in order to obtain insights from free text fields in the data sets. Ultimately, the Adaboost model combined with TF-IDF emerged as the most precise combination of models, exhibiting a mean average precision error (MAPE) of 17.6%, closely followed by the Random Forest model combined with TF-IDF with a MAPE of 17.9%.

Construction Project Cost Prediction Method Based on Improved BiLSTM

Article

Full-text available

Jan 2024

In construction project management, accurate cost forecasting is critical for ensuring informed decision making. In this article, a construction cost prediction method based on an improved bidirectional long- and short-term memory (BiLSTM) network is proposed to address the high interactivity among construction cost data and difficulty in feature extraction. Firstly, the correlation between cost-influencing factors and the unilateral cost is calculated via grey correlation analysis to select the characteristic index. Secondly, a BiLSTM network is used to capture the temporal interactions in the cost data at a deep level, and the hybrid attention mechanism is incorporated to enhance the model’s feature extraction capability to comprehensively capture the interactions among the features in the cost data. Finally, a hyperparameter optimisation method based on the improved particle swarm optimisation algorithm is proposed using the prediction accuracy as the fitness function of the algorithm. The MAE, RMSE, MPE, MAPE, and coefficient of determination of the simulated prediction results of the proposed method on the dataset are 7.487, 8.936, 0.236, 0.393, and 0.996%, respectively, where MPE is a positive coefficient. This avoids the serious consequences of underestimating the cost. Compared with the unimproved BiLSTM, the MAE, RMSE, and MAPE are reduced by 15.271, 18.193, and 0.784%, respectively, which reflects the superiority and effectiveness of the method and can provide technical support for project cost estimation in the construction field.

Forecasting Construction Cost Index through Artificial Intelligence

Article

Full-text available

Oct 2023

This study presents a novel approach for forecasting the construction cost index (CCI) of building materials in developing countries. Such estimations are challenging due to the need for a longer time, the influence of inflation, and fluctuating project prices in developing countries. This study used three techniques—a modified Artificial Neural Network (ANN), time series, and linear regression—to predict and forecast the local building material CCI in Pakistan. The predicted CCI is based on materials, including bricks, steel, cement, sand, and gravel. In addition, the swish activation function was introduced to increase the accuracy of the associated algorithms. The results suggest that the ANN model has superior prediction results, with the lowest Mean Error (ME), Mean Absolute Error (MAE), and Theil’s U statistic (U-Stat) values of 0.04, 28.3, and 0.62, respectively. The time series and regression models have ME values of 0.22 and 0.3, MAE values of 30.07 and 28.3, and U-Stat values of 0.65 and 0.64, respectively. The proposed models can assist contractors, project managers, and owners through an accurately estimated cost index. Such accurate CCIs help correctly estimate project budgets based on building material prices to mitigate project risks, delays, and failures.

Spatiotemporal characteristics and forecasting of short-term meteorological drought in China

Article

Jul 2023
J HYDROL

Construction Project Estimation with LSTM:Materials, Costs and Timelines

Conference Paper

Apr 2024

Machine learning algorithms for safer construction sites: Critical review

Article

Full-text available

Apr 2024

Machine learning, a key thruster of Construction 4.0, has seen exponential publication growth in the last ten years. Many studies have identified ML as the future, but few have critically examined the applications and limitations of various algorithms in construction management. Therefore, this article comprehensively reviewed the top 100 articles from 2018 to 2023 about ML algorithms applied in construction risk management, provided their strengths and limitations, and identified areas for improvement. The study found that integrating various data sources, including historical project data, environmental factors, and stakeholder information, has become a common trend in construction risk. However, the challenges associated with the need for extensive and high-quality datasets, models’ interpretability, and construction projects’ dynamic nature pose significant barriers. The recommendations presented in this paper can facilitate interdisciplinary collaboration between traditional construction and machine learning, thereby enhancing the development of specialized algorithms for real-world projects.

Examining Pipe Cost Changes after Various Disasters in Los Angeles, California

Conference Paper

Aug 2023

Prediction of Wind Speed by Using Machine Learning

Chapter

Jun 2023

Due to the depletion of fossil fuel resources and environmental concerns caused by traditional fuel systems in recent years, the share of renewable energy sources in current energy production has been increasing. Among these energy sources, wind and solar energy stand out compared to other sources. Wind energy is a clean, sustainable and low-cost energy source. Wind and solar energies vary considerably according to the stochastic environment of meteorological conditions. Solar and wind energy variability and uncontrollability lead to power quality, generation-consumption balance and reliability problems of solar and wind energy systems. For this reason, it is important to know and predict the wind speed and solar radiation characteristics of the regions where the systems are installed. In this study, meteorological data of Antalya Serik Region were analyzed using statistical methods and wavelet transform. Thus, the potentials of wind and solar energies in the study area and large and small-scale events affecting these potentials were determined. In addition, a short-term estimation study was made for wind intensity and solar radiation using the time series of meteorological data. Besides SARMA, SARMAX and NAR models, Wavelet-NARX, SARMAX-NAR and NAR-SARMAX hybrid models are employed. Hybrid models are successfully produced better results than component forecasts.KeywordsANNWind speed forecastingHybrid forecastingNARNARXSARMASARMAXWavelet

Forecasting Pipeline Construction Costs Using Recurrent Neural Networks

Conference Paper

Full-text available

Jul 2021

Food waste reduction and economic savings in times of crisis: The potential of machine learning methods to plan guest attendance in Swedish public catering during the Covid-19 pandemic

Article

Full-text available

Mar 2021
Soc Econ Plann Sci

Food waste is a significant problem within public catering establishments in any normal situation. During spring 2020 the Covid-19 pandemic placed the public catering system under greater pressure, revealing weaknesses within the system and generation of food waste due to rapidly changing consumption patterns. In times of crisis, it is especially important to conserve resources and allocate existing resources to areas where they can be of most use, but this poses significant challenges. This study evaluated the potential of a forecasting model to predict guest attendance during the start and throughout the pandemic. This was done by collecting data on guest attendance in Swedish school and preschool catering establishments before and during the pandemic, and using a machine learning approach to predict future guest attendance based on historical data. Comparison of various learning methods revealed that random forest produced more accurate forecasts than a simple artificial neural network, with conditional mean absolute prediction error of <0.15 for the trained dataset. Economic savings were obtained by forecasting compared with a no-plan scenario, supporting selection of the random forest approach for effective forecasting of meal planning. Overall, the results obtained using forecasting models for meal planning in times of crisis confirmed their usefulness. Continuous use can improve estimates for the test period, due to the agile and flexible nature of these models. This is particularly important when guest attendance is unpredictable, so that production planning can be optimized to reduce food waste and contribute to a more sustainable and resilient food system.

Investigating the impact of inflation on labour wages in Construction Industry of Malaysia

Article

Full-text available

Jan 2021

Labours in construction are one of the main pillars in the construction industry of Malaysia for projects execution. Construction labours not only contributes to the development of the construction industry but also impacts the Malaysian economy. Consideration of labour wages is made in the initial phase of the project budget, however, wages are getting changed over time. The inflation rate is one of the key factors which affect labours wages. Regrettably, the inflation rate is being ignored while computing labour wages for projects budget development, resulting in cost overrun of construction projects. In this regard, the correlation coefficient test was used to determine the impact of the inflation rate on labour wages gathered from the year 2013 to 2019. The results showed that a significant acceptable relationship exists among the inflation rate and several categories of labour wages. Most of the labour wages showed a negative relationship with the inflation rate, indicating the deviation in the wages, thus, result in cost overrun. To steer the cost overrun effect, it is recommended to adopt automation system and introduce the Industrial Revolution (IR) 4.0 in construction projects as a replacer of labours.

The impact of COVID-19 on unemployment rate: An intelligent based unemployment rate prediction in selected countries of Europe

Article

Full-text available

Jan 2021

Unemployment remains a major cause for both developed and developing nations, due to which they lose their financial and economic impact as a whole. Unemployment rate prediction achieved researcher attention from a fast few years. The intention of doing our research is to examine the impact of the coronavirus on the unemployment rate. Accurately predicting the unemployment rate is a stimulating job for policymakers, which plays an imperative role in a country's financial and financial development planning. Classical time series models such as ARIMA models and advanced non‐linear time series methods be previously hired for unemployment rate prediction. It is known to us that mostly these data sets are non‐linear as well as non‐stationary. Consequently, a random error can be produced by a distinct time series prediction model. Our research considers hybrid prediction approaches supported by linear and non‐linear models to preserve forecast the unemployment rates much precisely. These hybrid approaches of the unemployment rate can advance their estimates by reproducing the unemployment ratio irregularity. These models' appliance is exposed to six unemployment rate statistics sets from Europe's selected countries, specifically France, Spain, Belgium, Turkey, Italy and Germany. Among these hybrid models, the hybrid ARIMA‐ARNN forecasting model performed well for France, Belgium, Turkey and Germany, whereas hybrid ARIMA‐SVM performed outclass for Spain and Italy. Furthermore, these models are used for the best future prediction. Results show that the unemployment rate will be higher in the coming years, which is the consequence of the coronavirus, and it will take at least 5 years to overcome the impact of COVID‐19 in these countries.

Pipeline Construction Cost Forecasting Using Multivariate Time Series Methods

Article

Full-text available

Dec 2020

Pipe material and labor costs constitute about seventy percent of pipeline construction costs. Pipe and labor costs are subject to considerable fluctuations over time. These fluctuations are problematic for cost estimation and bid preparation in pipeline projects, which are mostly large and long-term projects. The accurate prediction of pipe and labor costs is invaluable for cost estimators to prepare accurate bids and manage the cost contingencies. However, existing literature does not take advantage of the leading indicators of pipeline construction cost time series to accurately forecast cost fluctuations in pipeline projects. The objective of this research is to identify the leading indicators of pipeline construction costs and develop multivariate time series models for forecasting cost fluctuations in pipeline projects. Nineteen potential leading indicators of pipe and labor costs were initially selected based on a comprehensive review of construction cost forecasting literature. The leading indicators were identified from this pool of potential leading indicators based on unit root tests and Granger causality tests. Multivariate time series models were developed based on the results of cointegration tests. Vector Error Correction (VEC) models were developed for the cointegrated variables, while Vector Autoregressive (VAR) models were developed for the non-cointegrated variables. Since multivariate time series models include information from the identified leading indicators, multivariate time series models are often expected to deliver more accurate forecasts than univariate time series models. The forecasting accuracies of multivariate time series models were compared with those of univariate time series models based on three common error measures: mean absolute prediction error (MAPE), root mean squared error (RMSE), and mean average error (MAE). 
The results show that multivariate time series models outperform univariate models for forecasting cost fluctuations in pipeline projects. The findings of this research contribute to the state of knowledge by identifying leading indicators of pipe and labor costs and developing multivariate time series models to forecast them. The proposed multivariate time series forecasting models are expected to advance the theory and practice of pipeline construction cost forecasting and to help cost engineers and investment planners prepare more accurate bids, cost estimates, and budgets for pipeline projects.
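The three error measures named in this abstract can be computed directly from paired actual and forecast values. The sketch below is a minimal illustration, assuming MAPE is the usual mean absolute percentage error (expressed in percent); the function name `error_measures` is hypothetical and is not taken from the cited study.

```python
import math
from typing import Sequence


def error_measures(actual: Sequence[float], forecast: Sequence[float]) -> dict:
    """Return MAPE (%), RMSE, and MAE for paired actual/forecast values.

    Hypothetical helper for illustration; MAPE is assumed to mean the
    mean absolute percentage error.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    # MAE: average magnitude of the errors, in the units of the series.
    mae = sum(abs(e) for e in errors) / n
    # RMSE: square root of the mean squared error; large misses weigh more.
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE: mean of |error| / |actual|, scaled to percent.
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"MAPE": mape, "RMSE": rmse, "MAE": mae}
```

For example, `error_measures([100, 110, 120], [102, 108, 123])` gives an MAE of about 2.33, an RMSE of about 2.38, and a MAPE of about 2.11%. MAE and RMSE are in the units of the index, while MAPE is scale-free, which is why studies comparing series of different magnitudes often report all three.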

The Link between Labor Cost and Price Inflation in the Euro Area

Article

Jan 2019

Purdue Index for Construction Analytics: Prediction and Forecasting Model Development

Article

Sep 2021

The Purdue Index for Construction (Pi-C) was developed to gauge the health of the construction industry. It is a composite index consisting of five dimensions: economic, stability, social, development, and quality. This research conducts a data-driven analysis to build prediction and time-series forecasting models for Pi-C that (1) monitor the health of U.S. construction and (2) provide guidance on how to improve its future trajectory. The Seasonal Autoregressive Integrated Moving Average (SARIMA) technique is applied for future trend analysis, while Multiple Linear Regression (MLR) and Random Forests (RF) are applied to build prediction models for Pi-C. The proposed prediction and time-series forecasting models are expected to help decision makers, including policy developers and construction practitioners, take necessary action in a timely manner, and to open the discourse on advanced applications of analytics and data-driven decision making in the construction industry.

Neural Network-based interval forecasting of construction material prices

Article

Feb 2021

Accurate prediction of material costs is essential for proper management and budgeting of construction projects. Material price fluctuation is one of the most important contributors to deviations from the initial estimated cost in construction projects. Traditional machine learning techniques often fail to generate accurate predictive estimates because of the high uncertainty associated with material prices. To address this issue, this research proposes an artificial neural network (ANN)-based method that quantifies uncertainty by generating forecasting intervals. The optimal lower upper bound estimation (optimal LUBE) method is adopted to train the ANN to generate intervals directly. The proposed method is used to predict construction material prices in the US for asphalt and steel. The results show that traditional regression analysis and ANN-based single-point estimates are of limited value for predicting material prices. In contrast, prediction intervals provide reliable estimates of material prices and reduce the possibility of project failure due to inaccurate initial cost estimates. The results obtained with three other cost functions are compared with those of the proposed optimal LUBE cost function to verify the accuracy of the model, and the proposed optimal LUBE cost function produces the most accurate prediction intervals. This study employs a stacking procedure to monitor and control the training process and for validation. The proposed interval forecasting method presents a new direction for cost prediction studies and will provide project managers with additional information for managing risks associated with project costs.
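Interval forecasts like these are commonly judged in the LUBE literature by two quantities: the prediction interval coverage probability (PICP), the share of observations that fall inside their intervals, and the prediction interval normalized average width (PINAW), the mean interval width divided by the range of the target. The sketch below computes both for given bounds. It is not the optimal LUBE cost function from the abstract, whose exact form is not given here, and the name `interval_quality` is hypothetical.

```python
from typing import Sequence


def interval_quality(y: Sequence[float], lower: Sequence[float],
                     upper: Sequence[float]) -> tuple[float, float]:
    """Return (PICP, PINAW) for observations y and interval bounds.

    Hypothetical helper for illustration; it evaluates intervals but does
    not train a network or implement any particular LUBE cost function.
    """
    n = len(y)
    if n == 0 or not (len(lower) == len(upper) == n):
        raise ValueError("inputs must be non-empty and the same length")
    # PICP: fraction of observations lying inside [lower, upper].
    covered = sum(1 for v, lo, hi in zip(y, lower, upper) if lo <= v <= hi)
    picp = covered / n
    # PINAW: mean interval width normalized by the observed target range.
    target_range = max(y) - min(y)
    if target_range == 0:
        raise ValueError("target range is zero; PINAW is undefined")
    pinaw = sum(hi - lo for lo, hi in zip(lower, upper)) / (n * target_range)
    return picp, pinaw
```

For instance, `interval_quality([10, 12, 15, 20], [9, 12.5, 14, 18], [11, 14, 16, 21])` returns a PICP of 0.75 (the second observation falls below its interval) and a PINAW of 0.2125. The two measures pull in opposite directions: wider intervals raise coverage but also raise PINAW, which is the trade-off an interval-training cost function has to balance.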

Predicting City-Level Construction Cost Index Using Linear Forecasting Models

Article

Feb 2021

Data-Driven Predictive Modeling of Highway Construction Cost Items

Article

Dec 2020

The highway network is an economically necessary form of transportation that significantly affects the quality of life of the citizens who use it. Cost overruns in highway projects are a universal occurrence that jeopardizes the development, maintenance, and expansion of this vital infrastructure. Incorrect cost estimates can drive decision makers to pass ineffective policies, which have played a large role in the cost overruns of transportation construction projects. The existing prediction models in the literature are limited in one or more of the following areas: modeling approach, inputs, and robustness of model development. In this research, a model was developed to accurately predict the total construction cost of highway projects using machine learning algorithms. This study developed a modeling pipeline that automates much of the cost forecasting process, reducing manual work and dependence on skilled data scientists. The model was tested on the Florida Department of Transportation's (FDOT's) critical highway construction cost items between 2001 and 2017. Florida highways were selected for testing because of the state's population growth, large immigrant population, logistics, and hurricane frequency. This study used a pool of five categories of independent variables (69 variables in total), namely the construction market, energy market, socioeconomics, US economy, and temporal variables, compiled from relevant sources and the existing literature. The results revealed that the linear model generalizes and predicts cost items better than the nonlinear models and can accurately forecast highway construction costs. The suggested approach also provides more accurate forecasts for detailed cost estimation by considering monthly historical information for an average of 92.6% of the six highway construction types, with a 92.51% prediction accuracy.
Using the developed model, local governments, network operators, contractors, and logistics sectors could predict highway construction costs more precisely.
