Accurate Forecasting of Building Energy Consumption Via A Novel
Ensembled Deep Learning Method Considering the Cyclic Feature
Guiqing Zhanga, Chenlu Tiana, Chengdong Lia,d,*, Jun Jason Zhangb, Wangda Zuoc
aShandong Key Laboratory of Intelligent Buildings Technology, School of Information and Electrical
Engineering, Shandong Jianzhu University, Jinan 250101, China
bSchool of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China
cDepartment of Civil, Environmental and Architectural Engineering, University of Colorado Boulder,
Boulder, CO 80309, U.S.A
dShandong Co-Innovation Center of Green Building, Jinan 250101, China
Abstract
Short-term forecasting of building energy consumption (BEC) is significant for building energy reduction and real-time demand response. In this study, we propose a new method to realize half-hourly BEC prediction. To fully utilize the existing data features and further improve the forecasting performance, we divide the BEC data into a stable (cyclic) component and a stochastic component, and propose a novel hybrid model to handle the two components respectively. The cyclic feature (CF) is extracted via spectrum analysis, while the stochastic component is approximated by a novel Deep Belief Network (DBN) and Extreme Learning Machine (ELM) based ensembled model (DEEM). The resulting hybrid model is named DEEM+CF. Two real-world BEC experiments are performed to verify the proposed method. To demonstrate the superiority of DEEM+CF, this model is compared with the DBN, DBN+CF, ELM, ELM+CF, Support Vector Regression (SVR) and SVR+CF. Experimental results indicate that the CF improves the forecasting accuracy by approximately 20%, and that DEEM+CF performs best among the comparative models, with at least 3%, 6%, and 10% better accuracy than DBN+CF, ELM+CF and SVR+CF respectively under the MAE criterion.
Keywords: Building energy consumption, Cyclic feature, Deep belief network, Extreme
learning machine, Spectrum analysis
1. Introduction
The building energy consumption (BEC) accounts for about 30% of global energy usage and is still increasing rapidly [1]. The growing BEC has attracted
Corresponding author
Email addresses: qqzhang@sdjzu.edu.cn (Guiqing Zhang), chenlutian2017@sdjzu.edu.cn (Chenlu
Tian), lichengdong@sdjzu.edu.cn (Chengdong Li), jun.zhang.ee@whu.edu.cn (Jun Jason Zhang),
Wangda.Zuo@Colorado.edu (Wangda Zuo)
Preprint submitted to Elsevier March 2, 2020
Zhang, G., Tian, C., Li, C., Zhang, J., and Zuo, W., 2020
Accurate Forecasting of Building Energy Consumption Via A Novel Ensembled
Deep Learning Method Considering the Cyclic Feature. Energy.
This paper has been accepted by Energy on 03/31/2020.
much attention worldwide due to environmental degradation [2]. Moreover, many advanced information technologies recently applied in buildings and the grid have made end-to-end connection possible, pushing buildings and the grid into a new era in which the building's role is transformed from a pure consumer into a prosumer [3]. Under such conditions, hourly or half-hourly short-term prediction of BEC has become a foundational task for real-time demand response, building energy optimization, etc., which play a great role in both building energy reduction and grid operation and management [4].
To achieve short-term forecasting of BEC, many studies have been conducted using various methods, mainly including physical models [5, 6], statistical models [7], and machine learning methods [8]. Among them, machine learning has recently become one of the most promising because of its good capacity for nonlinear approximation without requiring detailed, and often unavailable, building and environmental knowledge. Machine learning can be divided into traditional machine learning and deep learning, and each method has specific advantages and application circumstances [9, 10, 11]. Aiming at improving the prediction performance of BEC, traditional machine learning methods are often integrated according to the application requirements. For example, in [12], the random forest is combined with the back propagation neural network to generate a hybrid model for performance forecasting of the ground source heat pump system. Jung [13] utilized an improved least-squares Support Vector Regression (SVR) to realize more accurate BEC forecasting. Yuan et al. [14] adopted particle swarm optimization in an improved ELM for robust BEC forecasting. Huang et al. [15] constructed an ensemble forecasting model combining extreme gradient boosting, SVR, ELM, and multiple linear regression for energy demand forecasting. These ensemble methods have achieved good results, and the ELM is one of the most popular components for its fast computation and good approximation capability.
Another popular idea for improving BEC forecasting is to combine deep learning models with traditional models, because deep learning has deeper computing layers and allows higher levels of feature and relation abstraction [16], while traditional machine learning has lower computational complexity. Inspired by the idea of parallel systems and parallel learning [17, 18, 19], Tian et al. [20] utilized a GAN for data enhancement, which was applied to several traditional machine learning methods to improve the forecasting results. Fu [21] presented a hybrid model adopting empirical mode decomposition and a DBN to forecast the building cooling load. Li et al. [22] proposed a modified DBN utilizing an ELM to boost the forecasting accuracy of BEC. However, in existing studies, the features abstracted by the various layers of the deep learning models are not fully utilized.
Although much effort has been devoted to improving machine learning methods for BEC forecasting, one unavoidable problem is that their performance relies greatly on the input data; thus, it is important to extract and utilize the valuable features inherent in the original data. Recently, deep learning began to be applied in the feature engineering of BEC forecasting. In [23], autoencoders and a GAN are used for feature extraction to improve the prediction accuracy of BEC. Other deep learning models such as the DBN are also adept at feature abstraction via layer-by-layer processing, but the features from each layer of such models are not fully utilized. Besides the inherent features extracted by deep learning methods, BEC has an obvious cyclic feature: the daily periodic pattern. People usually leave home at about 8-9 am and return at 5-7 pm; even in the workplace, they work for a while and then rest. The temperature is also periodic within one day [24]. All of these periodic components combine to produce the daily cyclic feature of BEC. In [25], the cyclic feature of electricity demand is analyzed deeply and utilized to generate synthetic sequences. However, to the authors' knowledge, such a cyclic feature has barely been taken into account in existing BEC forecasting algorithms.
In this paper, a novel DBN and ELM based ensembled method considering the cyclic feature of the observed data, named DEEM+CF, is proposed to achieve half-hourly short-term prediction of BEC. The main steps of this new method are listed below:
Firstly, the cyclic feature of daily BEC is extracted by spectrum analysis, and the original data is divided into a stable (cyclic) component and a stochastic one.
Secondly, the DEEM is utilized to predict the stochastic component. In the DEEM, different layers of the DBN abstract different levels of stochastic data features, and the feature sets constructed from each layer of the DBN are then used to train the corresponding ELMs. These ELMs output preliminary forecasting results, which are further integrated by another ELM to generate the final predicted results for the stochastic component. The DEEM thus makes full use of the features abstracted from each layer of the DBN.
Thirdly, the predicted results from the DEEM are combined with the cyclic feature to give the final forecasting outputs of BEC.
Moreover, to prove the effectiveness and superiority of the proposed DEEM+CF model, two experiments utilizing two real-world datasets are conducted, with comparisons against the pure DBN, the DBN+CF, the ELM, the ELM+CF, the SVR and the SVR+CF. Experimental results and comparisons demonstrate that utilizing the cyclic feature improves the BEC prediction accuracy by approximately 20%, and that the DEEM+CF performs at least 3%, 6%, and 10% better than the DBN+CF, ELM+CF and SVR+CF respectively.
The remainder of this paper is organized as follows. Section 2 gives a basic introduction to the DBN and ELM. In Section 3, the DEEM+CF model is proposed and illustrated in detail. In Section 4, two experiments utilizing two real-world datasets are performed to prove the superiority and effectiveness of the DEEM+CF. Finally, we draw the conclusions of this research in Section 5.
2. Methodologies
The DBN and ELM models are the basic components of the proposed DEEM. In this
section, DBN and ELM will be introduced briefly.
Figure 1: The architecture of DBN [26]
2.1. Deep Belief Networks (DBN)
A DBN is formed by stacking several Restricted Boltzmann Machines (RBMs) [26], as depicted in Figure 1. It is expected to extract high-level features from the input data space via layer-by-layer processing.
A single RBM typically consists of a visible layer and a hidden layer, whose nodes are fully connected across layers. The visible layer nodes are regarded as the inputs, while the hidden layer nodes are seen as the outputs. The node values in the two layers constitute binary vectors as follows:

$\mathbf{v} = \{v_1, v_2, \cdots, v_i, \cdots, v_m\}^T \in \{0, 1\}^m,$ (1)

$\mathbf{h} = \{h_1, h_2, \cdots, h_j, \cdots, h_n\}^T \in \{0, 1\}^n,$ (2)

where $v_i$ is a visible variable in the visible layer, $h_j$ is a hidden variable in the hidden layer, $m$ is the number of visible layer nodes, and $n$ is the total number of hidden layer nodes. The RBM is an energy-based model whose energy is expected to be minimized.
The energy function can be described as [26]

$E(\mathbf{v}, \mathbf{h} \mid \Theta) = -\mathbf{a}^T\mathbf{v} - \mathbf{b}^T\mathbf{h} - \mathbf{v}^T \mathbf{W} \mathbf{h} = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{ij} h_j,$ (3)

in which $\Theta = \{\mathbf{W}, \mathbf{a}, \mathbf{b}\}$ represents the set of model parameters, $\mathbf{W} \in \mathbb{R}^{m \times n}$ is the weighting matrix, $w_{ij} \in \mathbf{W}$ is the weight between $v_i$ and $h_j$, $\mathbf{a} \in \mathbb{R}^m$ and $\mathbf{b} \in \mathbb{R}^n$ are the bias vectors, $a_i \in \mathbf{a}$ is the bias of each $v_i$, and $b_j \in \mathbf{b}$ is the bias of each $h_j$.
To obtain a well-trained RBM, the partial derivatives with respect to $\Theta$ need to be computed via Gibbs sampling; however, running Gibbs sampling many times is time-consuming. To solve this problem, Hinton [27] proposed the contrastive divergence method, which only needs to run Gibbs sampling $K$ times. Usually, $K = 1$ is sufficient to train the RBM well.
In a DBN, the hidden layer of one RBM serves as the visible layer of the next RBM, and the output of the last RBM is fed into a logistic regression part. The training process of the DBN consists of two stages: pre-training and fine-tuning. To begin, suppose that there is a training dataset $(\mathbf{X}, \mathbf{y})$ with $N$ samples $\{(\mathbf{x}_k, y_k)\}_{k=1}^N$, where $\mathbf{x}_k = [x_k^1, x_k^2, \cdots, x_k^m]$. The detailed training process for the DBN is listed below [26]:
Step 1: Initialize the parameters of the DBN, including the number of input nodes $m$, the number of hidden and output nodes $n$, and the number of hidden layers $L$.
Step 2: Input $\mathbf{X}$ to the visible layer to train the weighting matrix $\Theta_1^2$ that connects the input layer and the second layer. From this training process, the node values $\mathbf{h}(2)$ in the second layer of the DBN are obtained.
Step 3: The node values $\mathbf{h}(2)$ in the second layer are then used to determine $\Theta_2^3$, from which the node values $\mathbf{h}(3)$ in the third layer are obtained.
Step 4: Let $l = 3$; the node values $\mathbf{h}(l)$ in the $l$th layer are used to train $\Theta_l^{l+1}$, and the outputs $\mathbf{h}(l+1)$ of the $(l+1)$th layer of the DBN are obtained.
Step 5: Set $l = l + 1$, and iterate Step 4 until $l > L + 1$.
Step 6: The outputs from the last hidden layer are fed into a logistic regression part to generate the final output of the DBN. Then, the training dataset $(\mathbf{X}, \mathbf{y})$ is utilized again to fine-tune all parameters of the DBN via the backpropagation algorithm.
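As a concrete illustration of the contrastive divergence (CD-1) pre-training used in Steps 2-4, the following is a minimal NumPy sketch of a single Bernoulli RBM. The class name, learning rate, and toy data are illustrative assumptions, not the authors' implementation; stacking several such RBMs (feeding each `transform` output into the next) gives the DBN pre-training.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible biases a_i
        self.b = np.zeros(n_hidden)    # hidden biases b_j
        self.lr = lr

    def cd1_update(self, v0):
        # Positive phase: hidden probabilities and a sample given the data.
        ph0 = sigmoid(v0 @ self.W + self.b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        pv1 = sigmoid(h0 @ self.W.T + self.a)
        ph1 = sigmoid(pv1 @ self.W + self.b)
        # Gradient approximation <v h>_data - <v h>_model, averaged over the batch.
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.a += self.lr * (v0 - pv1).mean(axis=0)
        self.b += self.lr * (ph0 - ph1).mean(axis=0)

    def transform(self, v):
        return sigmoid(v @ self.W + self.b)

# Usage: pre-train one RBM on toy binary data, then read out the hidden layer.
X = (rng.random((200, 16)) < 0.5).astype(float)
rbm = RBM(n_visible=16, n_hidden=8)
for _ in range(50):
    rbm.cd1_update(X)
H = rbm.transform(X)   # hidden representation, shape (200, 8)
```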
2.2. Extreme Learning Machine (ELM)
Suppose that the ELM has $n$ hidden nodes and one output node; its architecture is depicted in Figure 2. For an input $\mathbf{x} = [x^1, x^2, \cdots, x^m]$, the output of the ELM can be presented as

$f(\mathbf{x}) = \sum_{j=1}^{n} \beta_j g(\mathbf{x}, \mathbf{a}_j, b_j)$ (4)

where $\mathbf{w}_j = (\mathbf{a}_j, b_j)^T$ is the randomly given weighting vector that connects the input and hidden nodes, $\boldsymbol{\beta} = [\beta_1, \cdots, \beta_n]^T$ is the output weighting vector that connects the hidden and output layers, and $g$ represents the activation function.
In the training process of the ELM, no iteration is needed. For a given training dataset $(\mathbf{X}, \mathbf{y})$ with $N$ samples $\{(\mathbf{x}_k, y_k)\}_{k=1}^N$, where $\mathbf{x}_k = [x_k^1, x_k^2, \cdots, x_k^m]$, we first compute the hidden-layer output matrix

$\mathbf{H} = \begin{bmatrix} g(\mathbf{a}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{a}_n \cdot \mathbf{x}_1 + b_n) \\ g(\mathbf{a}_1 \cdot \mathbf{x}_2 + b_1) & \cdots & g(\mathbf{a}_n \cdot \mathbf{x}_2 + b_n) \\ \vdots & \ddots & \vdots \\ g(\mathbf{a}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{a}_n \cdot \mathbf{x}_N + b_n) \end{bmatrix},$ (5)
Figure 2: The architecture overview of ELM [28]
in which the parameters $\mathbf{a}_i, b_i$ $(i = 1, \cdots, n)$ are randomly given. Then, the weights $\boldsymbol{\beta}$ connecting the hidden layer and the output layer are directly computed via least-squares estimation as

$\boldsymbol{\beta} = \mathbf{H}^{+} \mathbf{y}$ (6)

where "$+$" denotes the Moore-Penrose generalized inverse, and $\mathbf{y} = [y_1, \ldots, y_N]^T$.
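Equations (4)-(6) translate directly into a few lines of linear algebra. The sketch below is a hedged illustration assuming a tanh activation and a toy regression target; `elm_fit` and `elm_predict` are hypothetical helper names, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, y, n_hidden=50):
    """Fit an ELM: random input weights and biases, output weights by least squares."""
    m = X.shape[1]
    A = rng.normal(size=(m, n_hidden))   # random input weights a_j
    b = rng.normal(size=n_hidden)        # random biases b_j
    H = np.tanh(X @ A + b)               # hidden-layer output matrix, eq. (5)
    beta = np.linalg.pinv(H) @ y         # Moore-Penrose solution, eq. (6)
    return A, b, beta

def elm_predict(X, A, b, beta):
    """Evaluate eq. (4) for a batch of inputs."""
    return np.tanh(X @ A + b) @ beta

# Usage on a toy nonlinear target.
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
A, b, beta = elm_fit(X, y, n_hidden=80)
y_hat = elm_predict(X, A, b, beta)
```

Note that no gradient iteration occurs anywhere: the only fitted quantity is the closed-form least-squares solution for β, which is the source of the ELM's speed.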
3. The Proposed Forecasting Model Considering the Cyclic Feature
This section presents the proposed DBN and ELM based ensemble method considering the cyclic feature, named DEEM+CF. For clarity, the scheme of the proposed forecasting model is introduced first, followed by the cyclic feature extraction, and finally the construction of the DEEM.
3.1. The Scheme of the Proposed DEEM+CF
The scheme for constructing the proposed DEEM+CF model is shown in Figure 3 and
is briefly illustrated as follows:
Step 1: Extract the cyclic feature which is the stable component of the original BEC
time series data.
Step 2: Generate the stochastic time series data which is the residual part of the
original BEC data after removing the stable component – the cyclic feature. Then,
transform the stochastic time series data to the stochastic training dataset.
Step 3: Utilize the stochastic training dataset to optimize the DEEM to achieve the
optimal forecasting performance for the stochastic data.
Step 4: Integrate the predicted stochastic results with cyclic features to achieve the
final prediction of BEC.
Below, we will give the design details of the proposed DEEM+CF.
Figure 3: The proposed forecasting scheme
3.2. The Cyclic Feature Extraction via Spectrum Analysis
3.2.1. Spectrum Analysis
A complicated signal can be decomposed into simple waves with specific cyclic periods [29, 30]. Spectrum analysis achieves such decomposition in the form of Fourier series [31, 32]. Recently, this method has often been adopted to analyze inherent information in many domains such as transportation [33, 34], electricity forecasting [35], fault detection [36] and solar radiation analysis [37, 38]. Here, spectrum analysis is selected to extract the daily cyclic features of the BEC series for its good capacity in finding cyclic components. The rest of this part gives the definition of the cyclic spectrum function.
Assume that $f(t)$ is a periodic series with period $T$. Then $f(t)$ can be expressed as the Fourier series

$f(t) = \sum_{k} c_k e^{jk\omega t}$ (7)

where the $c_k$ are the Fourier coefficients, $j$ is the imaginary unit, and $\omega = \frac{2\pi}{T}$.
The Fourier series can also be expanded into a trigonometric polynomial series as

$f(t) = \frac{c_0}{2} + \sum_{\gamma=1}^{\infty} c_\gamma \cos(\gamma\omega t) + \sum_{\gamma=1}^{\infty} d_\gamma \sin(\gamma\omega t)$ (8)

where $c_0, c_\gamma, d_\gamma$ are given by

$c_0 = \frac{2}{T} \int_{-T/2}^{T/2} f(t)\,dt, \quad c_\gamma = \frac{2}{T} \int_{-T/2}^{T/2} f(t)\cos(\gamma\omega t)\,dt, \quad d_\gamma = \frac{2}{T} \int_{-T/2}^{T/2} f(t)\sin(\gamma\omega t)\,dt.$ (9)
3.2.2. The Extraction of the Cyclic Feature
To get the cyclic features, the average daily BEC profile $p_t$ is first calculated as

$p_t = \frac{1}{D} \sum_{i=1}^{D} v_i^t$ (10)

where $p_t$ is the average of the daily values at time $t$, $v_i^t$ is the original value at time $t$ on the $i$th day, and $D$ is the total number of days.
Then the spectrum function is utilized to extract the cyclic features of the daily BEC. To obtain more reasonable cyclic features, BIC is adopted to evaluate the performance of the spectrum function: the number of cyclic components (trigonometric waves) is increased step by step, the BIC of each resulting spectrum function is calculated, and the spectrum function with the lowest BIC is selected as the cyclic model of the BEC. The daily stable components $\hat{p}_t$ are obtained as

$\hat{p}_t = c_0 + c_1 \sin\frac{2\pi t}{N} + d_1 \cos\frac{2\pi t}{N} + \cdots + c_n \sin\frac{2n\pi t}{N} + d_n \cos\frac{2n\pi t}{N},$ (11)

where $N$ is the number of data points collected per day, and $c_0, c_1, \cdots, c_n, d_1, \cdots, d_n$ are computed via least-squares estimation as

$\begin{bmatrix} c_0 \\ c_1 \\ d_1 \\ \vdots \\ c_n \\ d_n \end{bmatrix} = \begin{bmatrix} 1 & \sin\frac{2\pi}{N} & \cos\frac{2\pi}{N} & \cdots & \sin\frac{2n\pi}{N} & \cos\frac{2n\pi}{N} \\ 1 & \sin\frac{4\pi}{N} & \cos\frac{4\pi}{N} & \cdots & \sin\frac{4n\pi}{N} & \cos\frac{4n\pi}{N} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & \sin\frac{2N\pi}{N} & \cos\frac{2N\pi}{N} & \cdots & \sin\frac{2Nn\pi}{N} & \cos\frac{2Nn\pi}{N} \end{bmatrix}^{+} \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{bmatrix}$ (12)

where "$+$" denotes the Moore-Penrose generalized inverse.
After the cyclic features are obtained, the stochastic components are computed by removing the stable components (cyclic features) from the original data. Each original data point can be divided into a stable component and a stochastic component as

$v_i^t = \hat{p}_t + x_i^t$ (13)

where $x_i^t$ is the stochastic component at time $t$ on the $i$th day.

The stable components reflect the trend of the BEC, while the stochastic components represent its specific, random features. The stochastic components are concatenated into a 1-D time series $\{x_1, x_2, \cdots\}$. This remaining stochastic BEC series is then transformed into the stochastic training dataset $(\mathbf{X}_0, \mathbf{y})$ with $N$ samples $\{(\mathbf{x}_{0,k}, y_k)\}_{k=1}^N$, where $\mathbf{x}_{0,k} = [x_{0,k}^1, x_{0,k}^2, \cdots, x_{0,k}^m]$.
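The extraction in (10)-(13) can be sketched as follows, using synthetic half-hourly data with one dominant daily wave. The function name and the fixed choice of 3 waves are illustrative assumptions; in the paper the number of waves is selected by BIC.

```python
import numpy as np

def cyclic_fit(p, n_waves):
    """Least-squares fit of eq. (11): a constant plus n_waves sine/cosine pairs."""
    N = len(p)
    t = np.arange(1, N + 1)
    cols = [np.ones(N)]
    for k in range(1, n_waves + 1):
        cols.append(np.sin(2 * np.pi * k * t / N))
        cols.append(np.cos(2 * np.pi * k * t / N))
    Phi = np.column_stack(cols)        # design matrix of eq. (12)
    coef = np.linalg.pinv(Phi) @ p     # Moore-Penrose least squares
    return Phi @ coef                  # stable (cyclic) component p_hat

# Usage: average daily profile (eq. 10), then decomposition (eq. 13).
rng = np.random.default_rng(2)
N, D = 48, 30                          # 48 half-hour slots per day, 30 days
truth = 500 + 100 * np.sin(2 * np.pi * np.arange(1, N + 1) / N)
v = truth + rng.normal(0, 20, size=(D, N))   # D days of noisy BEC readings
p = v.mean(axis=0)                     # eq. (10): average daily profile
p_hat = cyclic_fit(p, n_waves=3)       # stable (cyclic) component
x = v - p_hat                          # eq. (13): stochastic components
```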
Figure 4: The architecture overview of the proposed DEEM.
3.3. Remaining Stochastic Data-Driven Design of the DEEM
3.3.1. The Framework of the DEEM
In this section, the DBN and ELMs are integrated into an ensemble method for forecasting the stochastic BEC data. The architecture of the DEEM is shown in Figure 4, in which the DBN is utilized to generate new representative feature datasets, and ELMs serve as the premier and ensemble forecasting models. Firstly, the original training dataset is input to the DBN, which extracts new data features from the stochastic training dataset via layer-by-layer processing. Each layer of the DBN outputs one new feature set, which is combined with the target values to form a new training dataset. Then the new training datasets are utilized to train separate ELMs to obtain premier predictions of the target values. Finally, all of the premier results are integrated and combined with the target values again to train another ELM, which produces the final predicted results.
The construction steps of the DEEM are listed below.
*Input: The stochastic training dataset $(\mathbf{X}_0, \mathbf{y})$ and the number of hidden layers of the DBN.
*Output: The final predicted result $\hat{\mathbf{y}}$ for the stochastic component.
Step 1: Input the stochastic training dataset $(\mathbf{X}_0, \mathbf{y})$ into the DBN model and train it. Suppose that the outputs from the $i$th hidden layer are $\mathbf{X}_i$ $(i = 1, 2, \cdots, l)$; then, from the $i$th hidden layer, one new dataset $(\mathbf{X}_i, \mathbf{y})$ is generated.
Step 2: Input each generated dataset $(\mathbf{X}_i, \mathbf{y})$ into one corresponding ELM model to train it and obtain the individual predicted results $\mathbf{y}_i$ $(i = 1, 2, \cdots, l)$. Besides, the initial training dataset $(\mathbf{X}_0, \mathbf{y})$ is also used to train a single ELM to obtain the predicted results $\mathbf{y}_0$.
Step 3: Integrate all of the individual predicted results $\mathbf{y}_i$ $(i = 0, 1, \cdots, l)$ by another ELM model to generate the final predicted result $\hat{\mathbf{y}}$.
In the DEEM model, one new dataset is generated from each hidden layer of the DBN. The input stochastic training dataset and the newly constructed datasets all participate in the forecasting of the BEC. Compared with conventional deep learning models, the DEEM fully utilizes the initial training dataset and all of the features abstracted by the hidden layers.
Below, we will explain the details of such steps.
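The construction steps above can be sketched end-to-end under simplifying assumptions: here a pre-trained DBN is stood in for by fixed random tanh layers (only to show the data flow through Steps 1-3), and the premier and ensemble models are the same least-squares ELM as in Section 2.2. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def elm_fit(X, y, n_hidden=60):
    """Least-squares ELM (random input weights, pinv output weights)."""
    A = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    return A, b, np.linalg.pinv(np.tanh(X @ A + b)) @ y

def elm_predict(model, X):
    A, b, beta = model
    return np.tanh(X @ A + b) @ beta

# Toy stochastic series turned into a sliding-window dataset (X_0, y).
x = rng.normal(size=600)
m = 10
X0 = np.array([x[i:i + m] for i in range(len(x) - m)])
y = x[m:]

# Stand-in for a pre-trained DBN: fixed random tanh layers giving X_1, X_2.
layers = [(rng.normal(size=(m, 12)), rng.normal(size=12)),
          (rng.normal(size=(12, 12)), rng.normal(size=12))]
feats, Xi = [X0], X0
for W, bias in layers:
    Xi = np.tanh(Xi @ W + bias)
    feats.append(Xi)          # Step 1: one feature set per hidden layer

# Step 2: one premier ELM per feature set; Step 3: ensemble ELM on their outputs.
models = [elm_fit(F, y) for F in feats]
Y = np.column_stack([elm_predict(mdl, F) for mdl, F in zip(models, feats)])
ensemble = elm_fit(Y, y, n_hidden=20)
y_hat = elm_predict(ensemble, Y)
```

The key structural point the sketch shows is that the ensemble ELM sees one column per premier model (the matrix Y of eq. (18)), so every layer's abstraction contributes to the final prediction.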
3.3.2. Training Data Generation and Learning of ELMs
The newly generated training datasets are obtained from the layers of the DBN. The newly constructed training dataset for the $i$th hidden layer is $(\mathbf{X}_i, \mathbf{y})$, where $\mathbf{X}_i$ is obtained as

$\mathbf{X}_i = \hat{g}_i(\mathbf{X}_{i-1}, \mathbf{W}_i, \mathbf{a}_i, \mathbf{b}_i)$ (14)

where $\hat{g}_i(\cdot)$ represents the activation function in the $i$th hidden layer of the DBN, and $(\mathbf{W}_i, \mathbf{a}_i, \mathbf{b}_i)$ are the weights connecting the $(i-1)$th and $i$th hidden layers of the DBN.

The input part $\mathbf{X}_i$ of the newly generated dataset can finally be presented as

$\mathbf{X}_i = \begin{bmatrix} \mathbf{x}_{i,1} \\ \mathbf{x}_{i,2} \\ \vdots \\ \mathbf{x}_{i,N} \end{bmatrix} = \begin{bmatrix} x_{i,1}^1 & \cdots & x_{i,1}^m \\ x_{i,2}^1 & \cdots & x_{i,2}^m \\ \vdots & \ddots & \vdots \\ x_{i,N}^1 & \cdots & x_{i,N}^m \end{bmatrix}$ (15)

where $m$ is the number of nodes in the $i$th hidden layer of the DBN.
If the DBN has $l$ hidden layers for feature abstraction, we obtain $l+1$ training datasets: the $l$ newly generated ones plus the original stochastic training dataset. These $l+1$ datasets are utilized to construct $l+1$ corresponding ELMs and to obtain $l+1$ premier individual forecasting results $\mathbf{y}_i$, computed as

$y_{i,k} = \sum_{j=1}^{n_i} \beta_{i,j}\, g_i(\mathbf{a}_{i,j} \cdot \mathbf{x}_{i,k} + b_{i,j})$ (16)

where $i = 0, 1, \cdots, l$, $k = 1, \cdots, N$, $g_i(\cdot)$ is the activation function of the $i$th ELM, $(\mathbf{a}_{i,j}, b_{i,j})$ is the weighting vector connecting the input and hidden layers of the $i$th ELM, which has $n_i$ hidden nodes. $\boldsymbol{\beta}_i = [\beta_{i,1}, \cdots, \beta_{i,n_i}]^T$ is the weighting vector that connects the hidden and output layers of the $i$th ELM, and can be determined as

$\boldsymbol{\beta}_i = [\beta_{i,1}, \beta_{i,2}, \cdots, \beta_{i,n_i}]^T = \begin{bmatrix} g_i(\mathbf{a}_{i,1} \cdot \mathbf{x}_{i,1} + b_{i,1}) & \cdots & g_i(\mathbf{a}_{i,n_i} \cdot \mathbf{x}_{i,1} + b_{i,n_i}) \\ g_i(\mathbf{a}_{i,1} \cdot \mathbf{x}_{i,2} + b_{i,1}) & \cdots & g_i(\mathbf{a}_{i,n_i} \cdot \mathbf{x}_{i,2} + b_{i,n_i}) \\ \vdots & \ddots & \vdots \\ g_i(\mathbf{a}_{i,1} \cdot \mathbf{x}_{i,N} + b_{i,1}) & \cdots & g_i(\mathbf{a}_{i,n_i} \cdot \mathbf{x}_{i,N} + b_{i,n_i}) \end{bmatrix}^{+} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$ (17)
3.3.3. Design of the Ensemble Part
In the ultimate ensemble part, the $l+1$ premier predicted results are first combined into a new training dataset, and this newly generated dataset is then utilized to construct the ensemble model, which is again chosen to be an ELM due to its low computational complexity and good capability in nonlinear approximation.

Assume that the integrated training dataset for the final training is $(\mathbf{Y}, \mathbf{y})$, where $\mathbf{Y}$ can be expressed as

$\mathbf{Y} = [\mathbf{y}_0, \mathbf{y}_1, \cdots, \mathbf{y}_l] = \begin{bmatrix} y_{0,1} & \cdots & y_{l,1} \\ y_{0,2} & \cdots & y_{l,2} \\ \vdots & \ddots & \vdots \\ y_{0,N} & \cdots & y_{l,N} \end{bmatrix}$ (18)

in which $y_{i,k}$ is the predicted result for the input data $\mathbf{x}_{i,k}$, obtained by (16). $\mathbf{Y}$ can also be expressed row-wise as

$\mathbf{Y} = [\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \cdots, \mathbf{y}^{(N)}]^T$ (19)

where $\mathbf{y}^{(i)} = [y_{0,i}, y_{1,i}, \cdots, y_{l,i}]^T$ for $i = 1, 2, \cdots, N$.
Then, the integrated training dataset is employed to construct another ELM. Suppose that the ELM in the ensemble part has $q$ hidden nodes; its input-output mapping can be given as

$\hat{y}_i = \sum_{p=1}^{q} \hat{\beta}_p\, \hat{g}(\hat{\mathbf{a}}_p \cdot \mathbf{y}^{(i)} + \hat{b}_p)$ (20)

where $i = 1, 2, \cdots, N$, $(\hat{\mathbf{a}}_p, \hat{b}_p)$ is the weighting vector connecting the input and hidden layers of the integration ELM, $\hat{g}(\cdot)$ represents its activation function, and $\hat{\boldsymbol{\beta}}$ is the weighting vector connecting the hidden and output layers.
To assure the performance of the ensemble ELM, its weighting vector is also obtained by least-squares estimation as

$\hat{\boldsymbol{\beta}} = [\hat{\beta}_1, \hat{\beta}_2, \cdots, \hat{\beta}_q]^T = \begin{bmatrix} \hat{g}(\hat{\mathbf{a}}_1 \cdot \mathbf{y}^{(1)} + \hat{b}_1) & \cdots & \hat{g}(\hat{\mathbf{a}}_q \cdot \mathbf{y}^{(1)} + \hat{b}_q) \\ \hat{g}(\hat{\mathbf{a}}_1 \cdot \mathbf{y}^{(2)} + \hat{b}_1) & \cdots & \hat{g}(\hat{\mathbf{a}}_q \cdot \mathbf{y}^{(2)} + \hat{b}_q) \\ \vdots & \ddots & \vdots \\ \hat{g}(\hat{\mathbf{a}}_1 \cdot \mathbf{y}^{(N)} + \hat{b}_1) & \cdots & \hat{g}(\hat{\mathbf{a}}_q \cdot \mathbf{y}^{(N)} + \hat{b}_q) \end{bmatrix}^{+} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$ (21)
4. Experiments and Comparisons
To verify the advantages of the proposed DEEM+CF model, two comparative experi-
mental studies will be conducted in this section.
4.1. Experimental Setting and Applied Datasets
4.1.1. Comparative Methods
To show the advantages of the proposed DEEM+CF method, firstly, several popular regression models, including lasso regression [39], ridge regression [40] and multi-polynomial regression [41], are adopted as comparative methods for spectrum analysis in cyclic feature extraction, with the ELM as the prediction model. Secondly, several popular machine learning models, including the DBN, ELM, and SVR, are selected as comparative models of DEEM+CF. Besides, to further verify the effectiveness of the cyclic feature, hybrid models that combine the cyclic feature with the DBN, ELM and SVR, i.e. the DBN+CF, ELM+CF, and SVR+CF, are also constructed as comparative models.
4.1.2. Evaluation Indices
To evaluate the forecasting performance, Mean Absolute Error (MAE), Mean Absolute
Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Pearson Correlation
Coefficient (r) are selected as the evaluation indices. The four comparative indices have
been widely used for forecasting accuracy evaluation and are computed as
$MAE = \frac{1}{M} \sum_{m=1}^{M} |\hat{y}_m - y_m|$ (22)

$MAPE = \frac{1}{M} \sum_{m=1}^{M} \frac{|\hat{y}_m - y_m|}{y_m} \times 100\%$ (23)

$RMSE = \sqrt{\frac{1}{M} \sum_{m=1}^{M} (\hat{y}_m - y_m)^2}$ (24)

$r = \frac{\sum_{m=1}^{M} (\hat{y}_m - E(\hat{y}))(y_m - E(y))}{\sqrt{\sum_{m=1}^{M} (\hat{y}_m - E(\hat{y}))^2}\, \sqrt{\sum_{m=1}^{M} (y_m - E(y))^2}}$ (25)

where $y_m$ and $\hat{y}_m$ are respectively the observed and predicted values, and $E(\cdot)$ represents the sample average.
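The four indices can be computed directly; below is a minimal sketch with toy values. `np.corrcoef` is used for the Pearson coefficient, which matches eq. (25).

```python
import numpy as np

def metrics(y, y_hat):
    """MAE, MAPE (%), RMSE, and Pearson r, eqs. (22)-(25)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    e = y_hat - y
    mae = np.mean(np.abs(e))
    mape = np.mean(np.abs(e) / np.abs(y)) * 100.0
    rmse = np.sqrt(np.mean(e ** 2))
    r = np.corrcoef(y, y_hat)[0, 1]
    return mae, mape, rmse, r

# Toy half-hourly BEC values (kW) and predictions.
y = [200.0, 400.0, 800.0, 600.0]
y_hat = [210.0, 390.0, 820.0, 580.0]
mae, mape, rmse, r = metrics(y, y_hat)
# mae = 15.0, mape ≈ 3.33 %, rmse ≈ 15.81, r ≈ 0.998
```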
Besides, to determine the number of cyclic features, the Bayesian Information Criterion (BIC) is adopted for model construction of the spectrum function. BIC balances parameter addition against overfitting, and a lower BIC means a better model. BIC is calculated as

$BIC = \ln(M)\,k - 2\ln(\hat{L})$ (26)

where $M$ is the number of data points, $k$ is the number of parameters adopted by the model, and $\hat{L}$ is the maximized likelihood of the model.

When the model errors are independent and normally distributed, BIC can be presented as

$BIC = M\ln(\hat{\sigma}^2) + k\ln(M)$ (27)

where $\hat{\sigma}^2$ is the error variance, computed as

$\hat{\sigma}^2 = \frac{1}{M} \sum_{m=1}^{M} (\hat{y}_m - y_m)^2$ (28)
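A minimal sketch of the Gaussian-error BIC of (27)-(28); the toy targets and predictions are illustrative. Among candidate spectrum functions, the one with the lowest BIC would be selected.

```python
import numpy as np

def bic_gaussian(y, y_hat, k):
    """BIC under independent Gaussian errors, eqs. (27)-(28)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    M = len(y)
    sigma2 = np.mean((y_hat - y) ** 2)      # eq. (28): error variance
    return M * np.log(sigma2) + k * np.log(M)  # eq. (27)

# Usage: a tighter fit with the same parameter count gets a lower (better) BIC.
y = [1.0, 2.0, 3.0, 4.0]
bic_small = bic_gaussian(y, [1.1, 2.1, 2.9, 4.1], k=3)  # small residuals
bic_big = bic_gaussian(y, [1.5, 2.5, 2.5, 4.5], k=3)    # large residuals
```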
4.1.3. Applied Datasets
Two buildings are chosen as the testing buildings to prove the effectiveness and superiority of the DEEM+CF method. The BEC datasets are retrieved from https://trynthink.github.io/buildingsdatasets/.

The first building is located in Hialeah, one of the warmest places in America. Its energy consumption was collected every 15 minutes from January 1, 2010 to December 31, 2010, giving 34940 samples. Comparatively, the energy consumption of this building is higher in summer than in the other seasons. The original data was processed and aggregated into 30-minute intervals, finally yielding 17470 samples. In the new dataset, the half-hourly energy consumption ranges from 219 to 1032 kW.
The second building is from Pico Rivera, CA, where the climate is comfortable. Data were collected four times per hour, and the data from January 1, 2010 to October 31, 2010 were selected. The data collected in summer are more fluctuant than in winter. The data from this building were also aggregated into 30-minute intervals, yielding 14592 samples. The highest half-hourly BEC is 997 kW and the lowest is 191 kW.
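The 15-minute-to-30-minute aggregation can be sketched as below. Whether consecutive readings should be summed or averaged depends on the units of the raw meter data, so the summation used here is an assumption, and the helper name is illustrative.

```python
import numpy as np

def to_half_hourly(readings_15min):
    """Aggregate consecutive pairs of 15-minute readings into 30-minute values."""
    r = np.asarray(readings_15min, dtype=float)
    r = r[: len(r) // 2 * 2]        # drop a trailing odd sample if present
    return r.reshape(-1, 2).sum(axis=1)

# Usage: pairs (100, 120) and (110, 130) are merged; the trailing 90 is dropped.
half_hourly = to_half_hourly([100, 120, 110, 130, 90])
```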
The original daily BEC data of the two buildings over one month are displayed in Figures 5(a) and 5(b). In each experiment, the stochastic data and the original data are divided into two parts: the first 70% is used as the training dataset and the remaining 30% for testing, and the size of the input sequence is set to 10.
Figure 5: (a) Daily BEC data (kW) in the first experiment; (b) daily BEC data (kW) in the second experiment.
4.2. The First Experiment
4.2.1. Configuration of the Forecasting Models
The proper configuration of parameters is important for the forecasting accuracy of machine learning models. In this experiment, the optimal parameters of the models, including the spectrum function, the DEEM, and the comparative models, are detailed below.
(a) Configuration of the spectrum function
To obtain the proper number of cyclic waves in the spectrum function, the average of the BEC time series of the first building is first computed via (10) and then input to the spectrum function model. The number of cyclic waves in the form of trigonometric functions is varied from 1 to 30, and the performance of each spectrum function with a specific number of cyclic waves is evaluated via BIC. A lower BIC means that more reasonable cyclic features are extracted without significant overfitting.
Figure 6(a) illustrates the performance of spectrum functions with different numbers of cyclic waves. From this figure, we can see that the spectrum function with 25 cyclic waves obtains the best performance in this experiment. To show the cyclic features more clearly, the spectrum map, which reflects the amplitudes of the cyclic waves at different frequencies, is shown in Figure 6(b). It is clear that there are two significant cycles, with periods of 3-4 hours and 24 hours, and these two significant cycles combine with the other 23 cycles to form the daily cyclic feature of BEC. The stable and stochastic time series data are then obtained. Figure 6(c) shows the first 500 original BEC data of the first building, and Figure 6(d) presents all of the remaining stochastic BEC data of the first building.
On the other hand, to evaluate the performance of spectrum analysis in cyclic feature extraction, lasso regression, ridge regression and multi-polynomial regression are also used to model the cyclic features, with the ELM as the prediction model. Here, the penalty coefficient of the lasso regression is set to 1 and its number of dimensions to 26; the penalty coefficient and number of dimensions of the ridge regression are set to 0.01 and 26 respectively; and the highest degree of the independent variable in the multi-polynomial regression is set to 10. All of the experiments are conducted ten times, and the
14
10 20 30 40
−2e+09 0e+00 2e+09
(a)
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
BIC
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
(b)
0 50 150 250 350
Amplitude
0 100 200 300 400 500
200 400 600 800
(c)
Originsl Data
0 5000 10000 15000
−400 0 200 400
(d)
Stochastic Data
Figure 6: (a) The performance of the spectrum functions with different number of trigonometric waves in
the first experiment,(b) The spectrum map of cycle features in the first experiment (c) The former 500
original BEC data in the first experiment, (d) The stochastic BEC data in the first experiment.
predicted results using four cyclic feature models are compared under the criteria of MAE,
MAPE, RMSE and r.
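The four evaluation criteria used throughout the experiments can be written down directly. A minimal sketch, assuming MAPE is reported in percent and r is the Pearson correlation coefficient:

```python
import numpy as np

def mae(y, p):  return float(np.mean(np.abs(y - p)))
def rmse(y, p): return float(np.sqrt(np.mean((y - p) ** 2)))
def mape(y, p): return float(100 * np.mean(np.abs((y - p) / y)))   # percent
def r(y, p):    return float(np.corrcoef(y, p)[0, 1])              # Pearson r

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 310.0])
# mae -> 10.0, rmse -> 10.0, mape -> about 6.11 %
```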
(b) Configuration of the DEEM
The DEEM is composed of the DBN and ELMs; thus, to achieve accurate forecasting,
it is important to determine the proper numbers of hidden layers and of nodes in each
hidden layer of the DBN and the ELMs. To seek the best structure of the DEEM, the
parameter searching experiment is conducted in two stages. In the first stage, we fix the
numbers of hidden nodes in the ELMs while changing the numbers of hidden layers and of
nodes in each hidden layer of the DBN. In the second stage, the best DBN structure selected
in the first stage is fixed, while the numbers of nodes in the hidden layers of the premier
and integration ELMs are changed. Here, the original data are utilized to determine the best
structure of the DEEM for evaluation of the proposed method.
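The structure being tuned here (premier ELMs modeling the features of each DBN hidden layer, with an integration ELM ensembling their outputs) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: random matrices stand in for the features produced by a trained DBN, and `elm_fit` is a basic single-hidden-layer ELM.

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, y, n_hidden):
    """Basic ELM: random hidden weights, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    beta, *_ = np.linalg.lstsq(np.tanh(X @ W + b), y, rcond=None)
    return lambda Z: np.tanh(Z @ W + b) @ beta

# random stand-ins for the features produced by each hidden layer of a
# trained DBN (5 layers here, matching the structure found below)
n_samples, n_layers = 200, 5
layer_feats = [rng.standard_normal((n_samples, 20)) for _ in range(n_layers)]
y = rng.standard_normal(n_samples)      # stand-in stochastic targets

# one premier ELM per DBN layer ...
premier = [elm_fit(F, y, n_hidden=50) for F in layer_feats]
premier_out = np.column_stack([m(F) for m, F in zip(premier, layer_feats)])
# ... and one integration ELM ensembling the premier predictions
integrate = elm_fit(premier_out, y, n_hidden=35)
final_pred = integrate(premier_out)
```

This wiring is why the features of every DBN layer, not only the last one, contribute to the final prediction.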
In the first stage, we vary the number of hidden layers of the DBN from 1 to 7 and the
number of nodes in each hidden layer from 50 to 800 at intervals of 50. The forecasting
performance of each DEEM with a different number of hidden layers and hidden nodes is
evaluated under the criterion of MAE; a lower MAE means better forecasting performance.
Figure 7(a) shows the MAEs of the different DEEMs. It is clear that when the DBN has
5 hidden layers and 750 hidden nodes in each layer, the MAE reaches its minimum over all
of the results.
In the second stage, we fix the structure of the DBN with 5 hidden layers and 750 hidden
nodes in each layer, while the number of hidden nodes in the premier ELMs is varied from 5 to
50 at intervals of 5 and the number of hidden nodes in the integration ELM is varied
from 10 to 100 at intervals of 10. As in the first stage, the performances of all the DEEMs
with different numbers of hidden nodes in the premier and integration ELMs are compared under
the criterion of MAE. Figure 7(b) shows the MAE comparison of these DEEMs. It can be
seen from this figure that the best premier and integration ELMs have 50 and 35 hidden
nodes, respectively.

Figure 7: (a) The MAEs of the DEEMs with different numbers of hidden layers and hidden nodes in the DBN
when the premier and integration ELMs are fixed in the first experiment, (b) the MAEs of the DEEMs
with different numbers of hidden nodes in the premier and integration ELMs when the DBN is fixed in the first
experiment.
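The two-stage search above amounts to a pair of grid searches. In the sketch below, `eval_mae` is a hypothetical stand-in for training a DEEM and returning its validation MAE; its toy error surface is arbitrary and only illustrates the mechanics of the search.

```python
import itertools

def eval_mae(dbn_layers, dbn_nodes, premier_nodes, integration_nodes):
    """Hypothetical stand-in for training a DEEM and returning its MAE;
    the toy minimum is placed arbitrarily to make the sweep deterministic."""
    return (abs(dbn_layers - 5) + abs(dbn_nodes - 750) / 50
            + abs(premier_nodes - 50) / 5 + abs(integration_nodes - 30) / 10)

# stage 1: fix the ELM widths, sweep DBN depth (1-7) and width (50-800)
fixed_premier, fixed_integration = 20, 50
stage1 = min(itertools.product(range(1, 8), range(50, 801, 50)),
             key=lambda s: eval_mae(*s, fixed_premier, fixed_integration))

# stage 2: fix the best DBN, sweep premier (5-50) and integration (10-100)
stage2 = min(itertools.product(range(5, 51, 5), range(10, 101, 10)),
             key=lambda s: eval_mae(*stage1, *s))
```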
(c) Configuration of the other comparative models
To ensure a fair performance comparison, optimal parameter searches for the DBN, the
ELM and the SVR using the original data are also carried out so that the comparative
models achieve their best performance.
Figure 8: (a) The MAEs of the DBNs with different numbers of hidden layers and nodes when the regression
part is fixed in the first experiment, (b) fine details of the forecasting performance in Figure 8(a).
Figure 9: The MAEs of the ELMs in the first experiment.
In this paper, the adopted DBN is composed of several RBMs and one fully connected
layer for the regression output. For the DBN, the numbers of hidden layers and of nodes in
each hidden layer are also key factors affecting the forecasting accuracy. To obtain the best
parameters, the number of nodes in each hidden layer is varied from 50 to 800 at intervals
of 50, the number of hidden layers from 1 to 7, and the number of hidden nodes in the
regression part from 5 to 50 at intervals of 5. MAE is again selected to evaluate the
performance of the DBNs as the number of hidden layers and the numbers of nodes in the
hidden layers and the regression part are changed. The MAE reaches its lowest value when
the DBN has 2 hidden layers, 650 nodes in each hidden layer, and 35 hidden nodes in the
regression part. Figure 8(a) shows the forecasting performance of the DBNs with different
numbers of hidden nodes and layers but a fixed regression part. To trace the important
details of Figure 8(a), its key part is zoomed in Figure 8(b).
The best structure of the ELM for comparison is also explored. The number of hidden
nodes in the ELM is varied from 10 to 500 at intervals of 10. Figure 9 shows the MAEs of
these ELMs. We can observe that the best ELM has 60 hidden nodes.
For the SVR, we again choose the RBF function as its kernel function, and through
testing, we set the penalty and kernel coefficients to 0.5 and 0.6, respectively.
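A minimal sketch of this configuration using scikit-learn's `SVR`; the input features here are illustrative random data standing in for the historical BEC inputs used in the paper.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))            # illustrative lagged inputs
y = X[:, 0] + 0.1 * rng.standard_normal(100)

# RBF kernel with the penalty (C) and kernel (gamma) coefficients above
model = SVR(kernel="rbf", C=0.5, gamma=0.6).fit(X, y)
pred = model.predict(X)
```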
4.2.2. Experimental Results
Table 1 shows the average values and standard deviations of the MAE, RMSE, MAPE,
and r of the forecasting performance using different cyclic feature models when the ELM is
selected as the prediction model.
Table 2 demonstrates the average values and standard deviations of the MAE, RMSE,
MAPE, and r of the forecasting models with and without the cyclic feature obtained by
spectrum analysis. The prediction residual errors of the proposed DEEM+CF and the other
comparative forecasting models are recorded, and their kernel density histograms are shown
in Figure 10.
Table 1: Forecasting performance using different cyclic feature models when ELM is selected to be the
prediction model in the first experiment.
Model MAE RMSE MAPE(%) r
ELM+Lasso  31.627 ± 1.274  44.692 ± 1.184  5.044 ± 0.189  0.971 ± 1.674 × 10⁻³
ELM+Ridge  32.698 ± 0.489  44.235 ± 1.141  5.032 ± 0.175  0.971 ± 8.751 × 10⁻⁴
ELM+Multi-polynomial  30.468 ± 1.230  41.787 ± 0.571  4.920 ± 0.133  0.975 ± 5.253 × 10⁻⁴
ELM+Spectrum  25.450 ± 0.338  39.995 ± 0.389  5.121 ± 0.137  0.977 ± 3.695 × 10⁻⁴
ELM  34.009 ± 1.054  48.165 ± 1.159  5.826 ± 0.223  0.964 ± 1.701 × 10⁻³
Table 2: Performances of the forecasting models in the first experiment ("model+CF" denotes the model
considering the cyclic feature extracted by spectrum analysis).
Model MAE RMSE MAPE(%) r
SVR  36.751 ± 0.000  48.065 ± 0.000  6.479 ± 0.000  0.965 ± 0.000
SVR+CF  26.544 ± 0.000  36.852 ± 0.000  4.869 ± 0.000  0.982 ± 0.000
ELM  34.009 ± 1.054  48.165 ± 1.159  5.826 ± 0.223  0.964 ± 1.701 × 10⁻³
ELM+CF  25.450 ± 0.338  39.995 ± 0.389  5.121 ± 0.137  0.977 ± 3.695 × 10⁻⁴
DBN  32.197 ± 0.593  46.565 ± 0.444  5.504 ± 0.108  0.966 ± 6.311 × 10⁻⁴
DBN+CF  24.690 ± 0.213  34.036 ± 0.241  4.497 ± 0.058  0.983 ± 2.275 × 10⁻⁴
DEEM  30.462 ± 0.450  43.892 ± 0.385  5.159 ± 0.084  0.970 ± 5.116 × 10⁻⁴
DEEM+CF  23.832 ± 0.069  33.259 ± 0.109  4.200 ± 0.046  0.984 ± 1.071 × 10⁻⁴
4.3. The Second Experiment
4.3.1. Configuration of the Forecasting Models
Similar configuration schemes are utilized in this experiment. Details will be given below.
(a) Configuration of the spectrum function in the second experiment
Figure 11(a) shows the performances of the spectrum functions with different numbers
of cyclic waves. According to this figure, the best spectrum function model has 26
trigonometric functions. Figure 11(b) shows the spectrum map of the cyclic feature model. We
can see that there are two dominant cycles, with periods of 2 hours and 6 hours, among the
26 cycles, and the 26 cycles are combined to represent the daily BEC cyclic feature of the
second building. Figure 11(c) shows the first 500 original BEC data from the second building,
and Figure 11(d) presents the remaining stochastic BEC data after removing the cyclic feature.
Besides, ridge regression, lasso regression and multi-polynomial regression are again
selected as the comparative methods for the spectrum function. The configurations of these
three comparative models are the same as in the first experiment.
(b) Configuration of the DEEM
Again, the parameter searching process of the DEEM consists of two stages.
In the first stage, the structures of the premier and integration ELMs are fixed while
the numbers of hidden nodes and layers of the DBN are changed. Figure 12(a) shows
the MAEs of the DEEMs with different numbers of hidden nodes and layers in the
DBN. According to Figure 12(a), the MAE of the DEEM reaches its lowest value when the
DBN has 3 hidden layers and 700 nodes in each hidden layer.
Figure 10: The error histograms of the eight forecasting models in the first experiment.
Figure 11: (a) The performance of the spectrum functions with different numbers of trigonometric waves in
the second experiment, (b) the spectrum map of cyclic features in the second experiment, (c) the first 500
original BEC data from the second building, (d) the remaining stochastic BEC data after removing the
cyclic feature in the second building.
Figure 12: (a) The MAEs of the DEEMs with different numbers of hidden layers and hidden nodes in DBN
when the ELMs are fixed in the second experiment, (b) The MAEs of the DEEMs with different premier
and integration ELMs when the DBN is fixed in the second experiment.
Figure 13: (a) The MAEs of the DBNs in the second experiment, (b) fine details of the forecasting
performance in Figure 13(a).
In the second stage, the DBN in the DEEM is fixed as determined in the first stage, and
we change the numbers of hidden nodes in the premier and integration ELMs. Figure
12(b) illustrates the forecasting performance of the DEEMs with different ELMs. From
this figure, the best premier ELMs have 60 hidden nodes, and the best integration ELM
has 15 hidden nodes.
(c) Configuration of the other comparative models
In the second experiment, to explore the best structure of the DBN, we evaluate
the performance of DBNs whose numbers of hidden layers and of hidden nodes in each hidden
layer are set from 1 to 7 and from 50 to 800 at intervals of 50, respectively. Figure 13(a)
illustrates the MAEs of these DBNs. The best comparative DBN model has two hidden
layers, 300 nodes in each hidden layer, and 30 hidden nodes for regression.
To acquire the best structure of the ELM in the second experiment, we vary the
number of hidden nodes from 10 to 500 at intervals of 10. Figure 14 shows the MAEs of
these ELMs. According to this figure, the best ELM has 270 hidden nodes.
Furthermore, the optimal structure exploration procedure for the SVR is the same as in the
first experiment. For the SVR, we also utilize the RBF kernel, and set the penalty and
kernel coefficients to 0.7 and 0.5, respectively.

Figure 14: The MAEs of the ELMs in the second experiment.

Table 3: Forecasting performance of the ELM using different cyclic feature models in the second experiment.

Model MAE RMSE MAPE(%) r
Lasso  27.380 ± 1.143  42.007 ± 0.887  5.840 ± 0.271  0.965 ± 1.400 × 10⁻³
Ridge  27.313 ± 1.082  41.678 ± 2.143  6.492 ± 0.205  0.973 ± 2.670 × 10⁻³
Multi-polynomial  25.146 ± 1.082  38.917 ± 1.665  5.330 ± 0.342  0.969 ± 2.576 × 10⁻³
Spectrum function  23.343 ± 0.637  34.343 ± 1.502  4.935 ± 0.346  0.977 ± 3.111 × 10⁻³
4.3.2. Experimental Results
In this experiment, the predictions of the ELM using the different cyclic features extracted
by spectrum analysis, lasso regression, ridge regression and multi-polynomial regression were
again each repeated 10 times. Table 3 presents the averages of the MAE, RMSE, MAPE, and
r of the forecasting performance of the ELM using the different cyclic feature models in the
second experiment.
Besides, the proposed DEEM+CF, the DEEM, the DBN+CF, the DBN, the ELM+CF,
the ELM, the SVR+CF and the SVR were also each run 10 times, with the MAE, RMSE,
MAPE, and r again chosen as the comparative indices. Table 4 lists the comparison results
of these models.
The forecasting errors of these eight models are also recorded. The kernel density
histograms of the forecasting errors of the eight forecasting models are displayed in Figure 15.
4.4. Comparison and Discussion
From the figures and tables above, we have the following observations and conclusions.
Table 4: Performances of the forecasting models in the second experiment.
Model MAE RMSE MAPE(%) r
SVR  38.700 ± 0.000  50.790 ± 0.000  8.260 ± 0.000  0.954 ± 0.000
SVR+CF  22.124 ± 0.000  31.620 ± 0.000  5.237 ± 0.000  0.980 ± 0.000
ELM  26.937 ± 2.466  44.522 ± 2.760  5.566 ± 0.541  0.962 ± 4.930 × 10⁻³
ELM+CF  22.449 ± 0.975  32.413 ± 1.356  4.755 ± 0.243  0.978 ± 2.414 × 10⁻³
DBN  25.219 ± 0.855  42.235 ± 0.360  5.241 ± 0.064  0.966 ± 6.069 × 10⁻⁴
DBN+CF  20.252 ± 0.212  29.683 ± 0.170  4.584 ± 0.052  0.982 ± 2.275 × 10⁻⁴
DEEM  23.793 ± 0.141  41.032 ± 0.228  4.875 ± 0.045  0.968 ± 3.702 × 10⁻⁴
DEEM+CF  19.063 ± 0.132  28.685 ± 0.122  4.247 ± 0.041  0.983 ± 1.071 × 10⁻⁴
Figure 15: The error histograms of the eight forecasting models in the second experiment.
From Figures 6(c), 6(d), 11(c) and 11(d), we can see that the original data have a clear
stable and periodic feature, while the remaining data are much more stochastic than the
original data.
From Figures 7(a), 12(a), 8 and 13, it can be clearly seen that, as the number of hidden
layers of the DBN increases, the MAE of the DEEMs shows a downtrend, whereas the
MAE of the pure DBN model increases rapidly once the number of hidden layers exceeds
a threshold. Consequently, we can conclude that fully utilizing the features extracted
from the different layers of the DBN helps to achieve more accurate forecasting
performance.
According to Figures 7(b) and 12(b), the numbers of hidden nodes in the premier and
integration ELMs influence the performance of the DEEM, and the hidden nodes in the
premier ELMs have a greater influence than those in the integration ELM.
Tables 1 and 3 show that prediction using the cyclic feature extracted by spectrum
analysis obtains the best performance. Utilizing the ridge regression, lasso regression,
and multi-polynomial models to extract the cyclic features can also improve the BEC
forecasting performance; in our results, the multi-polynomial model also performs better
than the other two models.
Tables 2 and 4 demonstrate that the accuracy of the forecasting models considering
the cyclic feature extracted by spectrum analysis is much better than that of the models
that do not use the cyclic feature. Under the criterion of MAE, in the first experiment,
the DEEM+CF, DBN+CF, ELM+CF and SVR+CF are respectively 21.765%, 23.316%,
25.167% and 27.773% better than the DEEM, DBN, ELM and SVR, which do not
consider the cyclic feature; in the second experiment, they are respectively 19.880%,
19.695%, 16.661% and 42.832% better. In addition, the DEEM+CF has the best accuracy:
it is 3.475%, 6.538% and 10.217% better than the DBN+CF, ELM+CF and SVR+CF
respectively in the first experiment, and 5.871%, 15.083% and 13.836% better than these
three comparative models in the second experiment. Furthermore, according to the standard
deviations of the evaluation indices, the forecasting performance of the DEEM+CF is
relatively more stable than that of the other comparative models except the SVR.
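The percentage gains quoted above are relative MAE reductions, 100 * (MAE_base - MAE_better) / MAE_base. For instance, with the first-experiment MAEs from Table 2:

```python
def improvement(mae_base, mae_better):
    """Relative MAE reduction, in percent (rounded to three decimals)."""
    return round(100 * (mae_base - mae_better) / mae_base, 3)

# first-experiment MAEs from Table 2
deem_gain = improvement(30.462, 23.832)    # DEEM -> DEEM+CF: 21.765 %
svr_cf_gap = improvement(26.544, 23.832)   # SVR+CF -> DEEM+CF: 10.217 %
```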
From Figures 10 and 15, we can clearly observe that more errors lie around zero in the
histograms of the models that consider the cyclic feature, which also implies that the cyclic
feature can improve the forecasting accuracy. Again, the error histograms of the DEEM+CF
are the narrowest and highest ones, which implies the most accurate forecasting performance
of the proposed model.
Overall, the prediction models that consider the cyclic feature have much higher
forecasting accuracy than the models trained directly on the original data, and the proposed
DEEM+CF performs more stably and accurately than the other models. This confirms that
the cyclic feature greatly promotes the accuracy of BEC forecasting, and that fully utilizing
the abstracted features from all layers of the DBN also helps to improve the forecasting
performance.
5. Conclusion
Short-term forecasting of the BEC is helpful for real-time building energy demand
response, energy planning and building management. In this paper, a novel deep belief
network and extreme learning machine based ensemble method considering the cyclic feature
is proposed to promote the accuracy of half-hourly BEC forecasting. In the proposed
ensemble model, the stable component, i.e., the cyclic feature of the BEC, is extracted via
spectrum analysis, while the remaining stochastic component, obtained by removing the
stable component from the original BEC data, is used to construct the DEEM. Two
experiments are performed to verify the effectiveness and superiority of the proposed
DEEM+CF model. As demonstrated by the experimental results and comparisons, the
cyclic feature improves the prediction performance by approximately 20% relative to models
that do not use it; moreover, the proposed DEEM+CF model achieves much higher accuracy
than the other comparative models, being at least 3%, 6% and 10% better than the DBN+CF,
ELM+CF and SVR+CF respectively under the criterion of MAE in our experiments.
In this study, the parameters of the DEEM are optimized by alternately fixing and
changing the structures of the ELMs and the DBN. However, this parameter optimization
method is not optimal, and it still needs further exploration. Besides, the cyclic features
are closely related to occupancy, which is one of the key factors in BEC forecasting;
studying and applying the relationships between them is valuable and will be one of our
key research directions in the future.
Acknowledgments
This study is partly supported by the National Natural Science Foundation of China
(61573225), the Taishan Scholar Project of Shandong Province (TSQN201812092), the Key
Research and Development Program of Shandong Province (2019GGX101072), the State
Scholarship Fund and the Youth Innovation Technology Project of Higher School in Shan-
dong Province (2019KJN005).
... The area of feature engineering covers a wide number of methods, such as feature creation, feature expansion [ 5] or feature selection [ 3]. Feature creation includes encodings of time-based features, such as cyclic features [ 19], or categorical encoding [ 11]. Similarly, feature expansion is the method of creating new features based on existing features. ...
... -Cyclic Features: Cyclic features can be used to model time values through periodic functions [ 19]. In the implementation, sinusoidal signals.x ...
... Such factors include thermal characteristics and Heating, Ventilation, Air Conditioning and Cooling (HVAC) system behavior [ 13]. Additionally, building energy demand may be dependent on occupancy [ 8] or subject to seasonal trends [ 19]. Many of these factors show non-linear or dynamic behavior, which makes it difficult to address them through a purely linear model. ...
Chapter
Full-text available
Data-driven modeling is an approach in energy systems modeling that has been gaining popularity. In data-driven modeling, machine learning methods such as linear regression, neural networks or decision-tree based methods are applied. While these methods do not require domain knowledge, they are sensitive to data quality. Therefore, improving data quality in a dataset is beneficial for creating machine learning-based models. The improvement of data quality can be implemented through preprocessing methods. A selected type of preprocessing is feature engineering, which focuses on evaluating and improving the quality of certain features inside the dataset. Feature engineering includes methods such as feature creation, feature expansion, or feature selection. In this work, a Python framework containing different feature engineering methods is presented. This framework contains different methods for feature creation, expansion and selection; in addition, methods for transforming or filtering data are implemented. The implementation of the framework is based on the Python library scikit-learn . The framework is demonstrated on a use case from energy demand prediction. A data-driven model is created including selected feature engineering methods. The results show an improvement in prediction accuracy through the engineered features.
... Many studies have employed deep learning to predict the energy consumption of buildings. As shown in Table 1, the architectures used include RNN and its variants [2,17–19,47,48], convolution-based [6,48], attention-based [20,21], deep belief network (DBN) [43,45], and hybrid models [9,42,44] that combine multiple architectures. Both one-step ahead [3,17,18,42–45] and multi-horizon forecasting [2,6,9,19–21,43,48] were investigated. However, most studies only performed point forecasts. ...
Article
Full-text available
Building energy forecasting facilitates optimizing daily operation scheduling and long-term energy planning. Many studies have demonstrated the potential of data-driven approaches in producing point forecasts of energy use. Despite this, little work has been undertaken to understand uncertainty in energy forecasts. However, many decision-making scenarios require information from a full conditional distribution of forecasts. In addition, recent advances in deep learning have not been fully exploited for building energy forecasting. Motivated by these research gaps, this study contributes in two aspects. First, this study has adapted and applied state-of-the-art deep learning architectures to address the problem of multi-horizon building energy forecasting. Eight different methods, including seven deep learning-based ones, were investigated to develop models to perform both point and probabilistic forecasts. Second, a comprehensive case study was conducted in two public historic buildings with different operating modes, namely the City Museum and the City Theatre, in Norrköping, Sweden. The performance of the developed models was evaluated, and the predictability of different scenarios of energy consumption was studied. The results show that incorporating future information on exogenous factors that determine energy use is critical for making accurate multi-horizon predictions. Furthermore, changes in the operating mode and activities held in a building bring more uncertainty in energy use and deteriorate the prediction accuracy of models. The temporal fusion transformer (TFT) model exhibited strong competitiveness in performing both point and probabilistic forecasts. As assessed by the coefficient of variance of the root mean square error (CV-RMSE), the TFT model outperformed other models in making point forecasts of both types of energy use of the City Museum (CV-RMSE 29.7% for electricity consumption and CV-RMSE 8.7% for heating load). 
When making probabilistic predictions, the TFT model performed best to capture the central tendency and upper distribution of heating load of the City Museum as well as both types of energy use of the City Theatre. The predictive models developed in this study can be integrated into digital twin models of buildings to discover areas where energy use can be reduced, optimize building operations, and improve overall sustainability and efficiency.
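Probabilistic forecasters such as the TFT mentioned above are typically trained and scored with the pinball (quantile) loss. As a small self-contained illustration (the numbers below are toy values, not data from the study):

```python
# Hedged sketch of the pinball (quantile) loss used to evaluate probabilistic
# forecasts: under-prediction is penalized by q, over-prediction by (1 - q).
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Average quantile loss at quantile level q (0 < q < 1)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y_true = np.array([10.0, 12.0, 9.0, 11.0])
median_forecast = np.array([10.5, 11.5, 9.5, 10.5])  # 0.5-quantile forecast
upper_forecast = np.array([12.0, 13.5, 11.0, 12.5])  # 0.9-quantile forecast

print(pinball_loss(y_true, median_forecast, 0.5))  # symmetric at the median
print(pinball_loss(y_true, upper_forecast, 0.9))   # rewards a deliberately high band
```

Summing this loss over several quantile levels (e.g. 0.1, 0.5, 0.9) gives a full conditional-distribution forecast of the kind the study evaluates.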
... According to [31], irrespective of the method employed to improve building energy efficiency, whether applying an energy policy regulation, developing a building energy prediction model (BEPM), or creating a green certification program, an essential prerequisite is an understanding of the factors influencing energy consumed in the building stock. The BEPM is regarded as the most promising solution for improving energy efficiency [32–34]. Researchers have developed various BEPMs using several factors, such as the building envelope (i.e., wall and roof), among others [7,32,35]. ...
Article
Full-text available
A prerequisite for decreasing the intensification of energy use in buildings is to evaluate and understand the factors influencing building energy performance (BEP). These factors include building envelope features and outdoor climatic conditions, among others. Given the importance of these influencing factors in the development of building energy prediction models, researchers continue to employ different types of factors based on their popularity in the academic literature, without a proper investigation of the most relevant ones, which in some cases potentially leads to poor model performance. This can be attributed to the absence of an adequate comprehensive analysis or review of all factors influencing BEP. Therefore, this paper conducts a holistic and comprehensive review of studies that have explored the various factors influencing energy use in residential and commercial buildings. In total, 74 research articles were systematically selected from the Scopus, ScienceDirect, and Institute of Electrical and Electronics Engineers (IEEE) databases. Subsequently, by means of a systematic and bibliometric analysis, this paper comprehensively analyzed several important factors influencing BEP. The results reveal the important factors (such as windows and roofs) and shed light on the application of energy-efficient strategies such as green roofs and photovoltaic (PV) windows, among others.
... The core idea behind time series forecasting is to identify patterns and trends within the data and then use these patterns to make predictions about future values [61]. Ensemble learning methods can be highly effective for time series forecasting tasks as they combine multiple models' predictions to improve overall accuracy and robustness [62]. ...
Article
Full-text available
The ability to predict and preempt insulator failures holds the potential to enhance the reliability of electrical power grids. An increase in insulator leakage current is an indication that failures may occur. By harnessing historical data and employing time series forecasting models, it is possible to identify potential faults before they escalate into disruptive failures. In this paper, a hybrid model for time series prediction is proposed by combining the Christiano–Fitzgerald random walk filter for signal denoising with an ensemble bootstrap aggregation model for leakage current forecasting. A comparison between bootstrap aggregation, boosting, random subspace, and stacked generalization ensemble learning models is presented. With a root mean square error of 7.62 × 10⁻⁴ (in a statistical evaluation), the ensemble bootstrap aggregation model with the Christiano–Fitzgerald random walk filter proved to be a promising approach for time series fault forecasting. The proposed method was shown to be more promising than the original ensemble bootstrap aggregation model and the long short-term memory network.
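Bootstrap aggregation for one-step time series forecasting can be sketched in a few lines: lagged values become features, and predictions are averaged over trees fit on bootstrap resamples. The series below is synthetic and the lag count is an illustrative assumption, not the study's configuration.

```python
# Hedged sketch of bagging for one-step-ahead forecasting on a synthetic
# noisy periodic series: features are the previous `lags` observations.
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(1)
t = np.arange(500)
series = np.sin(2 * np.pi * t / 48) + 0.1 * rng.normal(size=t.size)

lags = 4
X = np.array([series[i:i + lags] for i in range(series.size - lags)])
y = series[lags:]

# Bag 50 decision trees (the default base estimator) on bootstrap resamples;
# averaging the trees reduces the variance of any single overfit tree.
bag = BaggingRegressor(n_estimators=50, random_state=0)
bag.fit(X[:400], y[:400])
print(round(bag.score(X[400:], y[400:]), 3))  # held-out one-step R^2
```

Note the split is chronological (first 400 points for training), which respects the temporal ordering a random shuffle would violate.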
Chapter
Full-text available
Pultrusion of continuous fiber reinforced profiles has been state of the art for several decades. However, pultrusion in the production environment has so far had little or no sensor data in a heterogeneous sensor environment; suffers from shortcomings in data quality (time synchronization, different formats, different sampling rates, sporadic disconnections and resulting data losses); and relies on insufficient data processing methods for process control and optimization. Significant efficiency improvements would therefore still be possible in this respect. The question thus arises as to how a data acquisition system can be designed that provides heterogeneous data in a standardized and reliable manner for data-based process development and optimization. The aim is therefore to develop a digitized and standardized data acquisition prototype for the continuous production of sustainable profile structures, flexible enough that the various components of the production line can be adaptively combined in a central data acquisition system. We have therefore screened and selected possible standardized transmission formats in combination with suitable low-cost data acquisition systems for a highly reliable and secure application area. The paper presents a well-founded concept for a data acquisition prototype based on different low-cost control systems to demonstrate the applicability and testability of the developed requirements regarding data acquisition, processing, and future storage.
Article
The significance of energy efficiency in the development of smart cities cannot be overstated. It is essential to have a clear understanding of the current energy consumption (EC) patterns in both public and private buildings. One way to achieve this is by employing machine learning classification algorithms, which offer a broader perspective on the factors influencing EC. These algorithms can be applied to real data from databases, making them valuable tools for smart city applications. In this paper, our focus is specifically on the EC of public schools in a Portuguese city, as this plays a crucial role in designing a Smart City. By utilizing a comprehensive dataset on school EC, we thoroughly evaluate multiple ML algorithms. The objective is to identify the most effective algorithm for classifying average EC patterns. The outcomes of this study hold significant value for school administrators and facility managers. By leveraging the predictions generated from the selected algorithm, they can optimize energy usage and, consequently, reduce costs. The use of a comprehensive dataset ensures the reliability and accuracy of our evaluations of various ML algorithms for EC classification.
Article
Full-text available
Urbanization increases electricity demand due to population growth and economic activity. To meet consumers' demands at all times, it is necessary to predict future building energy consumption. Power engineers could exploit the enormous amount of energy-related data from smart meters to plan power sector expansion. Researchers have conducted many experiments to address the supply and demand imbalance by accurately predicting energy consumption. This paper presents a comprehensive literature review of forecasting methodologies used by researchers for energy consumption in smart buildings to meet future energy requirements. Different forecasting methods are being explored in both residential and non-residential buildings. The literature is further analyzed based on the dataset, type of load, prediction accuracy, and the evaluation metrics used. This work also focuses on the main challenges in energy forecasting due to load fluctuation, variability in weather, occupant behavior, and grid planning. The identified research gaps and a suitable prediction methodology addressing the current issues are presented with reference to the available literature. The multivariate analysis in the suggested hybrid model ensures the learning of repeating patterns and features in the data to enhance prediction accuracy.
Article
Full-text available
With the recent rapid increase in the use of roof top photovoltaic solar systems worldwide, and also, more recently, the dramatic escalation in building grid connected solar farms, especially in Australia, the need for more accurate methods of very short-term forecasting has become a focus of research. The International Energy Agency Tasks 46 and 16 have brought together groups of experts to further this research. In Australia, the Australian Renewable Energy Agency is funding consortia to improve the five minute forecasting of solar farm output, as this is the time scale of the electricity market. The first step in forecasting of either solar radiation or output from solar farms requires the representation of the inherent seasonality. One can characterise the seasonality in climate variables by using either a multiplicative or additive modelling approach. The multiplicative approach with respect to solar radiation can be done by calculating the clearness index, or alternatively estimating the clear sky index. The clearness index is defined as the division of the global solar radiation by the extraterrestrial radiation, a quantity determined only via astronomical formulae. To form the clear sky index one divides the global radiation by a clear sky model. For additive de-seasoning, one subtracts some form of a mean function from the solar radiation. That function could be simply the long term average at the time steps involved, or more formally the addition of terms involving a basis of the function space. An appropriate way to perform this operation is by using a Fourier series set of basis functions. This article will show that for various reasons the additive approach is superior. Also, the differences between the representation for solar energy versus solar farm output will be demonstrated. Finally, there is a short description of the subsequent steps in short-term forecasting.
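The additive de-seasoning the article favors, using a Fourier series basis, amounts to a small least-squares fit. This sketch uses a synthetic hourly "radiation" series; the period, number of harmonics, and noise level are illustrative assumptions.

```python
# Hedged sketch of additive de-seasoning: fit a few sine/cosine harmonics of
# the daily cycle by least squares and subtract them, leaving the stochastic
# residual that short-term forecasting models would then target.
import numpy as np

rng = np.random.default_rng(2)
hours = np.arange(24 * 30)                               # 30 days, hourly
seasonal = 400 + 300 * np.sin(2 * np.pi * hours / 24 - np.pi / 2)
series = seasonal + 20 * rng.normal(size=hours.size)     # noise sd = 20

period, n_harmonics = 24, 3
basis = [np.ones_like(hours, dtype=float)]               # mean term
for k in range(1, n_harmonics + 1):
    basis.append(np.sin(2 * np.pi * k * hours / period))
    basis.append(np.cos(2 * np.pi * k * hours / period))
B = np.column_stack(basis)

coef, *_ = np.linalg.lstsq(B, series, rcond=None)        # Fourier coefficients
fitted_seasonal = B @ coef
residual = series - fitted_seasonal                      # de-seasoned series
print(round(float(np.std(residual)), 1))                 # near the noise level
```

The multiplicative alternative (a clearness or clear sky index) would instead divide by a reference, which fails near zero radiation; this is one reason the article argues the additive form is superior.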
Article
Full-text available
Most frequently used models for modeling and forecasting periodic climatic time series do not have the capability of handling the periodic variability that characterizes them. In this paper, the Fourier Autoregressive (FAR) model, with its ability to analyze periodic variability, is implemented. From the results, FAR(1), FAR(2) and FAR(2) models were chosen based on the Periodic Autocorrelation Function (PeACF) and Periodic Partial Autocorrelation Function (PePACF). The coefficients of the tentative models were estimated using a Discrete Fourier transform estimation method. The FAR(1) model was chosen as the optimal model based on the smallest values of the Periodic Akaike (PAIC) and Periodic Bayesian Information Criteria (PBIC). The residuals of the fitted models were diagnosed to be white noise. The in-sample forecast closely reflected the original rainfall series, while the out-of-sample forecast exhibited a continuous periodic forecast from January 2019 to December 2020 with relatively small values of the Periodic Root Mean Square Error (PRMSE), Periodic Mean Absolute Error (PMAE) and Periodic Mean Absolute Percentage Error (PMAPE). The comparison of the FAR(1) model forecast with the AR(3), ARMA(2,1), ARIMA(2,1,1) and SARIMA()() model forecasts indicated that FAR(1) outperformed the other models, as it exhibited a continuous periodic forecast. The continuous monthly periodic rainfall forecast indicated that there will be rapid climate change in Nigeria in the coming years, and the Nigerian Government needs to put plans in place to curtail its effects.
Article
It is clear that the learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for past decades. Two key reasons behind this may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. Experimental results based on a few artificial and real benchmark function approximation and classification problems, including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
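The ELM training rule described above is compact enough to sketch directly: hidden-layer weights are drawn at random and never tuned, and only the output weights are solved analytically via the Moore-Penrose pseudoinverse. The toy regression data and hidden-layer size below are illustrative assumptions.

```python
# Hedged sketch of ELM for regression: random hidden layer, analytic solve.
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1]    # smooth target function

n_hidden = 50
W = rng.normal(size=(X.shape[1], n_hidden))    # random input weights (never tuned)
b = rng.normal(size=n_hidden)                  # random biases (never tuned)
H = np.tanh(X @ W + b)                         # hidden-layer output matrix

beta = np.linalg.pinv(H) @ y                   # output weights in closed form
y_hat = H @ beta

print(round(float(np.mean((y - y_hat) ** 2)), 4))  # training MSE
```

There is no iteration at all: the single pseudoinverse replaces the gradient descent loop, which is the source of the speedup the abstract reports.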
Article
In this study, six different machine learning models, gradient boosting tree (GBT), multilayer perceptron neural network (MLPNN), two types of adaptive neuro-fuzzy inference systems (ANFIS) based on fuzzy c-means clustering (ANFIS-FCM) and subtractive clustering (ANFIS-SC), multivariate adaptive regression spline (MARS), and classification and regression tree (CART), were used for forecasting solar radiation at stations in two different countries, Turkey and the USA. Wind speed, maximum air temperature, minimum air temperature and relative humidity were used as inputs to the developed models. For an accurate evaluation of model performance, four statistical indicators, root mean squared error (RMSE), coefficient of correlation (R), mean absolute error (MAE) and Nash–Sutcliffe efficiency coefficient (NS), were employed. Comparison of the results showed that the GBT model performed better than the MLPNN, ANFIS, MARS, and CART models in modeling solar radiation. Using the GBT model, the average RMSE of the MLPNN, ANFIS-FCM, ANFIS-SC, MARS and CART models was decreased by 0.26%, 1.5%, 0.51%, 2.5%, and 19.34% at Fairfield Station; 4%, 1.37%, 0.24%, 4.12%, and 24.4% at Monmouth Station; 11.99%, 48.7%, 41.6%, 8.23%, and 33.41% at Antalya Station; and 11%, 54.8%, 51.9%, 19.65%, and 37.1% at Mersin Station, respectively. The overall results indicated that the GBT model could be successfully applied in forecasting solar radiation using climatic parameters as inputs.
Article
The early and accurate detection of rolling element bearing faults, closely linked to timely maintenance and repair before a sudden breakdown, is still one of the key challenges in the area of condition monitoring. Nowadays, advanced signal processing techniques are combined with high-level machine learning approaches, moving towards automatic fault diagnosis. A plethora of Health Indicators (HIs) have been proposed to feed into machine learning models in order to track system degradation. Cyclic Spectral Analysis (CSA), including Cyclic Spectral Correlation (CSC) and Cyclic Spectral Coherence (CSCoh), has been proved to provide powerful tools for rotating machinery signal processing. Due to the periodic mechanism of bearing fault impacts, HIs extracted from the Cyclostationary (CS) domain can expose bearing defects even at a premature stage. On the other hand, the labelled training and testing datasets required by supervised machine learning approaches cannot realistically be obtained under industrial conditions. In order to overcome this limitation, a novel semisupervised Support Vector Data Description (SVDD) with negative samples (NSVDD) fault detection approach is proposed in this paper. The NSVDD model utilizes CS indicators to build the feature space, and fits a hyper-sphere to calculate the Euclidean distances in order to isolate the healthy and faulty data. A uniform object generation method is adopted to generate artificial outliers as negative samples for the NSVDD. A systematic fault detection decision strategy is proposed to estimate the bearing status simultaneously with the detection of fault initiation. Furthermore, a multi-level anomaly detection framework is built based on data at i) single sensor level, ii) machine level and iii) entire machine fleet level. Three run-to-failure bearing datasets including signals from twelve bearings are used to implement the proposed fault detection methodology.
Results show that the CS-based indicators outperform time-domain and Fast Kurtogram (FK) based Squared Envelope Spectrum (SES) indicators. Moreover, the proposed NSVDD model shows superior anomaly detection characteristics compared to the Back-Propagation Neural Network, random forest and K-Nearest Neighbor.
Article
Electricity demand/load forecasting always plays a vital role in the management and operation of power systems, since it can help develop an optimal action program for power producers, end-consumers and government entities. Inaccurate prediction may cause additional production or a waste of resources due to high operational costs. This paper investigated the benefit of combining data features to produce short-term electricity demand forecasts. Electricity demand usually exhibits complex characteristics and an obvious seasonal tendency. In this paper, adaptive Fourier decomposition is first used to extract the fluctuation characteristics. The resulting sub-series then satisfy the conditions of linearity and stationarity and are processed to measure and eliminate the seasonal pattern. In the process of seasonal adjustment, the average periodicity length is identified quantitatively. In addition, to achieve good generalization performance on real electricity demand data, the sine cosine optimization algorithm is applied to select the penalty and kernel parameters of the support vector machine. The empirical study showed that the superior performance of the proposed hybrid method profits from the effect of data pretreatment, and the findings prove that this hybrid modeling scheme can yield promising prediction results within acceptable computational complexity.
Article
Many technical solutions have been developed to reduce buildings' energy consumption, but limited efforts have been made to adequately address the role or action of building occupants in this process. Our earlier investigations have shown that occupants play a significant role in buildings' energy consumption: it was shown that savings of up to 20% could be achieved by modifying occupant behavior through direct feedback and recommendations. Studying the role of occupants in building energy consumption requires an understanding of the interrelationships between climatic conditions; building characteristics; and building services and operation. This paper describes the development of a systematic procedure to provide building occupants with direct feedback and recommendations to help them take appropriate action to reduce building energy consumption. The procedure is geared toward developing a Reference Building (RB) (an energy-efficient building) for a specific given building. The RB is then compared against its given building to inform the occupants of the given building how they are using end-use loads and how they can improve them. The RB is generated using a data-mining approach, which involves clustering analysis and neural networks. The framework is based on clustering similar buildings by effects unrelated to occupant behavior. The buildings are then grouped based on their energy consumption, and those with lower consumption are combined to generate the RB. Performance evaluation is determined by comparison of a given building with an RB. This comparison provides feedback that can lead occupants to take appropriate measures (e.g., turning off unnecessary lights or heating, ventilation, and air conditioning (HVAC), etc.) to improve building energy performance. More accurate, scalable, and realistic results are achievable through the current methodology, as shown through comparison with the existing literature.
Article
The widespread application of advanced renewable systems with optimal design can promote cleaner production, reduce carbon dioxide emissions and realise renewable and sustainable development. In this study, a phase change material integrated hybrid system was demonstrated, involving advanced energy conversions and multi-diversified energy forms, including solar-to-electricity conversion, active water-based and air-based cooling, and distributed storage. A generic optimization methodology was developed by integrating supervised machine learning and heuristic optimization algorithms. Multivariable optimizations were systematically conducted for widespread application purposes in five climatic regions in China. Results showed that the energy performance is highly dependent on mass flow rate and inlet cooling water temperature, with contribution ratios of around 90% and 7%. Furthermore, compared to the Taguchi standard orthogonal array, the machine-learning based optimization can improve the annual equivalent overall output energy from 86934.36 to 90597.32 kWh (by 4.2%) in ShangHai, from 86335.35 to 92719.07 kWh (by 7.4%) in KunMing, from 87445.1 to 91218.3 kWh (by 4.3%) in GuangZhou, from 87278.24 to 88212.83 kWh (by 1.1%) in HongKong, and from 87611.95 to 92376.46 kWh (by 5.4%) in HaiKou. This study presents the optimal design and operation of a renewable system in different climatic regions, which is important to realise renewable and sustainable buildings.
Article
With the rapid development of artificial intelligence, data-driven prediction models play an important role in energy prediction, fault detection, and diagnosis. This paper proposes an ensemble approach using random forest (RF) for hourly performance predictions of a GSHP system. Two years of in situ data were collected in an educational building situated in a severe cold area of China. Prediction models were established for performance indicators, and results indicate that the average errors for COPs, COPu, EERs and EERu were all controlled within 5%. A model established from a small amount of data can accurately predict long-term performance, thereby reducing the time and difficulty of data collection. RF models trained with different parameter settings were compared; results indicate that model accuracy was not very sensitive to the number of variables. The impact of input variables on prediction performance was analyzed, and the importance ranking changed with the period and performance indicator. By comparing the variable importance lists, it was possible to establish which parameters were abnormal, and lists from different periods can reflect whether the energy structure of the building has changed. The overall superiority of RF was verified by comparing it with a back propagation neural network (BPNN) in terms of robustness, interpretability, and efficiency. First, since the GSHP system involves multiple indicators, robustness, measured by average accuracy, was used to evaluate the accuracy level. According to CV-RMSE, the robustness of RF is approximately 3.3% higher than that of BPNN. Second, RF is highly interpretable, but BPNN is a typical black-box model. Finally, the modeling complexity and training time of BPNN were much greater than those of RF.
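A random-forest performance model with variable importance ranking of the kind described above can be sketched with scikit-learn. The features and data below are synthetic stand-ins (not the GSHP measurements), and the coefficients of the toy COP formula are purely illustrative assumptions.

```python
# Hedged sketch: RF regression of a synthetic COP indicator, with held-out
# accuracy and the feature importance ranking the study uses for diagnosis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n = 1000
outdoor_temp = rng.uniform(-20, 10, n)   # hypothetical input variables
inlet_temp = rng.uniform(5, 15, n)
flow_rate = rng.uniform(0.5, 2.0, n)

# Synthetic COP: driven mainly by outdoor temperature, weakly by flow rate.
cop = (4.0 + 0.05 * outdoor_temp - 0.08 * inlet_temp
       + 0.1 * flow_rate + 0.05 * rng.normal(size=n))

X = np.column_stack([outdoor_temp, inlet_temp, flow_rate])
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[:800], cop[:800])

print(round(rf.score(X[800:], cop[800:]), 3))  # held-out R^2
print(rf.feature_importances_.round(2))        # importance ranking
```

Tracking how `feature_importances_` shifts between training periods is the mechanism the abstract describes for spotting abnormal parameters or a changed energy structure.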
Article
Many cities are pursuing low-carbon practices in order to reduce carbon emissions. In line with this, various low-carbon city (LCC) indicator systems have been established across the world. However, there are only a few studies available investigating whether the established LCC indicators have been effectively utilized in practice. Through a comprehensive literature review, this study composed a list of LCC indicators (LCCIL), which were classified into eight dimensions, namely, economy, energy use, social aspect, carbon and environment, urban mobility, solid waste, water, and land use. The quotation frequency of LCCIL indicators in 10 LCC indicator systems addressed in academia was reviewed. The application frequency of LCCIL indicators in 21 global cities was then examined. A comparative study was then conducted between academia and practice across these eight dimensions of the LCCIL. The results reveal that (1) LCCIL indicators have not been effectively utilized in practice; (2) none of the LCCIL indicators related to the social aspect has been used in practice; (3) the indicator "total carbon emission" has been extensively applied in practice, but it has not been used in academia; and (4) the most popular LCCIL dimension in academia has been energy use, while urban mobility has been the most popular in practice. The findings suggest that the applicability of LCC indicators must be considered when establishing an LCC indicator system. The findings provide an important reference for further studies in establishing effective LCC indicators to guide the development of low-carbon cities.