Accurate Forecasting of Building Energy Consumption Via A Novel
Ensembled Deep Learning Method Considering the Cyclic Feature
Guiqing Zhanga, Chenlu Tiana, Chengdong Lia,d,*, Jun Jason Zhangb, Wangda Zuoc
aShandong Key Laboratory of Intelligent Buildings Technology, School of Information and Electrical
Engineering, Shandong Jianzhu University, Jinan 250101, China
bSchool of Electrical Engineering and Automation, Wuhan University, Wuhan 430072, China
cDepartment of Civil, Environmental and Architectural Engineering, University of Colorado Boulder,
Boulder, CO 80309, U.S.A
dShandong Co-Innovation Center of Green Building, Jinan 250101, China
Abstract
Short-term forecasting of building energy consumption (BEC) is significant for building energy reduction and real-time demand response. In this study, we propose a new method to realize half-hourly BEC prediction. To fully utilize the existing data features and further improve the forecasting performance, we divide the BEC data into a stable (cyclic) component and a stochastic component, and propose a novel hybrid model to handle the two components respectively. The cyclic feature (CF) is extracted via spectrum analysis, while the stochastic component is approximated by a novel Deep Belief Network (DBN) and Extreme Learning Machine (ELM) based ensembled model (DEEM). The resulting hybrid model is named DEEM+CF. Two real-world BEC experiments are performed to verify the proposed method. To demonstrate the superiority of DEEM+CF, this model is compared with the DBN, DBN+CF, ELM, ELM+CF, Support Vector Regression (SVR) and SVR+CF. Experimental results indicate that the CF improves the forecasting accuracy by approximately 20%, and that DEEM+CF performs best among the comparative models, with at least 3%, 6%, and 10% better accuracy than DBN+CF, ELM+CF and SVR+CF respectively under the MAE criterion.
Keywords: Building energy consumption, Cyclic feature, Deep belief network, Extreme
learning machine, Spectrum analysis
1. Introduction
The building energy consumption (BEC) accounts for about 30% of global energy usage and is still increasing rapidly [1]. The growing BEC has attracted
Corresponding author
Email addresses: qqzhang@sdjzu.edu.cn (Guiqing Zhang), chenlutian2017@sdjzu.edu.cn (Chenlu
Tian), lichengdong@sdjzu.edu.cn (Chengdong Li), jun.zhang.ee@whu.edu.cn (Jun Jason Zhang),
Wangda.Zuo@Colorado.edu (Wangda Zuo)
Preprint submitted to Elsevier March 2, 2020
Zhang, G., Tian, C., Li, C., Zhang, J., and Zuo, W., 2020
Accurate Forecasting of Building Energy Consumption Via A Novel Ensembled
Deep Learning Method Considering the Cyclic Feature. Energy.
This paper has been accepted by Energy on 03/31/2020.
much attention worldwide due to environmental degradation [2]. Moreover, many advanced information technologies recently applied in buildings and the grid have made end-to-end connection possible, pushing buildings and the grid into a new era in which the building's role is transformed from a pure consumer into a prosumer [3]. Under such conditions, hourly or half-hourly short-term prediction of BEC has become a foundational task for real-time demand response, building energy optimization, etc., which play a great role in both building energy reduction and grid operation and management [4].
To achieve short-term forecasting of BEC, many studies have been conducted using various methods, mainly including physical models [5, 6], statistical models [7], and machine learning methods [8]. Among them, machine learning has recently become one of the most promising because of its good capacity for nonlinear approximation without requiring detailed, and often unavailable, building and environmental knowledge. Machine learning can be divided into traditional machine learning and deep learning, and each method has specific advantages and application circumstances [9, 10, 11]. Aiming at improving the prediction performance of BEC, traditional machine learning methods are often integrated according to the application requirements. For example, in [12], the random forest is combined with the back propagation neural network to generate a hybrid model for performance forecasting of the ground source heat pump system. Jung [13] utilized an improved least-squares Support Vector Regression (SVR) to realize more accurate BEC forecasting. Yuan et al. [14] adopted particle swarm optimization in an improved ELM for robust BEC forecasting. Huang et al. [15] constructed an ensemble forecasting model combining extreme gradient boosting, SVR, ELM, and multiple linear regression for energy demand forecasting. These ensemble methods have achieved good results, and the ELM is one of the most popular components for its fast computation and good approximation capability.
Another popular idea for improving BEC forecasting is to combine deep learning models with traditional models, because deep learning has deeper computing layers and allows higher levels of feature and relation abstraction [16], while traditional machine learning has lower computational complexity. Inspired by the idea of parallel systems and parallel learning [17, 18, 19], Tian et al. [20] utilized a GAN for data enhancement, which was applied to several traditional machine learning methods to improve the forecasting results. Fu [21] presented a hybrid model adopting empirical mode decomposition and a DBN to forecast the building cooling load. Li et al. [22] proposed a modified DBN utilizing an ELM to boost the forecasting accuracy of BEC. However, in existing studies, the features abstracted by the various layers of the deep learning models are not fully utilized.
Although much effort has been devoted to improving machine learning methods for BEC forecasting, one unavoidable problem is that their performance relies greatly on the input data; thus, it is important to extract and utilize the valuable features inherent in the original data. Recently, deep learning began to be applied in the feature engineering of BEC forecasting. In [23], autoencoders and a GAN are used for feature extraction to improve the prediction accuracy of BEC. Other deep learning models such as the DBN are also adept at feature abstraction via layer-by-layer processing, but the features from each layer of such models are not fully utilized. Besides the inherent features extracted by deep learning methods, BEC has an obvious cyclic feature: the daily periodic pattern. People usually leave home at about 8-9 am and return at 5-7 pm; even in the workplace, they work for a while and then rest. The temperature is also periodic within one day [24]. All of these periodic components combine to produce the daily cyclic feature of BEC. In [25], the cyclic feature of electricity demand is analyzed deeply and utilized to generate synthetic sequences. However, to the authors' knowledge, such a cyclic feature has barely been taken into account in existing BEC forecasting algorithms.
In this paper, a novel DBN and ELM based ensembled method considering the cyclic feature of the observed data, named DEEM+CF, is proposed to achieve half-hourly short-term prediction of BEC. The main steps of this new method are listed below:
Firstly, the cyclic feature of daily BEC is extracted by spectrum analysis, and the original data is divided into a stable (cyclic) component and a stochastic one.
Secondly, the DEEM is utilized to predict the stochastic component. In the DEEM, different layers of the DBN abstract different levels of stochastic data features, and the feature sets constructed from each layer of the DBN are then used to train the corresponding ELMs. These ELMs output preliminary forecasting results, which are further integrated by another ELM to generate the final predicted results for the stochastic component. The DEEM thus makes full use of the features abstracted from each layer of the DBN.
Thirdly, the predicted results from the DEEM are combined with the cyclic feature to give the final forecasting outputs of BEC.
Moreover, to prove the effectiveness and superiority of the proposed DEEM+CF model, two experiments utilizing two real-world datasets are conducted, with comparisons against the pure DBN, the DBN+CF, the ELM, the ELM+CF, the SVR and the SVR+CF. Experimental results and comparisons demonstrate that utilizing the cyclic feature improves the BEC prediction accuracy by approximately 20%, and that the DEEM+CF performs at least 3%, 6%, and 10% better than the DBN+CF, ELM+CF and SVR+CF respectively.
The remainder of this paper is organized as follows. Section 2 gives a basic introduction to the DBN and ELM. In Section 3, the DEEM+CF model is proposed and illustrated in detail. In Section 4, two experiments utilizing two real-world datasets are performed to prove the superiority and effectiveness of the DEEM+CF. Finally, we draw the conclusions of this research in Section 5.
2. Methodologies
The DBN and ELM models are the basic components of the proposed DEEM. In this
section, DBN and ELM will be introduced briefly.
Figure 1: The architecture of DBN [26]
2.1. Deep Belief Networks (DBN)
A DBN is formed by stacking several Restricted Boltzmann Machines (RBMs) [26], as depicted in Figure 1. It is expected to extract high-level features from the input data space via layer-by-layer processing.
A single RBM typically consists of a visible layer and a hidden layer, whose nodes are fully connected across layers. The visible layer nodes are regarded as the inputs, while the hidden layer nodes are seen as the outputs. The node values in the two layers constitute binary vectors as follows:

$\mathbf{v} = \{v_1, v_2, \cdots, v_i, \cdots, v_m\}^T \in \{0, 1\}^m,$ (1)

$\mathbf{h} = \{h_1, h_2, \cdots, h_j, \cdots, h_n\}^T \in \{0, 1\}^n,$ (2)

where $v_i$ is a visible variable in the visible layer, $h_j$ is a hidden variable in the hidden layer, $m$ is the number of visible layer nodes, and $n$ is the total number of hidden layer nodes. The RBM is an energy-based model whose energy is expected to be minimized.
The energy function can be described as [26]

$E(\mathbf{v}, \mathbf{h} \mid \Theta) = -\mathbf{a}^T\mathbf{v} - \mathbf{b}^T\mathbf{h} - \mathbf{v}^T \mathbf{W} \mathbf{h} = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_i \sum_j v_i w_{ij} h_j,$ (3)

in which $\Theta = \{\mathbf{W}, \mathbf{a}, \mathbf{b}\}$ represents the set of model parameters, $\mathbf{W} \in \mathbb{R}^{m \times n}$ is the weighting matrix, $w_{ij} \in \mathbf{W}$ is the weight between $v_i$ and $h_j$, $\mathbf{a} \in \mathbb{R}^m$ and $\mathbf{b} \in \mathbb{R}^n$ are the bias vectors, $a_i \in \mathbf{a}$ is the bias of each $v_i$, and $b_j \in \mathbf{b}$ is the bias of each $h_j$.
To obtain a well-trained RBM, the partial derivatives with respect to $\Theta$ need to be computed via Gibbs sampling; however, running Gibbs sampling many times is time-consuming. To solve this problem, Hinton [27] proposed the contrastive divergence method, which only needs to run Gibbs sampling $K$ times. Usually, $K = 1$ is sufficient to train the RBM well.
In a DBN, the hidden layer of one RBM serves as the visible layer of the next RBM, and the output of the last RBM is fed into a logistic regression part. The training process of the DBN consists of two stages: pre-training and fine-tuning. To begin, suppose that there is a training dataset $(\mathbf{X}, \mathbf{y})$ with $N$ samples $\{(\mathbf{x}_k, y_k)\}_{k=1}^N$, where $\mathbf{x}_k = [x_k^1, x_k^2, \cdots, x_k^m]$. The detailed training process for the DBN is listed below [26]:
Step 1: Initialize the parameters of the DBN, including the number of input nodes $m$, the number of hidden and output nodes $n$, and the number of hidden layers $L$.
Step 2: Input $\mathbf{X}$ to the visible layer to train the weighting matrix $\Theta_1^2$ that connects the input layer and the second layer. From this training process, the node values $\mathbf{h}(2)$ in the second layer of the DBN are obtained.
Step 3: The node values $\mathbf{h}(2)$ in the second layer are then used to determine $\Theta_2^3$, from which the node values $\mathbf{h}(3)$ in the third layer are obtained.
Step 4: Let $l = 3$; the node values $\mathbf{h}(l)$ in the $l$th layer are used to train $\Theta_l^{l+1}$, and the outputs $\mathbf{h}(l+1)$ of the $(l+1)$th layer of the DBN are obtained.
Step 5: Set $l = l + 1$, and iterate Step 4 until $l > L + 1$.
Step 6: The outputs from the last hidden layer are fed into a logistic regression part to generate the final output of the DBN. Then, the training dataset $(\mathbf{X}, \mathbf{y})$ is utilized again to fine-tune all parameters of the DBN via the backpropagation algorithm.
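As a concrete illustration of the contrastive divergence (CD-1) pre-training used in Steps 2-4, the following is a minimal NumPy sketch of a single Bernoulli RBM. The class name, learning rate, and toy data are illustrative assumptions, not the authors' implementation; stacking several such RBMs (feeding each `transform` output into the next) gives the DBN pre-training.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible biases a_i
        self.b = np.zeros(n_hidden)    # hidden biases b_j
        self.lr = lr

    def cd1_update(self, v0):
        # Positive phase: hidden probabilities and a sample given the data.
        ph0 = sigmoid(v0 @ self.W + self.b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        pv1 = sigmoid(h0 @ self.W.T + self.a)
        ph1 = sigmoid(pv1 @ self.W + self.b)
        # Gradient approximation <v h>_data - <v h>_model, averaged over the batch.
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.a += self.lr * (v0 - pv1).mean(axis=0)
        self.b += self.lr * (ph0 - ph1).mean(axis=0)

    def transform(self, v):
        return sigmoid(v @ self.W + self.b)

# Usage: pre-train one RBM on toy binary data, then read out the hidden layer.
X = (rng.random((200, 16)) < 0.5).astype(float)
rbm = RBM(n_visible=16, n_hidden=8)
for _ in range(50):
    rbm.cd1_update(X)
H = rbm.transform(X)   # hidden representation, shape (200, 8)
```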
2.2. Extreme Learning Machine (ELM)
Suppose that the ELM has $n$ hidden nodes and one output node; its architecture is depicted in Figure 2. For an input $\mathbf{x} = [x^1, x^2, \cdots, x^m]$, the output of the ELM can be presented as

$f(\mathbf{x}) = \sum_{j=1}^{n} \beta_j g(\mathbf{x}, \mathbf{a}_j, b_j)$ (4)

where $\mathbf{w}_j = (\mathbf{a}_j, b_j)^T$ is the randomly given weighting vector that connects the input and hidden nodes, $\boldsymbol{\beta} = [\beta_1, \cdots, \beta_n]^T$ is the output weighting vector that connects the hidden and output layers, and $g$ represents the activation function.
In the training process of the ELM, no iteration is needed. For a given training dataset $(\mathbf{X}, \mathbf{y})$ with $N$ samples $\{(\mathbf{x}_k, y_k)\}_{k=1}^N$, where $\mathbf{x}_k = [x_k^1, x_k^2, \cdots, x_k^m]$, we first compute the hidden-layer output matrix

$\mathbf{H} = \begin{bmatrix} g(\mathbf{a}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{a}_n \cdot \mathbf{x}_1 + b_n) \\ g(\mathbf{a}_1 \cdot \mathbf{x}_2 + b_1) & \cdots & g(\mathbf{a}_n \cdot \mathbf{x}_2 + b_n) \\ \vdots & \ddots & \vdots \\ g(\mathbf{a}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{a}_n \cdot \mathbf{x}_N + b_n) \end{bmatrix},$ (5)
Figure 2: The architecture overview of ELM [28]
in which the parameters $\mathbf{a}_i, b_i$ $(i = 1, \cdots, n)$ are randomly given. Then, the weights $\boldsymbol{\beta}$ connecting the hidden layer and the output layer are directly computed via least-squares estimation as

$\boldsymbol{\beta} = \mathbf{H}^{+} \mathbf{y}$ (6)

where "$+$" denotes the Moore-Penrose generalized inverse, and $\mathbf{y} = [y_1, \ldots, y_N]^T$.
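Equations (4)-(6) translate directly into a few lines of linear algebra. The sketch below is a hedged illustration assuming a tanh activation and a toy regression target; `elm_fit` and `elm_predict` are hypothetical helper names, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, y, n_hidden=50):
    """Fit an ELM: random input weights and biases, output weights by least squares."""
    m = X.shape[1]
    A = rng.normal(size=(m, n_hidden))   # random input weights a_j
    b = rng.normal(size=n_hidden)        # random biases b_j
    H = np.tanh(X @ A + b)               # hidden-layer output matrix, eq. (5)
    beta = np.linalg.pinv(H) @ y         # Moore-Penrose solution, eq. (6)
    return A, b, beta

def elm_predict(X, A, b, beta):
    """Evaluate eq. (4) for a batch of inputs."""
    return np.tanh(X @ A + b) @ beta

# Usage on a toy nonlinear target.
X = rng.uniform(-1, 1, size=(300, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
A, b, beta = elm_fit(X, y, n_hidden=80)
y_hat = elm_predict(X, A, b, beta)
```

Note that no gradient iteration occurs anywhere: the only fitted quantity is the closed-form least-squares solution for β, which is the source of the ELM's speed.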
3. The Proposed Forecasting Model Considering the Cyclic Feature
This section presents the proposed DBN and ELM based ensemble method considering the cyclic feature, named DEEM+CF. For clarity, the scheme of the proposed forecasting model is introduced first, followed by the cyclic feature extraction, and finally the construction of the DEEM.
3.1. The Scheme of the Proposed DEEM+CF
The scheme for constructing the proposed DEEM+CF model is shown in Figure 3 and
is briefly illustrated as follows:
Step 1: Extract the cyclic feature which is the stable component of the original BEC
time series data.
Step 2: Generate the stochastic time series data which is the residual part of the
original BEC data after removing the stable component – the cyclic feature. Then,
transform the stochastic time series data to the stochastic training dataset.
Step 3: Utilize the stochastic training dataset to optimize the DEEM to achieve the
optimal forecasting performance for the stochastic data.
Step 4: Integrate the predicted stochastic results with cyclic features to achieve the
final prediction of BEC.
Below, we will give the design details of the proposed DEEM+CF.
Figure 3: The proposed forecasting scheme
3.2. The Cyclic Feature Extraction via Spectrum Analysis
3.2.1. Spectrum Analysis
A complicated signal can be decomposed into simple waves with specific cyclic periods [29, 30]. Spectrum analysis achieves such decomposition in the form of Fourier series [31, 32]. Recently, this method has often been adopted to analyze inherent information in many domains such as transportation [33, 34], electricity forecasting [35], fault detection [36] and solar radiation analysis [37, 38]. Here, spectrum analysis is selected to extract the daily cyclic features of the BEC series for its good capacity in finding cyclic components. The rest of this part gives the definition of the cyclic spectrum function.
Assume that $f(t)$ is a periodic series with period $T$. Then $f(t)$ can be expressed as the Fourier series

$f(t) = \sum_{k} c_k e^{jk\omega t}$ (7)

where the $c_k$ are the Fourier coefficients, $j$ is the imaginary unit, and $\omega = \frac{2\pi}{T}$.
The Fourier series can also be expanded into a trigonometric polynomial series as

$f(t) = \frac{c_0}{2} + \sum_{\gamma=1}^{\infty} c_\gamma \cos(\gamma\omega t) + \sum_{\gamma=1}^{\infty} d_\gamma \sin(\gamma\omega t)$ (8)

where $c_0, c_\gamma, d_\gamma$ are given by

$c_0 = \frac{2}{T} \int_{-T/2}^{T/2} f(t)\,dt, \quad c_\gamma = \frac{2}{T} \int_{-T/2}^{T/2} f(t)\cos(\gamma\omega t)\,dt, \quad d_\gamma = \frac{2}{T} \int_{-T/2}^{T/2} f(t)\sin(\gamma\omega t)\,dt.$ (9)
3.2.2. The Extraction of the Cyclic Feature
To get the cyclic features, the average daily BEC profile $p_t$ is first calculated as

$p_t = \frac{1}{D} \sum_{i=1}^{D} v_i^t$ (10)

where $p_t$ is the average of the daily values at time $t$, $v_i^t$ is the original value at time $t$ on the $i$th day, and $D$ is the total number of days.
Then the spectrum function is utilized to extract the cyclic features of the daily BEC. To obtain more reasonable cyclic features, BIC is adopted to evaluate the performance of the spectrum function: the number of cyclic components (trigonometric waves) is increased step by step, the BIC of each resulting spectrum function is calculated, and the spectrum function with the lowest BIC is selected as the cyclic model of the BEC. The daily stable components $\hat{p}_t$ are obtained as

$\hat{p}_t = c_0 + c_1 \sin\frac{2\pi t}{N} + d_1 \cos\frac{2\pi t}{N} + \cdots + c_n \sin\frac{2n\pi t}{N} + d_n \cos\frac{2n\pi t}{N},$ (11)

where $N$ is the number of data points collected per day, and $c_0, c_1, \cdots, c_n, d_1, \cdots, d_n$ are computed via least-squares estimation as

$\begin{bmatrix} c_0 \\ c_1 \\ d_1 \\ \vdots \\ c_n \\ d_n \end{bmatrix} = \begin{bmatrix} 1 & \sin\frac{2\pi}{N} & \cos\frac{2\pi}{N} & \cdots & \sin\frac{2n\pi}{N} & \cos\frac{2n\pi}{N} \\ 1 & \sin\frac{4\pi}{N} & \cos\frac{4\pi}{N} & \cdots & \sin\frac{4n\pi}{N} & \cos\frac{4n\pi}{N} \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 1 & \sin\frac{2N\pi}{N} & \cos\frac{2N\pi}{N} & \cdots & \sin\frac{2Nn\pi}{N} & \cos\frac{2Nn\pi}{N} \end{bmatrix}^{+} \begin{bmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{bmatrix}$ (12)

where "$+$" denotes the Moore-Penrose generalized inverse.
After the cyclic features are obtained, the stochastic components are computed by removing the stable components (cyclic features) from the original data. Each original data point can be divided into a stable component and a stochastic component as

$v_i^t = \hat{p}_t + x_i^t$ (13)

where $x_i^t$ is the stochastic component at time $t$ on the $i$th day.

The stable components reflect the trend of the BEC, while the stochastic components represent its specific, random features. The stochastic components are concatenated into a 1-D time series $\{x_1, x_2, \cdots\}$. This remaining stochastic BEC series is then transformed into the stochastic training dataset $(\mathbf{X}_0, \mathbf{y})$ with $N$ samples $\{(\mathbf{x}_{0,k}, y_k)\}_{k=1}^N$, where $\mathbf{x}_{0,k} = [x_{0,k}^1, x_{0,k}^2, \cdots, x_{0,k}^m]$.
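The extraction in (10)-(13) can be sketched as follows, using synthetic half-hourly data with one dominant daily wave. The function name and the fixed choice of 3 waves are illustrative assumptions; in the paper the number of waves is selected by BIC.

```python
import numpy as np

def cyclic_fit(p, n_waves):
    """Least-squares fit of eq. (11): a constant plus n_waves sine/cosine pairs."""
    N = len(p)
    t = np.arange(1, N + 1)
    cols = [np.ones(N)]
    for k in range(1, n_waves + 1):
        cols.append(np.sin(2 * np.pi * k * t / N))
        cols.append(np.cos(2 * np.pi * k * t / N))
    Phi = np.column_stack(cols)        # design matrix of eq. (12)
    coef = np.linalg.pinv(Phi) @ p     # Moore-Penrose least squares
    return Phi @ coef                  # stable (cyclic) component p_hat

# Usage: average daily profile (eq. 10), then decomposition (eq. 13).
rng = np.random.default_rng(2)
N, D = 48, 30                          # 48 half-hour slots per day, 30 days
truth = 500 + 100 * np.sin(2 * np.pi * np.arange(1, N + 1) / N)
v = truth + rng.normal(0, 20, size=(D, N))   # D days of noisy BEC readings
p = v.mean(axis=0)                     # eq. (10): average daily profile
p_hat = cyclic_fit(p, n_waves=3)       # stable (cyclic) component
x = v - p_hat                          # eq. (13): stochastic components
```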
Figure 4: The architecture overview of the proposed DEEM.
3.3. Remaining Stochastic Data-Driven Design of the DEEM
3.3.1. The Framework of the DEEM
In this section, the DBN and ELMs are integrated into an ensemble method for forecasting the stochastic BEC data. The architecture of the DEEM is shown in Figure 4, in which the DBN is utilized to generate new representative feature datasets, and ELMs serve as the premier and ensemble forecasting models. Firstly, the original training dataset is input to the DBN, which extracts new data features from the stochastic training dataset via layer-by-layer processing. Each layer of the DBN outputs one new feature set, which is combined with the target values to form a new training dataset. Then the new training datasets are utilized to train separate ELMs to obtain premier predictions of the target values. Finally, all of the premier results are integrated and combined with the target values again to train another ELM, which produces the final predicted results.
The construction steps of the DEEM are listed below.
*Input: The stochastic training dataset $(\mathbf{X}_0, \mathbf{y})$ and the number of hidden layers of the DBN.
*Output: The final predicted result $\hat{\mathbf{y}}$ for the stochastic component.
Step 1: Input the stochastic training dataset $(\mathbf{X}_0, \mathbf{y})$ into the DBN model and train it. Suppose that the outputs from the $i$th hidden layer are $\mathbf{X}_i$ $(i = 1, 2, \cdots, l)$; then, from the $i$th hidden layer, one new dataset $(\mathbf{X}_i, \mathbf{y})$ is generated.
Step 2: Input each generated dataset $(\mathbf{X}_i, \mathbf{y})$ into one corresponding ELM model to train it and obtain the individual predicted results $\mathbf{y}_i$ $(i = 1, 2, \cdots, l)$. Besides, the initial training dataset $(\mathbf{X}_0, \mathbf{y})$ is also used to train a single ELM to obtain the predicted results $\mathbf{y}_0$.
Step 3: Integrate all of the individual predicted results $\mathbf{y}_i$ $(i = 0, 1, \cdots, l)$ by another ELM model to generate the final predicted result $\hat{\mathbf{y}}$.
In the DEEM model, one new dataset is generated from each hidden layer of the DBN. The input stochastic training dataset and the newly constructed datasets all participate in the forecasting of the BEC. Compared with conventional deep learning models, the DEEM fully utilizes the initial training dataset and all of the features abstracted by the hidden layers.
Below, we will explain the details of such steps.
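The construction steps above can be sketched end-to-end under simplifying assumptions: here a pre-trained DBN is stood in for by fixed random tanh layers (only to show the data flow through Steps 1-3), and the premier and ensemble models are the same least-squares ELM as in Section 2.2. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def elm_fit(X, y, n_hidden=60):
    """Least-squares ELM (random input weights, pinv output weights)."""
    A = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    return A, b, np.linalg.pinv(np.tanh(X @ A + b)) @ y

def elm_predict(model, X):
    A, b, beta = model
    return np.tanh(X @ A + b) @ beta

# Toy stochastic series turned into a sliding-window dataset (X_0, y).
x = rng.normal(size=600)
m = 10
X0 = np.array([x[i:i + m] for i in range(len(x) - m)])
y = x[m:]

# Stand-in for a pre-trained DBN: fixed random tanh layers giving X_1, X_2.
layers = [(rng.normal(size=(m, 12)), rng.normal(size=12)),
          (rng.normal(size=(12, 12)), rng.normal(size=12))]
feats, Xi = [X0], X0
for W, bias in layers:
    Xi = np.tanh(Xi @ W + bias)
    feats.append(Xi)          # Step 1: one feature set per hidden layer

# Step 2: one premier ELM per feature set; Step 3: ensemble ELM on their outputs.
models = [elm_fit(F, y) for F in feats]
Y = np.column_stack([elm_predict(mdl, F) for mdl, F in zip(models, feats)])
ensemble = elm_fit(Y, y, n_hidden=20)
y_hat = elm_predict(ensemble, Y)
```

The key structural point the sketch shows is that the ensemble ELM sees one column per premier model (the matrix Y of eq. (18)), so every layer's abstraction contributes to the final prediction.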
3.3.2. Training Data Generation and Learning of ELMs
The newly generated training datasets are obtained from the layers of the DBN. The newly constructed training dataset for the $i$th hidden layer is $(\mathbf{X}_i, \mathbf{y})$, where $\mathbf{X}_i$ is obtained as

$\mathbf{X}_i = \hat{g}_i(\mathbf{X}_{i-1}, \mathbf{W}_i, \mathbf{a}_i, \mathbf{b}_i)$ (14)

where $\hat{g}_i(\cdot)$ represents the activation function in the $i$th hidden layer of the DBN, and $(\mathbf{W}_i, \mathbf{a}_i, \mathbf{b}_i)$ are the weights connecting the $(i-1)$th and $i$th hidden layers of the DBN.

The input part $\mathbf{X}_i$ of the newly generated dataset can finally be presented as

$\mathbf{X}_i = \begin{bmatrix} \mathbf{x}_{i,1} \\ \mathbf{x}_{i,2} \\ \vdots \\ \mathbf{x}_{i,N} \end{bmatrix} = \begin{bmatrix} x_{i,1}^1 & \cdots & x_{i,1}^m \\ x_{i,2}^1 & \cdots & x_{i,2}^m \\ \vdots & \ddots & \vdots \\ x_{i,N}^1 & \cdots & x_{i,N}^m \end{bmatrix}$ (15)

where $m$ is the number of nodes in the $i$th hidden layer of the DBN.
If the DBN has $l$ hidden layers for feature abstraction, we obtain $l+1$ training datasets: the $l$ newly generated ones plus the original stochastic training dataset. These $l+1$ datasets are utilized to construct $l+1$ corresponding ELMs and to obtain $l+1$ premier individual forecasting results $\mathbf{y}_i$, computed as

$y_{i,k} = \sum_{j=1}^{n_i} \beta_{i,j}\, g_i(\mathbf{a}_{i,j} \cdot \mathbf{x}_{i,k} + b_{i,j})$ (16)

where $i = 0, 1, \cdots, l$, $k = 1, \cdots, N$, $g_i(\cdot)$ is the activation function of the $i$th ELM, $(\mathbf{a}_{i,j}, b_{i,j})$ is the weighting vector connecting the input and hidden layers of the $i$th ELM, which has $n_i$ hidden nodes. $\boldsymbol{\beta}_i = [\beta_{i,1}, \cdots, \beta_{i,n_i}]^T$ is the weighting vector that connects the hidden and output layers of the $i$th ELM, and can be determined as

$\boldsymbol{\beta}_i = [\beta_{i,1}, \beta_{i,2}, \cdots, \beta_{i,n_i}]^T = \begin{bmatrix} g_i(\mathbf{a}_{i,1} \cdot \mathbf{x}_{i,1} + b_{i,1}) & \cdots & g_i(\mathbf{a}_{i,n_i} \cdot \mathbf{x}_{i,1} + b_{i,n_i}) \\ g_i(\mathbf{a}_{i,1} \cdot \mathbf{x}_{i,2} + b_{i,1}) & \cdots & g_i(\mathbf{a}_{i,n_i} \cdot \mathbf{x}_{i,2} + b_{i,n_i}) \\ \vdots & \ddots & \vdots \\ g_i(\mathbf{a}_{i,1} \cdot \mathbf{x}_{i,N} + b_{i,1}) & \cdots & g_i(\mathbf{a}_{i,n_i} \cdot \mathbf{x}_{i,N} + b_{i,n_i}) \end{bmatrix}^{+} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$ (17)
3.3.3. Design of the Ensemble Part
In the ultimate ensemble part, the $l+1$ premier predicted results are first combined into a new training dataset, and this newly generated dataset is then utilized to construct the ensemble model, which is again chosen to be an ELM due to its low computational complexity and good capability in nonlinear approximation.

Assume that the integrated training dataset for the final training is $(\mathbf{Y}, \mathbf{y})$, where $\mathbf{Y}$ can be expressed as

$\mathbf{Y} = [\mathbf{y}_0, \mathbf{y}_1, \cdots, \mathbf{y}_l] = \begin{bmatrix} y_{0,1} & \cdots & y_{l,1} \\ y_{0,2} & \cdots & y_{l,2} \\ \vdots & \ddots & \vdots \\ y_{0,N} & \cdots & y_{l,N} \end{bmatrix}$ (18)

in which $y_{i,k}$ is the predicted result for the input data $\mathbf{x}_{i,k}$, obtained by (16). $\mathbf{Y}$ can also be expressed row-wise as

$\mathbf{Y} = [\mathbf{y}^{(1)}, \mathbf{y}^{(2)}, \cdots, \mathbf{y}^{(N)}]^T$ (19)

where $\mathbf{y}^{(i)} = [y_{0,i}, y_{1,i}, \cdots, y_{l,i}]^T$ for $i = 1, 2, \cdots, N$.
Then, the integrated training dataset is employed to construct another ELM. Suppose that the ELM in the ensemble part has $q$ hidden nodes; its input-output mapping can be given as

$\hat{y}_i = \sum_{p=1}^{q} \hat{\beta}_p\, \hat{g}(\hat{\mathbf{a}}_p \cdot \mathbf{y}^{(i)} + \hat{b}_p)$ (20)

where $i = 1, 2, \cdots, N$, $(\hat{\mathbf{a}}_p, \hat{b}_p)$ is the weighting vector connecting the input and hidden layers of the integration ELM, $\hat{g}(\cdot)$ represents its activation function, and $\hat{\boldsymbol{\beta}}$ is the weighting vector connecting the hidden and output layers.
To assure the performance of the ensemble ELM, its weighting vector is also obtained by least-squares estimation as

$\hat{\boldsymbol{\beta}} = [\hat{\beta}_1, \hat{\beta}_2, \cdots, \hat{\beta}_q]^T = \begin{bmatrix} \hat{g}(\hat{\mathbf{a}}_1 \cdot \mathbf{y}^{(1)} + \hat{b}_1) & \cdots & \hat{g}(\hat{\mathbf{a}}_q \cdot \mathbf{y}^{(1)} + \hat{b}_q) \\ \hat{g}(\hat{\mathbf{a}}_1 \cdot \mathbf{y}^{(2)} + \hat{b}_1) & \cdots & \hat{g}(\hat{\mathbf{a}}_q \cdot \mathbf{y}^{(2)} + \hat{b}_q) \\ \vdots & \ddots & \vdots \\ \hat{g}(\hat{\mathbf{a}}_1 \cdot \mathbf{y}^{(N)} + \hat{b}_1) & \cdots & \hat{g}(\hat{\mathbf{a}}_q \cdot \mathbf{y}^{(N)} + \hat{b}_q) \end{bmatrix}^{+} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$ (21)
4. Experiments and Comparisons
To verify the advantages of the proposed DEEM+CF model, two comparative experi-
mental studies will be conducted in this section.
4.1. Experimental Setting and Applied Datasets
4.1.1. Comparative Methods
To show the advantages of the proposed DEEM+CF method, firstly, several popular regression models, including lasso regression [39], ridge regression [40] and multi-polynomial regression [41], are adopted as comparative methods for spectrum analysis in cyclic feature extraction, with the ELM as the prediction model. Secondly, several popular machine learning models, including the DBN, ELM, and SVR, are selected as comparative models of DEEM+CF. Besides, to further verify the effectiveness of the cyclic feature, hybrid models that combine the cyclic feature with the DBN, ELM and SVR, i.e. the DBN+CF, ELM+CF, and SVR+CF, are also constructed as comparative models.
4.1.2. Evaluation Indices
To evaluate the forecasting performance, Mean Absolute Error (MAE), Mean Absolute
Percentage Error (MAPE), Root Mean Squared Error (RMSE), and Pearson Correlation
Coefficient (r) are selected as the evaluation indices. The four comparative indices have
been widely used for forecasting accuracy evaluation and are computed as
$MAE = \frac{1}{M} \sum_{m=1}^{M} |\hat{y}_m - y_m|$ (22)

$MAPE = \frac{1}{M} \sum_{m=1}^{M} \frac{|\hat{y}_m - y_m|}{y_m} \times 100\%$ (23)

$RMSE = \sqrt{\frac{1}{M} \sum_{m=1}^{M} (\hat{y}_m - y_m)^2}$ (24)

$r = \frac{\sum_{m=1}^{M} (\hat{y}_m - E(\hat{y}))(y_m - E(y))}{\sqrt{\sum_{m=1}^{M} (\hat{y}_m - E(\hat{y}))^2}\, \sqrt{\sum_{m=1}^{M} (y_m - E(y))^2}}$ (25)

where $y_m$ and $\hat{y}_m$ are respectively the observed and predicted values, and $E(\cdot)$ represents the sample average.
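The four indices can be computed directly; below is a minimal sketch with toy values. `np.corrcoef` is used for the Pearson coefficient, which matches eq. (25).

```python
import numpy as np

def metrics(y, y_hat):
    """MAE, MAPE (%), RMSE, and Pearson r, eqs. (22)-(25)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    e = y_hat - y
    mae = np.mean(np.abs(e))
    mape = np.mean(np.abs(e) / np.abs(y)) * 100.0
    rmse = np.sqrt(np.mean(e ** 2))
    r = np.corrcoef(y, y_hat)[0, 1]
    return mae, mape, rmse, r

# Toy half-hourly BEC values (kW) and predictions.
y = [200.0, 400.0, 800.0, 600.0]
y_hat = [210.0, 390.0, 820.0, 580.0]
mae, mape, rmse, r = metrics(y, y_hat)
# mae = 15.0, mape ≈ 3.33 %, rmse ≈ 15.81, r ≈ 0.998
```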
Besides, to determine the number of cyclic features, the Bayesian Information Criterion (BIC) is adopted for model construction of the spectrum function. BIC balances parameter addition against overfitting, and a lower BIC means a better model. BIC is calculated as

$BIC = \ln(M)\,k - 2\ln(\hat{L})$ (26)

where $M$ is the number of data points, $k$ is the number of parameters adopted by the model, and $\hat{L}$ is the maximized likelihood of the model.

When the model errors are independent and normally distributed, BIC can be presented as

$BIC = M\ln(\hat{\sigma}^2) + k\ln(M)$ (27)

where $\hat{\sigma}^2$ is the error variance, computed as

$\hat{\sigma}^2 = \frac{1}{M} \sum_{m=1}^{M} (\hat{y}_m - y_m)^2$ (28)
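A minimal sketch of the Gaussian-error BIC of (27)-(28); the toy targets and predictions are illustrative. Among candidate spectrum functions, the one with the lowest BIC would be selected.

```python
import numpy as np

def bic_gaussian(y, y_hat, k):
    """BIC under independent Gaussian errors, eqs. (27)-(28)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    M = len(y)
    sigma2 = np.mean((y_hat - y) ** 2)      # eq. (28): error variance
    return M * np.log(sigma2) + k * np.log(M)  # eq. (27)

# Usage: a tighter fit with the same parameter count gets a lower (better) BIC.
y = [1.0, 2.0, 3.0, 4.0]
bic_small = bic_gaussian(y, [1.1, 2.1, 2.9, 4.1], k=3)  # small residuals
bic_big = bic_gaussian(y, [1.5, 2.5, 2.5, 4.5], k=3)    # large residuals
```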
4.1.3. Applied Datasets
Two buildings are chosen as the testing buildings to prove the effectiveness and superiority of the DEEM+CF method. The BEC datasets are retrieved from https://trynthink.github.io/buildingsdatasets/.

The first building is located in Hialeah, one of the warmest places in America. Its energy consumption was collected every 15 minutes from January 1, 2010 to December 31, 2010, giving 34940 samples. Comparatively, the energy consumption of this building is higher in summer than in the other seasons. The original data was processed and aggregated into 30-minute intervals, finally yielding 17470 samples. In the new dataset, the half-hourly energy consumption ranges from 219 to 1032 kW.
The second building is from Pico Rivera, CA, where the climate is comfortable. Data were collected four times per hour, and the data from January 1, 2010 to October 31, 2010 were selected. The data collected in summer are more fluctuant than in winter. The data from this building were also aggregated into 30-minute intervals, yielding 14592 samples. The highest half-hourly BEC is 997 kW and the lowest is 191 kW.
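The 15-minute-to-30-minute aggregation can be sketched as below. Whether consecutive readings should be summed or averaged depends on the units of the raw meter data, so the summation used here is an assumption, and the helper name is illustrative.

```python
import numpy as np

def to_half_hourly(readings_15min):
    """Aggregate consecutive pairs of 15-minute readings into 30-minute values."""
    r = np.asarray(readings_15min, dtype=float)
    r = r[: len(r) // 2 * 2]        # drop a trailing odd sample if present
    return r.reshape(-1, 2).sum(axis=1)

# Usage: pairs (100, 120) and (110, 130) are merged; the trailing 90 is dropped.
half_hourly = to_half_hourly([100, 120, 110, 130, 90])
```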
The original daily BEC data of the two buildings over one month are displayed in Figures 5(a) and 5(b). In each experiment, the stochastic data and the original data are divided into two parts: the first 70% is used as the training dataset and the remaining 30% for testing, and the size of the input sequence is set to 10.
Figure 5: (a) Daily BEC data (kW) in the first experiment; (b) daily BEC data (kW) in the second experiment.
4.2. The First Experiment
4.2.1. Configuration of the Forecasting Models
The proper configuration of parameters is important for the forecasting accuracy of machine learning models. In this experiment, the optimal parameters of the models, including the spectrum function, the DEEM, and the comparative models, are detailed below.
(a) Configuration of the spectrum function
To obtain the proper number of cyclic waves in the spectrum function, the average of the BEC time series of the first building is first computed via (10) and then input to the spectrum function model. The number of cyclic waves in the form of trigonometric functions is varied from 1 to 30, and the performance of each spectrum function with a specific number of cyclic waves is evaluated via BIC. A lower BIC means that more reasonable cyclic features are extracted without significant overfitting.
Figure 6(a) illustrates the performance of spectrum functions with different numbers of cyclic waves. From this figure, we can see that the spectrum function with 25 cyclic waves obtains the best performance in this experiment. To show the cyclic features more clearly, the spectrum map, which reflects the amplitudes of the cyclic waves at different frequencies, is shown in Figure 6(b). It is clear that there are two significant cycles, with periods of 3-4 hours and 24 hours, and these two significant cycles combine with the other 23 cycles to form the daily cyclic feature of BEC. The stable and stochastic time series data are then obtained. Figure 6(c) shows the first 500 original BEC data of the first building, and Figure 6(d) presents all of the remaining stochastic BEC data of the first building.
On the other hand, to evaluate the performance of spectrum analysis in cyclic feature extraction, lasso regression, ridge regression and multi-polynomial regression are also used to model the cyclic features, with the ELM as the prediction model. Here, the penalty coefficient of the lasso regression is set to 1 and its number of dimensions to 26; the penalty coefficient and number of dimensions of the ridge regression are set to 0.01 and 26 respectively; and the highest degree of the independent variable in the multi-polynomial regression is set to 10. All of the experiments are conducted ten times, and the
14
10 20 30 40
−2e+09 0e+00 2e+09
(a)
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
BIC
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
(b)
0 50 150 250 350
Amplitude
0 100 200 300 400 500
200 400 600 800
(c)
Originsl Data
0 5000 10000 15000
−400 0 200 400
(d)
Stochastic Data
Figure 6: (a) The performance of the spectrum functions with different number of trigonometric waves in
the first experiment,(b) The spectrum map of cycle features in the first experiment (c) The former 500
original BEC data in the first experiment, (d) The stochastic BEC data in the first experiment.
predicted results using four cyclic feature models are compared under the criteria of MAE,
MAPE, RMSE and r.
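The four evaluation criteria used throughout the experiments can be written down directly. A minimal sketch, assuming MAPE is reported in percent and r is the Pearson correlation coefficient:

```python
import numpy as np

def mae(y, p):  return float(np.mean(np.abs(y - p)))
def rmse(y, p): return float(np.sqrt(np.mean((y - p) ** 2)))
def mape(y, p): return float(100 * np.mean(np.abs((y - p) / y)))   # percent
def r(y, p):    return float(np.corrcoef(y, p)[0, 1])              # Pearson r

y_true = np.array([100.0, 200.0, 300.0])
y_pred = np.array([110.0, 190.0, 310.0])
# mae -> 10.0, rmse -> 10.0, mape -> about 6.11 %
```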
(b) Configuration of the DEEM
The DEEM is composed of the DBN and ELMs; thus, to achieve accurate forecasting,
it is important to determine the proper numbers of hidden layers and of nodes in each
hidden layer of the DBN and the ELMs. To seek the best structure of the DEEM, the
parameter searching experiment is conducted in two stages. In the first stage, we fix the
numbers of hidden nodes in the ELMs while changing the numbers of hidden layers and of
nodes in each hidden layer of the DBN. In the second stage, the best DBN structure selected
in the first stage is fixed, while the numbers of nodes in the hidden layers of the premier
and integration ELMs are changed. Here, the original data are utilized to determine the best
structure of the DEEM for evaluation of the proposed method.
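The structure being tuned here (premier ELMs modeling the features of each DBN hidden layer, with an integration ELM ensembling their outputs) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: random matrices stand in for the features produced by a trained DBN, and `elm_fit` is a basic single-hidden-layer ELM.

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_fit(X, y, n_hidden):
    """Basic ELM: random hidden weights, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    beta, *_ = np.linalg.lstsq(np.tanh(X @ W + b), y, rcond=None)
    return lambda Z: np.tanh(Z @ W + b) @ beta

# random stand-ins for the features produced by each hidden layer of a
# trained DBN (5 layers here, matching the structure found below)
n_samples, n_layers = 200, 5
layer_feats = [rng.standard_normal((n_samples, 20)) for _ in range(n_layers)]
y = rng.standard_normal(n_samples)      # stand-in stochastic targets

# one premier ELM per DBN layer ...
premier = [elm_fit(F, y, n_hidden=50) for F in layer_feats]
premier_out = np.column_stack([m(F) for m, F in zip(premier, layer_feats)])
# ... and one integration ELM ensembling the premier predictions
integrate = elm_fit(premier_out, y, n_hidden=35)
final_pred = integrate(premier_out)
```

This wiring is why the features of every DBN layer, not only the last one, contribute to the final prediction.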
In the first stage, we vary the number of hidden layers of the DBN from 1 to 7 and the
number of nodes in each hidden layer from 50 to 800 at intervals of 50. The forecasting
performance of each DEEM with a different number of hidden layers and hidden nodes is
evaluated under the criterion of MAE; a lower MAE means better forecasting performance.
Figure 7(a) shows the MAEs of the different DEEMs. It is clear that when the DBN has
5 hidden layers and 750 hidden nodes in each layer, the MAE reaches its minimum over all
of the results.
In the second stage, we fix the structure of the DBN with 5 hidden layers and 750 hidden
nodes in each layer, while the number of hidden nodes in the premier ELMs is varied from 5 to
50 at intervals of 5 and the number of hidden nodes in the integration ELM is varied
from 10 to 100 at intervals of 10. As in the first stage, the performances of all the DEEMs
with different numbers of hidden nodes in the premier and integration ELMs are compared under
the criterion of MAE. Figure 7(b) shows the MAE comparison of these DEEMs. It can be
seen from this figure that the best premier and integration ELMs have 50 and 35 hidden
nodes, respectively.

Figure 7: (a) The MAEs of the DEEMs with different numbers of hidden layers and hidden nodes in the DBN
when the premier and integration ELMs are fixed in the first experiment, (b) the MAEs of the DEEMs
with different numbers of hidden nodes in the premier and integration ELMs when the DBN is fixed in the first
experiment.
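The two-stage search above amounts to a pair of grid searches. In the sketch below, `eval_mae` is a hypothetical stand-in for training a DEEM and returning its validation MAE; its toy error surface is arbitrary and only illustrates the mechanics of the search.

```python
import itertools

def eval_mae(dbn_layers, dbn_nodes, premier_nodes, integration_nodes):
    """Hypothetical stand-in for training a DEEM and returning its MAE;
    the toy minimum is placed arbitrarily to make the sweep deterministic."""
    return (abs(dbn_layers - 5) + abs(dbn_nodes - 750) / 50
            + abs(premier_nodes - 50) / 5 + abs(integration_nodes - 30) / 10)

# stage 1: fix the ELM widths, sweep DBN depth (1-7) and width (50-800)
fixed_premier, fixed_integration = 20, 50
stage1 = min(itertools.product(range(1, 8), range(50, 801, 50)),
             key=lambda s: eval_mae(*s, fixed_premier, fixed_integration))

# stage 2: fix the best DBN, sweep premier (5-50) and integration (10-100)
stage2 = min(itertools.product(range(5, 51, 5), range(10, 101, 10)),
             key=lambda s: eval_mae(*stage1, *s))
```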
(c) Configuration of the other comparative models
To ensure a fair performance comparison, optimal parameter searches for the DBN, the
ELM and the SVR using the original data are also carried out so that the comparative
models achieve their best performance.
Figure 8: (a) The MAEs of the DBNs with different numbers of hidden layers and nodes when the regression
part is fixed in the first experiment, (b) fine details of the forecasting performance in Figure 8(a).
Figure 9: The MAEs of the ELMs in the first experiment.
In this paper, the adopted DBN is composed of several RBMs and one fully connected
layer for the regression output. For the DBN, the numbers of hidden layers and of nodes in
each hidden layer are also key factors affecting the forecasting accuracy. To obtain the best
parameters, the number of nodes in each hidden layer is varied from 50 to 800 at intervals
of 50, the number of hidden layers from 1 to 7, and the number of hidden nodes in the
regression part from 5 to 50 at intervals of 5. MAE is again selected to evaluate the
performance of the DBNs as the number of hidden layers and the numbers of nodes in the
hidden layers and the regression part are changed. The MAE reaches its lowest value when
the DBN has 2 hidden layers, 650 nodes in each hidden layer, and 35 hidden nodes in the
regression part. Figure 8(a) shows the forecasting performance of the DBNs with different
numbers of hidden nodes and layers but a fixed regression part. To trace the important
details of Figure 8(a), its key part is zoomed in Figure 8(b).
The best structure of the ELM for comparison is also explored. The number of hidden
nodes in the ELM is varied from 10 to 500 at intervals of 10. Figure 9 shows the MAEs of
these ELMs. We can observe that the best ELM has 60 hidden nodes.
For the SVR, we again choose the RBF function as its kernel function, and through
testing, we set the penalty and kernel coefficients to 0.5 and 0.6, respectively.
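A minimal sketch of this configuration using scikit-learn's `SVR`; the input features here are illustrative random data standing in for the historical BEC inputs used in the paper.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))            # illustrative lagged inputs
y = X[:, 0] + 0.1 * rng.standard_normal(100)

# RBF kernel with the penalty (C) and kernel (gamma) coefficients above
model = SVR(kernel="rbf", C=0.5, gamma=0.6).fit(X, y)
pred = model.predict(X)
```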
4.2.2. Experimental Results
Table 1 shows the average values and standard deviations of the MAE, RMSE, MAPE,
and r of the forecasting performance using different cyclic feature models when the ELM is
selected as the prediction model.
Table 2 demonstrates the average values and standard deviations of the MAE, RMSE,
MAPE, and r of the forecasting models with and without the cyclic feature obtained by
spectrum analysis. The prediction residual errors of the proposed DEEM+CF and the other
comparative forecasting models are recorded, and their kernel density histograms are shown
in Figure 10.
Table 1: Forecasting performance using different cyclic feature models when ELM is selected to be the
prediction model in the first experiment.
Model MAE RMSE MAPE(%) r
ELM+Lasso  31.627 ± 1.274  44.692 ± 1.184  5.044 ± 0.189  0.971 ± 1.674 × 10⁻³
ELM+Ridge  32.698 ± 0.489  44.235 ± 1.141  5.032 ± 0.175  0.971 ± 8.751 × 10⁻⁴
ELM+Multi-polynomial  30.468 ± 1.230  41.787 ± 0.571  4.920 ± 0.133  0.975 ± 5.253 × 10⁻⁴
ELM+Spectrum  25.450 ± 0.338  39.995 ± 0.389  5.121 ± 0.137  0.977 ± 3.695 × 10⁻⁴
ELM  34.009 ± 1.054  48.165 ± 1.159  5.826 ± 0.223  0.964 ± 1.701 × 10⁻³
Table 2: Performances of the forecasting models in the first experiment ("model+CF" denotes the model
considering the cyclic feature extracted by spectrum analysis).
Model MAE RMSE MAPE(%) r
SVR  36.751 ± 0.000  48.065 ± 0.000  6.479 ± 0.000  0.965 ± 0.000
SVR+CF  26.544 ± 0.000  36.852 ± 0.000  4.869 ± 0.000  0.982 ± 0.000
ELM  34.009 ± 1.054  48.165 ± 1.159  5.826 ± 0.223  0.964 ± 1.701 × 10⁻³
ELM+CF  25.450 ± 0.338  39.995 ± 0.389  5.121 ± 0.137  0.977 ± 3.695 × 10⁻⁴
DBN  32.197 ± 0.593  46.565 ± 0.444  5.504 ± 0.108  0.966 ± 6.311 × 10⁻⁴
DBN+CF  24.690 ± 0.213  34.036 ± 0.241  4.497 ± 0.058  0.983 ± 2.275 × 10⁻⁴
DEEM  30.462 ± 0.450  43.892 ± 0.385  5.159 ± 0.084  0.970 ± 5.116 × 10⁻⁴
DEEM+CF  23.832 ± 0.069  33.259 ± 0.109  4.200 ± 0.046  0.984 ± 1.071 × 10⁻⁴
4.3. The Second Experiment
4.3.1. Configuration of the Forecasting Models
Similar configuration schemes are utilized in this experiment. Details will be given below.
(a) Configuration of the spectrum function in the second experiment
Figure 11(a) shows the performances of the spectrum functions with different numbers
of cyclic waves. According to this figure, the best spectrum function model has 26
trigonometric functions. Figure 11(b) shows the spectrum map of the cyclic feature model. We
can see that there are two dominant cycles, with periods of 2 hours and 6 hours, among the
26 cycles, and the 26 cycles are combined to represent the daily BEC cyclic feature of the
second building. Figure 11(c) shows the first 500 original BEC data from the second building,
and Figure 11(d) presents the remaining stochastic BEC data after removing the cyclic feature.
Besides, ridge regression, lasso regression and multi-polynomial regression are again
selected as the comparative methods for the spectrum function. The configurations of these
three comparative models are the same as in the first experiment.
(b) Configuration of the DEEM
Again, the parameter searching process of the DEEM consists of two stages.
In the first stage, the structures of the premier and integration ELMs are fixed while
the numbers of hidden nodes and layers of the DBN are changed. Figure 12(a) shows
the MAEs of the DEEMs with different numbers of hidden nodes and layers in the
DBN. According to Figure 12(a), the MAE of the DEEM reaches its lowest value when the
DBN has 3 hidden layers and 700 nodes in each hidden layer.
Figure 10: The error histograms of the eight forecasting models in the first experiment.
Figure 11: (a) The performance of the spectrum functions with different numbers of trigonometric waves in
the second experiment, (b) the spectrum map of cyclic features in the second experiment, (c) the first 500
original BEC data from the second building, (d) the remaining stochastic BEC data after removing the
cyclic feature in the second building.
Figure 12: (a) The MAEs of the DEEMs with different numbers of hidden layers and hidden nodes in DBN
when the ELMs are fixed in the second experiment, (b) The MAEs of the DEEMs with different premier
and integration ELMs when the DBN is fixed in the second experiment.
Figure 13: (a) The MAEs of the DBNs in the second experiment, (b) fine details of the forecasting
performance in Figure 13(a).
In the second stage, the DBN in the DEEM is fixed as determined in the first stage, and
we change the numbers of hidden nodes in the premier and integration ELMs. Figure
12(b) illustrates the forecasting performance of the DEEMs with different ELMs. From
this figure, the best premier ELMs have 60 hidden nodes, and the best integration ELM
has 15 hidden nodes.
(c) Configuration of the other comparative models
In the second experiment, to explore the best structure of the DBN, we evaluate
the performance of DBNs whose numbers of hidden layers and of hidden nodes in each hidden
layer are set from 1 to 7 and from 50 to 800 at intervals of 50, respectively. Figure 13(a)
illustrates the MAEs of these DBNs. The best comparative DBN model has two hidden
layers, 300 nodes in each hidden layer, and 30 hidden nodes for regression.
To acquire the best structure of the ELM in the second experiment, we vary the
number of hidden nodes from 10 to 500 at intervals of 10. Figure 14 shows the MAEs of
these ELMs. According to this figure, the best ELM has 270 hidden nodes.
Furthermore, the optimal structure exploration procedure for the SVR is the same as in the
first experiment. For the SVR, we also utilize the RBF kernel, and set the penalty and
kernel coefficients to 0.7 and 0.5, respectively.

Figure 14: The MAEs of the ELMs in the second experiment.

Table 3: Forecasting performance of the ELM using different cyclic feature models in the second experiment.

Model MAE RMSE MAPE(%) r
Lasso  27.380 ± 1.143  42.007 ± 0.887  5.840 ± 0.271  0.965 ± 1.400 × 10⁻³
Ridge  27.313 ± 1.082  41.678 ± 2.143  6.492 ± 0.205  0.973 ± 2.670 × 10⁻³
Multi-polynomial  25.146 ± 1.082  38.917 ± 1.665  5.330 ± 0.342  0.969 ± 2.576 × 10⁻³
Spectrum function  23.343 ± 0.637  34.343 ± 1.502  4.935 ± 0.346  0.977 ± 3.111 × 10⁻³
4.3.2. Experimental Results
In this experiment, the predictions of the ELM using the different cyclic features extracted
by spectrum analysis, lasso regression, ridge regression and multi-polynomial regression were
again each repeated 10 times. Table 3 presents the averages of the MAE, RMSE, MAPE, and
r of the forecasting performance of the ELM using the different cyclic feature models in the
second experiment.
Besides, the proposed DEEM+CF, the DEEM, the DBN+CF, the DBN, the ELM+CF,
the ELM, the SVR+CF and the SVR were also each run 10 times, with the MAE, RMSE,
MAPE, and r again chosen as the comparative indices. Table 4 lists the comparison results
of these models.
The forecasting errors of these eight models are also recorded. The kernel density
histograms of the forecasting errors of the eight forecasting models are displayed in Figure 15.
4.4. Comparison and Discussion
From the figures and tables above, we have the following observations and conclusions.
Table 4: Performances of the forecasting models in the second experiment.
Model MAE RMSE MAPE(%) r
SVR  38.700 ± 0.000  50.790 ± 0.000  8.260 ± 0.000  0.954 ± 0.000
SVR+CF  22.124 ± 0.000  31.620 ± 0.000  5.237 ± 0.000  0.980 ± 0.000
ELM  26.937 ± 2.466  44.522 ± 2.760  5.566 ± 0.541  0.962 ± 4.930 × 10⁻³
ELM+CF  22.449 ± 0.975  32.413 ± 1.356  4.755 ± 0.243  0.978 ± 2.414 × 10⁻³
DBN  25.219 ± 0.855  42.235 ± 0.360  5.241 ± 0.064  0.966 ± 6.069 × 10⁻⁴
DBN+CF  20.252 ± 0.212  29.683 ± 0.170  4.584 ± 0.052  0.982 ± 2.275 × 10⁻⁴
DEEM  23.793 ± 0.141  41.032 ± 0.228  4.875 ± 0.045  0.968 ± 3.702 × 10⁻⁴
DEEM+CF  19.063 ± 0.132  28.685 ± 0.122  4.247 ± 0.041  0.983 ± 1.071 × 10⁻⁴
Figure 15: The error histograms of the eight forecasting models in the second experiment.
From Figures 6(c), 6(d), 11(c) and 11(d), we can see that the original data have a clear
stable and periodic feature, while the remaining data are much more stochastic than the
original data.
From Figures 7(a), 12(a), 8 and 13, it can be clearly seen that, as the number of hidden
layers of the DBN increases, the MAE of the DEEMs shows a downtrend, whereas the
MAE of the pure DBN model increases rapidly once the number of hidden layers exceeds
a threshold. Consequently, we can conclude that fully utilizing the features extracted
from the different layers of the DBN helps to achieve more accurate forecasting
performance.
According to Figures 7(b) and 12(b), the numbers of hidden nodes in the premier and
integration ELMs influence the performance of the DEEM, and the hidden nodes in the
premier ELMs have a greater influence than those in the integration ELM.
Tables 1 and 3 show that prediction using the cyclic feature extracted by spectrum
analysis obtains the best performance. Utilizing the ridge regression, lasso regression,
and multi-polynomial models to extract the cyclic features can also improve the BEC
forecasting performance; in our results, the multi-polynomial model also performs better
than the other two models.
Tables 2 and 4 demonstrate that the accuracy of the forecasting models considering
the cyclic feature extracted by spectrum analysis is much better than that of the models
that do not use the cyclic feature. Under the criterion of MAE, in the first experiment,
the DEEM+CF, DBN+CF, ELM+CF and SVR+CF are respectively 21.765%, 23.316%,
25.167% and 27.773% better than the DEEM, DBN, ELM and SVR, which do not
consider the cyclic feature; in the second experiment, they are respectively 19.880%,
19.695%, 16.661% and 42.832% better. In addition, the DEEM+CF has the best accuracy:
it is 3.475%, 6.538% and 10.217% better than the DBN+CF, ELM+CF and SVR+CF
respectively in the first experiment, and 5.871%, 15.083% and 13.836% better than these
three comparative models in the second experiment. Furthermore, according to the standard
deviations of the evaluation indices, the forecasting performance of the DEEM+CF is
relatively more stable than that of the other comparative models except the SVR.
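The percentage gains quoted above are relative MAE reductions, 100 * (MAE_base - MAE_better) / MAE_base. For instance, with the first-experiment MAEs from Table 2:

```python
def improvement(mae_base, mae_better):
    """Relative MAE reduction, in percent (rounded to three decimals)."""
    return round(100 * (mae_base - mae_better) / mae_base, 3)

# first-experiment MAEs from Table 2
deem_gain = improvement(30.462, 23.832)    # DEEM -> DEEM+CF: 21.765 %
svr_cf_gap = improvement(26.544, 23.832)   # SVR+CF -> DEEM+CF: 10.217 %
```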
From Figures 10 and 15, we can clearly observe that more errors lie around zero in the
histograms of the models that consider the cyclic feature, which also implies that the cyclic
feature can improve the forecasting accuracy. Again, the error histograms of the DEEM+CF
are the narrowest and highest ones, which implies the most accurate forecasting performance
of the proposed model.
Overall, the prediction models that consider the cyclic feature have much higher
forecasting accuracy than the models trained directly on the original data, and the proposed
DEEM+CF performs more stably and accurately than the other models. This confirms that
the cyclic feature greatly promotes the accuracy of BEC forecasting, and that fully utilizing
the abstracted features from all layers of the DBN also helps to improve the forecasting
performance.
5. Conclusion
Short-term forecasting of the BEC is helpful for real-time building energy demand
response, energy planning and building management. In this paper, a novel deep belief
network and extreme learning machine based ensemble method considering the cyclic feature
is proposed to promote the accuracy of half-hourly BEC forecasting. In the proposed
ensemble model, the stable component, i.e., the cyclic feature of the BEC, is extracted via
spectrum analysis, while the remaining stochastic component, obtained by removing the
stable component from the original BEC data, is used to construct the DEEM. Two
experiments are performed to verify the effectiveness and superiority of the proposed
DEEM+CF model. As demonstrated by the experimental results and comparisons, the
cyclic feature improves the prediction performance by approximately 20% relative to models
that do not use it; moreover, the proposed DEEM+CF model achieves much higher accuracy
than the other comparative models, being at least 3%, 6% and 10% better than the DBN+CF,
ELM+CF and SVR+CF respectively under the criterion of MAE in our experiments.
In this study, the parameters of the DEEM are optimized by alternately fixing and
changing the structures of the ELMs and the DBN. However, this parameter optimization
method is not optimal, and it still needs further exploration. Besides, the cyclic features
are closely related to occupancy, which is one of the key factors in BEC forecasting;
studying and applying the relationships between them is valuable and will be one of our
key research directions in the future.
Acknowledgments
This study is partly supported by the National Natural Science Foundation of China
(61573225), the Taishan Scholar Project of Shandong Province (TSQN201812092), the Key
Research and Development Program of Shandong Province (2019GGX101072), the State
Scholarship Fund and the Youth Innovation Technology Project of Higher School in Shan-
dong Province (2019KJN005).
... The area of feature engineering covers a wide number of methods, such as feature creation, feature expansion [ 5] or feature selection [ 3]. Feature creation includes encodings of time-based features, such as cyclic features [ 19], or categorical encoding [ 11]. Similarly, feature expansion is the method of creating new features based on existing features. ...
... -Cyclic Features: Cyclic features can be used to model time values through periodic functions [ 19]. In the implementation, sinusoidal signals.x ...
... Such factors include thermal characteristics and Heating, Ventilation, Air Conditioning and Cooling (HVAC) system behavior [ 13]. Additionally, building energy demand may be dependent on occupancy [ 8] or subject to seasonal trends [ 19]. Many of these factors show non-linear or dynamic behavior, which makes it difficult to address them through a purely linear model. ...
Chapter
Full-text available
Data-driven modeling is an approach in energy systems modeling that has been gaining popularity. In data-driven modeling, machine learning methods such as linear regression, neural networks or decision-tree based methods are applied. While these methods do not require domain knowledge, they are sensitive to data quality. Therefore, improving data quality in a dataset is beneficial for creating machine learning-based models. The improvement of data quality can be implemented through preprocessing methods. A selected type of preprocessing is feature engineering, which focuses on evaluating and improving the quality of certain features inside the dataset. Feature engineering includes methods such as feature creation, feature expansion, or feature selection. In this work, a Python framework containing different feature engineering methods is presented. This framework contains different methods for feature creation, expansion and selection; in addition, methods for transforming or filtering data are implemented. The implementation of the framework is based on the Python library scikit-learn . The framework is demonstrated on a use case from energy demand prediction. A data-driven model is created including selected feature engineering methods. The results show an improvement in prediction accuracy through the engineered features.
... Many studies have employed deep learning to predict the energy consumption of buildings. As shown in Table 1, the architectures used include RNN and its variants [2,17–19,47,48], convolution-based [6,48], attention-based [20,21], deep belief network (DBN) [43,45], and hybrid models [9,42,44] that combine multiple architectures. Both one-step ahead [3,17,18,42–45] and multi-horizon forecasting [2,6,9,19–21,43,48] were investigated. However, most studies only performed point forecasts. ...
Article
Full-text available
Building energy forecasting facilitates optimizing daily operation scheduling and long-term energy planning. Many studies have demonstrated the potential of data-driven approaches in producing point forecasts of energy use. Despite this, little work has been undertaken to understand uncertainty in energy forecasts. However, many decision-making scenarios require information from a full conditional distribution of forecasts. In addition, recent advances in deep learning have not been fully exploited for building energy forecasting. Motivated by these research gaps, this study contributes in two aspects. First, this study has adapted and applied state-of-the-art deep learning architectures to address the problem of multi-horizon building energy forecasting. Eight different methods, including seven deep learning-based ones, were investigated to develop models to perform both point and probabilistic forecasts. Second, a comprehensive case study was conducted in two public historic buildings with different operating modes, namely the City Museum and the City Theatre, in Norrköping, Sweden. The performance of the developed models was evaluated, and the predictability of different scenarios of energy consumption was studied. The results show that incorporating future information on exogenous factors that determine energy use is critical for making accurate multi-horizon predictions. Furthermore, changes in the operating mode and activities held in a building bring more uncertainty in energy use and deteriorate the prediction accuracy of models. The temporal fusion transformer (TFT) model exhibited strong competitiveness in performing both point and probabilistic forecasts. As assessed by the coefficient of variance of the root mean square error (CV-RMSE), the TFT model outperformed other models in making point forecasts of both types of energy use of the City Museum (CV-RMSE 29.7% for electricity consumption and CV-RMSE 8.7% for heating load). 
When making probabilistic predictions, the TFT model performed best to capture the central tendency and upper distribution of heating load of the City Museum as well as both types of energy use of the City Theatre. The predictive models developed in this study can be integrated into digital twin models of buildings to discover areas where energy use can be reduced, optimize building operations, and improve overall sustainability and efficiency.
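Probabilistic forecasters such as the TFT mentioned above are typically trained and scored with the pinball (quantile) loss. As a small self-contained illustration (the numbers below are toy values, not data from the study):

```python
# Hedged sketch of the pinball (quantile) loss used to evaluate probabilistic
# forecasts: under-prediction is penalized by q, over-prediction by (1 - q).
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Average quantile loss at quantile level q (0 < q < 1)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

y_true = np.array([10.0, 12.0, 9.0, 11.0])
median_forecast = np.array([10.5, 11.5, 9.5, 10.5])  # 0.5-quantile forecast
upper_forecast = np.array([12.0, 13.5, 11.0, 12.5])  # 0.9-quantile forecast

print(pinball_loss(y_true, median_forecast, 0.5))  # symmetric at the median
print(pinball_loss(y_true, upper_forecast, 0.9))   # rewards a deliberately high band
```

Summing this loss over several quantile levels (e.g. 0.1, 0.5, 0.9) gives a full conditional-distribution forecast of the kind the study evaluates.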
... According to [31], irrespective of the method employed to improve building energy efficiency, whether applying an energy policy regulation, developing a building energy prediction model (BEPM), or creating a green certification program, an essential prerequisite is an understanding of the factors influencing energy consumed in the building stock. The BEPM is regarded as the most promising solution for improving energy efficiency [32–34]. Researchers have developed various BEPMs using several factors, such as the building envelope (i.e., wall and roof), among others [7,32,35]. ...
Article
Full-text available
A prerequisite for decreasing the intensification of energy use in buildings is to evaluate and understand the factors influencing building energy performance (BEP). These factors include building envelope features and outdoor climatic conditions, among others. Given the importance of these influencing factors in the development of building energy prediction models, researchers continue to employ different types of factors based on their popularity in the academic literature, without a proper investigation of the most relevant ones, which in some cases potentially leads to poor model performance. This can be attributed to the absence of an adequate comprehensive analysis or review of all factors influencing BEP. Therefore, this paper conducts a holistic and comprehensive review of studies that have explored the various factors influencing energy use in residential and commercial buildings. In total, 74 research articles were systematically selected from the Scopus, ScienceDirect, and Institute of Electrical and Electronics Engineers (IEEE) databases. Subsequently, by means of a systematic and bibliometric analysis, this paper comprehensively analyzed several important factors influencing BEP. The results reveal the important factors (such as windows and roofs) and shed light on the application of energy-efficient strategies such as green roofs and photovoltaic (PV) windows, among others.
... The core idea behind time series forecasting is to identify patterns and trends within the data and then use these patterns to make predictions about future values [61]. Ensemble learning methods can be highly effective for time series forecasting tasks as they combine multiple models' predictions to improve overall accuracy and robustness [62]. ...
Article
Full-text available
The ability to predict and preempt insulator failures holds the potential to enhance the reliability of electrical power grids. An increase in insulator leakage current is an indication that failures may occur. By harnessing historical data and employing time series forecasting models, it is possible to identify potential faults before they escalate into disruptive failures. In this paper, a hybrid model for time series prediction is proposed by combining the Christiano–Fitzgerald random walk filter for signal denoising with an ensemble bootstrap aggregation model for leakage current forecasting. A comparison between bootstrap aggregation, boosting, random subspace, and stacked generalization ensemble learning models is presented. With a root mean square error of 7.62 × 10⁻⁴ (in a statistical evaluation), the ensemble bootstrap aggregation model with the Christiano–Fitzgerald random walk filter proved to be a promising approach for time series fault forecasting. The proposed method was shown to be more promising than the original ensemble bootstrap aggregation model and the long short-term memory network.
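Bootstrap aggregation for one-step time series forecasting can be sketched in a few lines: lagged values become features, and predictions are averaged over trees fit on bootstrap resamples. The series below is synthetic and the lag count is an illustrative assumption, not the study's configuration.

```python
# Hedged sketch of bagging for one-step-ahead forecasting on a synthetic
# noisy periodic series: features are the previous `lags` observations.
import numpy as np
from sklearn.ensemble import BaggingRegressor

rng = np.random.default_rng(1)
t = np.arange(500)
series = np.sin(2 * np.pi * t / 48) + 0.1 * rng.normal(size=t.size)

lags = 4
X = np.array([series[i:i + lags] for i in range(series.size - lags)])
y = series[lags:]

# Bag 50 decision trees (the default base estimator) on bootstrap resamples;
# averaging the trees reduces the variance of any single overfit tree.
bag = BaggingRegressor(n_estimators=50, random_state=0)
bag.fit(X[:400], y[:400])
print(round(bag.score(X[400:], y[400:]), 3))  # held-out one-step R^2
```

Note the split is chronological (first 400 points for training), which respects the temporal ordering a random shuffle would violate.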
Chapter
Full-text available
Pultrusion of continuous fiber reinforced profiles has been state of the art for several decades. However, pultrusion in the production environment has so far had little or no sensor data in a heterogeneous sensor environment; suffers from shortcomings in data quality (time synchronization, different formats, different sampling rates, sporadic disconnections and resulting data losses); and relies on insufficient data processing methods for process control and optimization. Significant efficiency improvements would therefore still be possible in this respect. The question thus arises as to how a data acquisition system can be designed that provides heterogeneous data in a standardized and reliable manner for data-based process development and optimization. The aim is therefore to develop a digitized and standardized data acquisition prototype for the continuous production of sustainable profile structures, flexible enough that the various components of the production line can be adaptively combined in a central data acquisition system. We have therefore screened and selected possible standardized transmission formats in combination with suitable low-cost data acquisition systems for a highly reliable and secure application area. The paper presents a well-founded concept for a data acquisition prototype based on different low-cost control systems to demonstrate the applicability and testability of the developed requirements regarding data acquisition, processing, and future storage.
Article
The significance of energy efficiency in the development of smart cities cannot be overstated. It is essential to have a clear understanding of the current energy consumption (EC) patterns in both public and private buildings. One way to achieve this is by employing machine learning classification algorithms, which offer a broader perspective on the factors influencing EC. These algorithms can be applied to real data from databases, making them valuable tools for smart city applications. In this paper, our focus is specifically on the EC of public schools in a Portuguese city, as this plays a crucial role in designing a Smart City. By utilizing a comprehensive dataset on school EC, we thoroughly evaluate multiple ML algorithms. The objective is to identify the most effective algorithm for classifying average EC patterns. The outcomes of this study hold significant value for school administrators and facility managers. By leveraging the predictions generated from the selected algorithm, they can optimize energy usage and, consequently, reduce costs. The use of a comprehensive dataset ensures the reliability and accuracy of our evaluations of various ML algorithms for EC classification.
Article
Full-text available
Urbanization increases electricity demand due to population growth and economic activity. To meet consumers' demands at all times, it is necessary to predict future building energy consumption. Power engineers could exploit the enormous amount of energy-related data from smart meters to plan power sector expansion. Researchers have conducted many experiments to address the supply and demand imbalance by accurately predicting energy consumption. This paper presents a comprehensive literature review of forecasting methodologies used by researchers for energy consumption in smart buildings to meet future energy requirements. Different forecasting methods are being explored in both residential and non-residential buildings. The literature is further analyzed based on the dataset, type of load, prediction accuracy, and the evaluation metrics used. This work also focuses on the main challenges in energy forecasting due to load fluctuation, variability in weather, occupant behavior, and grid planning. The identified research gaps and a suitable prediction methodology addressing the current issues are presented with reference to the available literature. The multivariate analysis in the suggested hybrid model ensures the learning of repeating patterns and features in the data to enhance prediction accuracy.
Article
Full-text available
With the recent rapid increase in the use of roof top photovoltaic solar systems worldwide, and also, more recently, the dramatic escalation in building grid connected solar farms, especially in Australia, the need for more accurate methods of very short-term forecasting has become a focus of research. The International Energy Agency Tasks 46 and 16 have brought together groups of experts to further this research. In Australia, the Australian Renewable Energy Agency is funding consortia to improve the five minute forecasting of solar farm output, as this is the time scale of the electricity market. The first step in forecasting of either solar radiation or output from solar farms requires the representation of the inherent seasonality. One can characterise the seasonality in climate variables by using either a multiplicative or additive modelling approach. The multiplicative approach with respect to solar radiation can be done by calculating the clearness index, or alternatively estimating the clear sky index. The clearness index is defined as the division of the global solar radiation by the extraterrestrial radiation, a quantity determined only via astronomical formulae. To form the clear sky index one divides the global radiation by a clear sky model. For additive de-seasoning, one subtracts some form of a mean function from the solar radiation. That function could be simply the long term average at the time steps involved, or more formally the addition of terms involving a basis of the function space. An appropriate way to perform this operation is by using a Fourier series set of basis functions. This article will show that for various reasons the additive approach is superior. Also, the differences between the representation for solar energy versus solar farm output will be demonstrated. Finally, there is a short description of the subsequent steps in short-term forecasting.
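The additive de-seasoning the article favors, using a Fourier series basis, amounts to a small least-squares fit. This sketch uses a synthetic hourly "radiation" series; the period, number of harmonics, and noise level are illustrative assumptions.

```python
# Hedged sketch of additive de-seasoning: fit a few sine/cosine harmonics of
# the daily cycle by least squares and subtract them, leaving the stochastic
# residual that short-term forecasting models would then target.
import numpy as np

rng = np.random.default_rng(2)
hours = np.arange(24 * 30)                               # 30 days, hourly
seasonal = 400 + 300 * np.sin(2 * np.pi * hours / 24 - np.pi / 2)
series = seasonal + 20 * rng.normal(size=hours.size)     # noise sd = 20

period, n_harmonics = 24, 3
basis = [np.ones_like(hours, dtype=float)]               # mean term
for k in range(1, n_harmonics + 1):
    basis.append(np.sin(2 * np.pi * k * hours / period))
    basis.append(np.cos(2 * np.pi * k * hours / period))
B = np.column_stack(basis)

coef, *_ = np.linalg.lstsq(B, series, rcond=None)        # Fourier coefficients
fitted_seasonal = B @ coef
residual = series - fitted_seasonal                      # de-seasoned series
print(round(float(np.std(residual)), 1))                 # near the noise level
```

The multiplicative alternative (a clearness or clear sky index) would instead divide by a reference, which fails near zero radiation; this is one reason the article argues the additive form is superior.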
Article
Full-text available
Most frequently used models for modeling and forecasting periodic climatic time series do not have the capability of handling the periodic variability that characterizes them. In this paper, the Fourier Autoregressive (FAR) model, with its ability to analyze periodic variability, is implemented. From the results, FAR(1), FAR(2) and FAR(2) models were chosen based on the Periodic Autocorrelation Function (PeACF) and Periodic Partial Autocorrelation Function (PePACF). The coefficients of the tentative models were estimated using a Discrete Fourier transform estimation method. The FAR(1) model was chosen as the optimal model based on the smallest values of the Periodic Akaike (PAIC) and Periodic Bayesian Information Criteria (PBIC). The residuals of the fitted models were diagnosed to be white noise. The in-sample forecast closely reflected the original rainfall series, while the out-of-sample forecast exhibited a continuous periodic forecast from January 2019 to December 2020 with relatively small values of the Periodic Root Mean Square Error (PRMSE), Periodic Mean Absolute Error (PMAE) and Periodic Mean Absolute Percentage Error (PMAPE). The comparison of the FAR(1) model forecast with the AR(3), ARMA(2,1), ARIMA(2,1,1) and SARIMA()() model forecasts indicated that FAR(1) outperformed the other models, as it exhibited a continuous periodic forecast. The continuous monthly periodic rainfall forecast indicated that there will be rapid climate change in Nigeria in the coming years, and the Nigerian Government needs to put plans in place to curtail its effects.
Article
It is clear that the learning speed of feedforward neural networks is in general far slower than required, and this has been a major bottleneck in their applications for past decades. Two key reasons behind this may be: (1) slow gradient-based learning algorithms are extensively used to train neural networks, and (2) all the parameters of the networks are tuned iteratively by such learning algorithms. Unlike these conventional implementations, this paper proposes a new learning algorithm called extreme learning machine (ELM) for single-hidden layer feedforward neural networks (SLFNs), which randomly chooses hidden nodes and analytically determines the output weights of SLFNs. In theory, this algorithm tends to provide good generalization performance at extremely fast learning speed. Experimental results based on a few artificial and real benchmark function approximation and classification problems, including very large complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn thousands of times faster than conventional popular learning algorithms for feedforward neural networks.
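The ELM training rule described above is compact enough to sketch directly: hidden-layer weights are drawn at random and never tuned, and only the output weights are solved analytically via the Moore-Penrose pseudoinverse. The toy regression data and hidden-layer size below are illustrative assumptions.

```python
# Hedged sketch of ELM for regression: random hidden layer, analytic solve.
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1]    # smooth target function

n_hidden = 50
W = rng.normal(size=(X.shape[1], n_hidden))    # random input weights (never tuned)
b = rng.normal(size=n_hidden)                  # random biases (never tuned)
H = np.tanh(X @ W + b)                         # hidden-layer output matrix

beta = np.linalg.pinv(H) @ y                   # output weights in closed form
y_hat = H @ beta

print(round(float(np.mean((y - y_hat) ** 2)), 4))  # training MSE
```

There is no iteration at all: the single pseudoinverse replaces the gradient descent loop, which is the source of the speedup the abstract reports.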
Article
In this study, six different machine learning models, gradient boosting tree (GBT), multilayer perceptron neural network (MLPNN), two types of adaptive neuro-fuzzy inference systems (ANFIS) based on fuzzy c-means clustering (ANFIS-FCM) and subtractive clustering (ANFIS-SC), multivariate adaptive regression spline (MARS), and classification and regression tree (CART), were used for forecasting solar radiation at stations in two different countries, Turkey and the USA. Wind speed, maximum air temperature, minimum air temperature and relative humidity were used as inputs to the developed models. For an accurate evaluation of model performance, four statistical indicators, root mean squared error (RMSE), coefficient of correlation (R), mean absolute error (MAE) and Nash–Sutcliffe efficiency coefficient (NS), were employed. Comparison of the results showed that the GBT model performed better than the MLPNN, ANFIS, MARS, and CART models in modeling solar radiation. Using the GBT model, the average RMSE of the MLPNN, ANFIS-FCM, ANFIS-SC, MARS and CART models was decreased by 0.26%, 1.5%, 0.51%, 2.5%, and 19.34% at Fairfield Station; 4%, 1.37%, 0.24%, 4.12%, and 24.4% at Monmouth Station; 11.99%, 48.7%, 41.6%, 8.23%, and 33.41% at Antalya Station; and 11%, 54.8%, 51.9%, 19.65%, and 37.1% at Mersin Station, respectively. The overall results indicated that the GBT model could be successfully applied in forecasting solar radiation using climatic parameters as inputs.
Article
The early and accurate detection of rolling element bearing faults, closely linked to timely maintenance and repair before a sudden breakdown, is still one of the key challenges in the area of condition monitoring. Nowadays, advanced signal processing techniques are combined with high-level machine learning approaches, moving towards automatic fault diagnosis. A plethora of Health Indicators (HIs) have been proposed to feed into machine learning models in order to track system degradation. Cyclic Spectral Analysis (CSA), including Cyclic Spectral Correlation (CSC) and Cyclic Spectral Coherence (CSCoh), has been proved to provide powerful tools for rotating machinery signal processing. Due to the periodic mechanism of bearing fault impacts, HIs extracted from the Cyclostationary (CS) domain can expose bearing defects even at a premature stage. On the other hand, the labelled training and testing datasets required by supervised machine learning approaches cannot realistically be obtained under industrial conditions. In order to overcome this limitation, a novel semisupervised Support Vector Data Description (SVDD) with negative samples (NSVDD) fault detection approach is proposed in this paper. The NSVDD model utilizes CS indicators to build the feature space, and fits a hyper-sphere to calculate the Euclidean distances in order to isolate the healthy and faulty data. A uniform object generation method is adopted to generate artificial outliers as negative samples for the NSVDD. A systematic fault detection decision strategy is proposed to estimate the bearing status simultaneously with the detection of fault initiation. Furthermore, a multi-level anomaly detection framework is built based on data at i) single sensor level, ii) machine level and iii) entire machine fleet level. Three run-to-failure bearing datasets including signals from twelve bearings are used to implement the proposed fault detection methodology.
Results show that the CS-based indicators outperform time-domain and Fast Kurtogram (FK) based Squared Envelope Spectrum (SES) indicators. Moreover, the proposed NSVDD model shows superior anomaly detection characteristics compared to the Back-Propagation Neural Network, random forest and K-Nearest Neighbor.
Article
Electricity demand/load forecasting always plays a vital role in the management and operation of power systems, since it can help develop an optimal action program for power producers, end-consumers and government entities. Inaccurate prediction may cause additional production or a waste of resources due to high operational costs. This paper investigated the benefit of combining data features to produce short-term electricity demand forecasts. Electricity demand usually exhibits complex characteristics and an obvious seasonal tendency. In this paper, adaptive Fourier decomposition is first used to extract the fluctuation characteristics. The resulting sub-series then satisfy the conditions of linearity and stationarity and are processed to measure and eliminate the seasonal pattern. In the process of seasonal adjustment, the average periodicity length is identified quantitatively. In addition, to achieve good generalization performance on real electricity demand data, the sine cosine optimization algorithm is applied to select the penalty and kernel parameters of the support vector machine. The empirical study showed that the superior performance of the proposed hybrid method profits from the effect of data pretreatment, and the findings prove that this hybrid modeling scheme can yield promising prediction results within acceptable computational complexity.
Article
Many technical solutions have been developed to reduce buildings' energy consumption, but limited efforts have been made to adequately address the role or action of building occupants in this process. Our earlier investigations have shown that occupants play a significant role in buildings' energy consumption: it was shown that savings of up to 20% could be achieved by modifying occupant behavior through direct feedback and recommendations. Studying the role of occupants in building energy consumption requires an understanding of the interrelationships between climatic conditions; building characteristics; and building services and operation. This paper describes the development of a systematic procedure to provide building occupants with direct feedback and recommendations to help them take appropriate action to reduce building energy consumption. The procedure is geared toward developing a Reference Building (RB) (an energy-efficient building) for a specific given building. The RB is then compared against its given building to inform the occupants of the given building how they are using end-use loads and how they can improve them. The RB is generated using a data-mining approach, which involves clustering analysis and neural networks. The framework is based on clustering similar buildings by effects unrelated to occupant behavior. The buildings are then grouped based on their energy consumption, and those with lower consumption are combined to generate the RB. Performance evaluation is determined by comparison of a given building with an RB. This comparison provides feedback that can lead occupants to take appropriate measures (e.g., turning off unnecessary lights or heating, ventilation, and air conditioning (HVAC), etc.) to improve building energy performance. More accurate, scalable, and realistic results are achievable through the current methodology, as shown through comparison with the existing literature.
Article
The widespread application of advanced renewable systems with optimal design can promote cleaner production, reduce carbon dioxide emissions and realise renewable and sustainable development. In this study, a phase change material integrated hybrid system was demonstrated, involving advanced energy conversions and multi-diversified energy forms, including solar-to-electricity conversion, active water-based and air-based cooling, and distributed storage. A generic optimization methodology was developed by integrating supervised machine learning and heuristic optimization algorithms. Multivariable optimizations were systematically conducted for widespread application purposes in five climatic regions in China. Results showed that the energy performance is highly dependent on mass flow rate and inlet cooling water temperature, with contribution ratios of around 90% and 7%. Furthermore, compared to the Taguchi standard orthogonal array, the machine-learning based optimization can improve the annual equivalent overall output energy from 86934.36 to 90597.32 kWh (by 4.2%) in ShangHai, from 86335.35 to 92719.07 kWh (by 7.4%) in KunMing, from 87445.1 to 91218.3 kWh (by 4.3%) in GuangZhou, from 87278.24 to 88212.83 kWh (by 1.1%) in HongKong, and from 87611.95 to 92376.46 kWh (by 5.4%) in HaiKou. This study presents the optimal design and operation of a renewable system in different climatic regions, which is important to realise renewable and sustainable buildings.
Article
With the rapid development of artificial intelligence, data-driven prediction models play an important role in energy prediction, fault detection, and diagnosis. This paper proposes an ensemble approach using random forest (RF) for hourly performance predictions of a GSHP system. Two years of in situ data were collected in an educational building situated in a severe cold area of China. Prediction models were established for performance indicators, and results indicate that the average errors for COPs, COPu, EERs and EERu were all controlled within 5%. A model established from a small amount of data can accurately predict long-term performance, thereby reducing the time and difficulty of data collection. RF models trained with different parameter settings were compared; results indicate that model accuracy was not very sensitive to the number of variables. The impact of input variables on prediction performance was analyzed, and the importance ranking changed with the period and performance indicator. By comparing the variable importance lists, it was possible to establish which parameters were abnormal, and lists from different periods can reflect whether the energy structure of the building has changed. The overall superiority of RF was verified by comparing it with a back propagation neural network (BPNN) in terms of robustness, interpretability, and efficiency. First, since the GSHP system involves multiple indicators, robustness, measured by average accuracy, was used to evaluate the accuracy level. According to CV-RMSE, the robustness of RF is approximately 3.3% higher than that of BPNN. Second, RF is highly interpretable, but BPNN is a typical black-box model. Finally, the modeling complexity and training time of BPNN were much greater than those of RF.
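A random-forest performance model with variable importance ranking of the kind described above can be sketched with scikit-learn. The features and data below are synthetic stand-ins (not the GSHP measurements), and the coefficients of the toy COP formula are purely illustrative assumptions.

```python
# Hedged sketch: RF regression of a synthetic COP indicator, with held-out
# accuracy and the feature importance ranking the study uses for diagnosis.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
n = 1000
outdoor_temp = rng.uniform(-20, 10, n)   # hypothetical input variables
inlet_temp = rng.uniform(5, 15, n)
flow_rate = rng.uniform(0.5, 2.0, n)

# Synthetic COP: driven mainly by outdoor temperature, weakly by flow rate.
cop = (4.0 + 0.05 * outdoor_temp - 0.08 * inlet_temp
       + 0.1 * flow_rate + 0.05 * rng.normal(size=n))

X = np.column_stack([outdoor_temp, inlet_temp, flow_rate])
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X[:800], cop[:800])

print(round(rf.score(X[800:], cop[800:]), 3))  # held-out R^2
print(rf.feature_importances_.round(2))        # importance ranking
```

Tracking how `feature_importances_` shifts between training periods is the mechanism the abstract describes for spotting abnormal parameters or a changed energy structure.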
Article
Many cities are pursuing low-carbon practices in order to reduce carbon emissions. In line with this, various low-carbon city (LCC) indicator systems have been established across the world. However, there are only a few studies available investigating whether the established LCC indicators have been effectively utilized in practice. Through a comprehensive literature review, this study composed a list of LCC indicators (LCCIL), which were classified into eight dimensions, namely, economy, energy use, social aspect, carbon and environment, urban mobility, solid waste, water, and land use. The quotation frequency of LCCIL indicators in 10 LCC indicator systems addressed in academia was reviewed. The application frequency of LCCIL indicators in 21 global cities was then examined. A comparative study was then conducted between academia and practice across these eight dimensions of the LCCIL. The results reveal that (1) LCCIL indicators have not been effectively utilized in practice; (2) none of the LCCIL indicators related to the social aspect has been used in practice; (3) the indicator "total carbon emission" has been extensively applied in practice, but it has not been used in academia; and (4) the most popular LCCIL dimension in academia has been energy use, while urban mobility has been the most popular in practice. The findings suggest that the applicability of LCC indicators must be considered when establishing an LCC indicator system. The findings provide an important reference for further studies in establishing effective LCC indicators to guide the development of low-carbon cities.