2169-3536 (c) 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See
http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2018.2887023, IEEE Access
Evolutionary Deep Learning Based
Energy Consumption Prediction for
Buildings
ABDULAZIZ ALMALAQ, (Member, IEEE), JUN JASON ZHANG (Senior Member, IEEE)
The Department of Electrical and Computer Engineering, University of Denver, Denver, CO, 80237 USA
Corresponding author: Abdulaziz Almalaq (e-mail: abdulaziz.almalaq@du.edu).
ABSTRACT Today’s energy resources are closer to consumers thanks to sustainable energy and advanced technology. Precise prediction of energy consumption at the building level is therefore vital for managing consumed energy efficiently with a robust predictive model. Growing concern about reducing the energy consumption of buildings makes it necessary to predict future consumption precisely using an optimizable predictive model. Most previously proposed methods for energy consumption prediction are conventional methods whose design normally relies on the developer’s knowledge of the hyper-parameters. However, the time lag inputs and the network’s hyper-parameters need to be tuned to obtain a more accurate prediction. This article proposes a novel hybrid prediction approach based on an evolutionary deep learning method that combines a genetic algorithm with Long Short-Term Memory and optimizes its objective function over the time window lags and the network’s hidden neurons. The performance of the proposed model is investigated on public datasets of residential and commercial buildings for very short-term prediction, and the results indicate that evolutionary deep learning models perform better than conventional and regular prediction models.
INDEX TERMS Energy consumption, evolutionary computation, genetic algorithms, machine learning,
predictive models, recurrent neural networks
I. INTRODUCTION
THE microgrid is a recent power scenario that places power generation closer to consumers using renewable resources, e.g., rooftop PV panels on buildings and local energy storage. Utilizing renewable energy at the consumer level, such as in buildings, makes consumption cheaper and cleaner; however, buildings will still draw some energy from the local grid, and this consumption needs to be predicted and adjusted efficiently to reduce cost and environmental impact. Nowadays, energy consumption in buildings accounts for a large proportion of primary energy use worldwide and plays a vital role in carbon emissions. Therefore, precise prediction of energy consumption at the building level has become a crucial topic, and it is necessary to develop a reliable optimization-based predictive model to reduce energy costs and improve the environmental performance of buildings.
Generally, it is challenging to predict a building’s energy consumption precisely because many influential factors are correlated with energy usage, such as weather conditions, geographical location, building structure, and occupancy. Energy consumption prediction has been investigated widely during the last two decades. Two major families of techniques have been applied to buildings: physical methods, as in [1] [2] [3], and statistical methods, e.g., Auto-Regressive Integrated Moving Average (ARIMA) in [4] [5] [6]. Artificial intelligence and machine learning (ML) methods have also been applied to energy prediction in buildings, such as Artificial Neural Network (ANN) in [1] [7] [8] [9] [10] [11], Support Vector Machine (SVM) in [12] [13] [14] [15], Decision Tree in [16] [17] [18] and k-nearest neighbor (kNN) in [19] [20]. The ANN and its variants have been the most widely applied methods for energy consumption prediction in buildings, using techniques such as input variable selection, network hyper-parameter tuning and training algorithm improvement. The ANN approach based on input variable selection, as in [21] and [22], is used to analyze and select all potentially relevant input variables.
Recently, deep learning (DL) approaches, which are advanced ML methods that add multiple hidden layers to the standard neural network, have received wide attention across a range of disciplines, e.g., image recognition [23], natural language processing [24], and time series prediction [25]. DL methods have improved prediction and classification accuracy in various problems such as stock market forecasting [26] [27], solar irradiance forecasting [28] [29] and wind speed prediction [30] [31]. Moreover, DL approaches have been used for energy consumption prediction with the Convolutional Neural Network (CNN), Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM). In [32], the CNN is used for hourly energy load prediction in the smart grid using bagging forecasting models. The authors compared their proposed model with several conventional methods, and the results showed the effectiveness of the CNN. The CNN is also applied to an individual residential building in [33], where it outperformed the other methods compared in that paper. Another method applied to household energy consumption prediction is the RNN in [34]. The authors proposed a pooling-based deep RNN that batches a group of load profiles into a pool of inputs, and their method outperformed the compared methods, including ARIMA and Support Vector Regression. In [35], an overview of different types of RNN, including the LSTM, applied to time series prediction is presented; the authors compared the various RNN architectures and their performance in short-term prediction. In [36], the authors applied the LSTM to short-term prediction of a residential load, and the results showed that the LSTM outperformed traditional methods. Another LSTM technique used for residential load prediction is the LSTM-based sequence-to-sequence model in [37]. The authors reported that the results of the LSTM and the LSTM-based sequence-to-sequence model are comparable with other DL methods used for energy consumption prediction in the literature. An extensive review of DL methods applied to the energy prediction problem can be found in [38].
In the last decade, many intelligent evolutionary computation methods based on optimization have been applied to the problem of energy consumption in buildings, e.g., Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and Evolution Strategies (ES). These methods are nature-inspired metaheuristic optimization techniques. For forecasting chaotic time series, the PSO method improved the results of the ANN predictive model in [39] [40] [41]. For energy consumption prediction, PSO-ANN and GA-ANN hybrid prediction methods were applied with principal component analysis to select relevant input energy variables in [40]. The hybrid approaches performed better than the regular ANN, with the two hybrids reaching the same accuracy level. In addition, the GA was employed to improve Adaptive Network-based Fuzzy Inference Systems using two building datasets, the Great building Energy Predictor Shootout and a library building, in [42]. That population-based optimization study found that hybrid predictive models performed better than regular ones. For time series problems, the ES was used to improve the training of ANN models and to converge faster to an optimal solution [43].
Many hyper-parameters of the DL network, such as the number of hidden layers, the number of hidden neurons, and the activation function, are influential factors in the energy prediction model. If the hyper-parameters of the predictive DL model are poorly chosen, the model performs poorly and yields only locally optimal results. In addition, the predictive window size, i.e., the time lags of the input variables, plays another important role in finding the optimum prediction value. Selecting the right hyper-parameters and a suitable window size is an optimization process that improves the accuracy of the prediction model. In [44], a literature review shows that evolutionary computation concepts are used to improve the prediction of ML algorithms, such as ANN and fuzzy logic. Thus, there is a need to apply these concepts to DL algorithms such as the LSTM, which has shown better prediction performance in the literature.
The modeling technique presented in this paper is based on an evolutionary DL method that uses the GA optimization method to improve the prediction accuracy of the LSTM for energy consumption in buildings. The proposed approach is compared with conventional predictive models from the literature, e.g., ARIMA, Decision Tree, kNN, multilayer perceptron (MLP), which is a type of ANN that can be extended to a deep neural network, and LSTM with different deep architectures. The optimization searches for a suitable window size and the right number of hidden neurons. The GA-LSTM model is trained and tested on two building datasets, one residential and one commercial, for very short-term prediction.
The motivation of this work is to develop an optimization-based predictive DL model using the GA, and the research objective is to find a global or near-global optimum of the prediction error in building energy consumption prediction by searching a population of LSTM hyper-parameters and window sizes. This work contributes to precise energy prediction at the building level by using the GA-LSTM model to optimize the objective function.
The paper is organized as follows. The problem formulation is presented in Section II. Section III describes the LSTM network and the GA optimization method. Section IV reformulates the optimization problem for the case study to find the optimal predictive GA-LSTM model. Section V evaluates the prediction results and compares them with regular DL predictive models and conventional models. Finally, conclusions and future work are presented in Section VI.
II. PROBLEM FORMULATION
The energy consumption of a building is a time series, i.e., a sequence of observations $x_i = \{x_1, x_2, \ldots\}$ where each observation $x_i \in \mathbb{R}$ corresponds to a particular time step $i$. The predicted time series is defined as $y_i \in \mathbb{R}$, which is the energy consumption prediction. The DL model is trained and tested as a supervised learning problem for future time step predictions, where a predictor function $h$ predicts the next-step energy consumption value $y_{i+1}$. In general, the sliding window method for prediction $\tau$ steps ahead is defined as:

$$y_{i+\tau} = h(x_{i+\tau}, x_{i-1+\tau}, \ldots, x_{i-w+\tau}) \tag{1}$$

where $w$ is the window size. If the window size $w = 1$, the prediction function will be $y_{i+1} = h(x_i)$.

The optimization technique used with the objective function, or loss function, is expressed as:

$$\arg\min \sqrt{\frac{1}{m} \sum_{i=1}^{m} (x_{i+\tau} - y_{i+\tau})^2} \quad \forall y \in y_i \tag{2}$$

$$\text{subject to} \quad \underline{x}_{i-w+\tau} \le x_{i-w+\tau} \le \overline{x}_{i-w+\tau}, \tag{3}$$

where $m$ represents the total number of data points in the time series, $x_{i+\tau}$ and $y_{i+\tau}$ are the real and the predicted energy consumption of future steps, respectively, and $\underline{x}_{i-w+\tau}$ and $\overline{x}_{i-w+\tau}$ are the constraints of the window size. The objective of the optimizer is to minimize the energy consumption prediction error over the sliding window and the number of hidden neurons in the DL network architecture. The solution space is defined as $\mathbb{R}$ for the minimization fitness function. The task of the optimization problem is to find a solution $x^* \in \mathbb{R}$ such that:

$$h^* = h(x^*) \le h(x) \quad \forall x \in x_i \tag{4}$$

where $h^*$ is the global optimum fitness and $x^*$ is the location of the minimum in the solution space.
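The sliding window formulation can be illustrated with a short sketch. It assumes one common reading of (1), in which each input row holds the $w$ most recent observations and the target lies $\tau$ steps after the last of them, so that $w = 1$, $\tau = 1$ gives $y_{i+1} = h(x_i)$. The function name `make_supervised`, the toy series and the persistence baseline are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def make_supervised(series, w, tau=1):
    """Build (inputs, targets) from a 1-D series with a sliding window.

    Assumption: each input row holds w consecutive observations and the
    target is the observation tau steps after the last value in the row.
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - w - tau + 1
    if n_rows <= 0:
        raise ValueError("series is too short for this window and horizon")
    X = np.stack([series[k:k + w] for k in range(n_rows)])
    y = series[w + tau - 1:w + tau - 1 + n_rows]
    return X, y

# Toy series standing in for building consumption readings (not real data).
toy = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
X, y = make_supervised(toy, w=2, tau=1)

# Persistence baseline: predict that the next value equals the last input.
persistence = X[:, -1]

# Root-mean-squared error between targets and predictions, as in (2).
objective = float(np.sqrt(np.mean((y - persistence) ** 2)))
```

In the proposed approach the predictor $h$ would be the trained LSTM rather than the persistence rule, and the optimizer would search over $w$ and the number of hidden neurons to reduce this objective.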
III. METHODS
A. LONG SHORT-TERM MEMORY
A recurrent neural network (RNN) is an extension of the MLP with feedback connections [45]. The RNN processes sequential data because it has internal memory that updates the state of each neuron in the network with previous inputs, as in Fig. 1. The RNN is usually trained with the back-propagation algorithm, but it suffers from vanishing gradients when trained over long sequences. The LSTM, which is one type of RNN, is designed to provide a longer-term memory in which internal self-loops store information, overcoming the vanishing gradient problem of the RNN [45]. There are five crucial elements in the computational graph of the LSTM: 1) input gate, 2) forget gate, 3) output gate, 4) cell and 5) state output, as shown in Fig. 2. The gate operations, such as reading, writing, and erasing, change the cell memory states.

FIGURE 1. An example of the RNN with one hidden layer.

FIGURE 2. An illustration of the LSTM scheme showing the input gate, forget gate and output gate.

The following equations show the mathematical representation of the LSTM model:
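$$i_t = \sigma(x_i W_{i,n} + h_{(t-1)} W_{i,m} + b_i), \tag{5}$$

$$f_t = \sigma(x_i W_{f,n} + h_{(t-1)} W_{f,m} + b_f), \tag{6}$$

$$o_t = \sigma(x_i W_{o,n} + h_{(t-1)} W_{o,m} + b_o), \tag{7}$$

$$U = \tanh(x_i W_{U,n} + h_{(t-1)} W_{U,m} + b_U), \tag{8}$$

$$C_t = f_t \times C_{t-1} + i_t \times U, \tag{9}$$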
$$h_t = o_t \times \tanh(C_t), \tag{10}$$
where $\sigma$ denotes the sigmoid activation function, $x_i$ is the input vector, $i_t$ is the output of the input gate (the subscript stands for input), $f_t$ is the output of the forget gate (the subscript stands for forget), $o_t$ is the output of the output gate (the subscript stands for output), $U$ is the update signal, $C_t$ is the state value at computation time $t$ and $h_t$ is the output of the LSTM cell. $W_{(\cdot)}$ and $b_{(\cdot)}$ are the weight matrices and bias vectors, respectively. The weights applied to the current values of a particular variable are denoted $W_{(\cdot),n}$, and those applied to the previous state signal are denoted $W_{(\cdot),m}$. The memory state is modified by the decision of the input gate, which uses a sigmoid function acting as an on/off switch. If the value of the input gate is close to zero, the update signal is not written to the cell memory $C_t$.
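A single LSTM time step can be sketched in NumPy as follows. This is a minimal illustration and not the Keras implementation used in the paper; the helper names (`init_params`, `lstm_step`), the dictionary layout of the weights and the small random initialization are assumptions. The cell output is computed from the cell state, $h_t = o_t \times \tanh(C_t)$, which is the standard LSTM form.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Small random weights; keys "i", "f", "o", "U" name the four blocks."""
    rng = np.random.default_rng(seed)
    W_n = {k: 0.1 * rng.standard_normal((n_in, n_hidden)) for k in "ifoU"}
    W_m = {k: 0.1 * rng.standard_normal((n_hidden, n_hidden)) for k in "ifoU"}
    b = {k: np.zeros(n_hidden) for k in "ifoU"}
    return W_n, W_m, b

def lstm_step(x, h_prev, c_prev, W_n, W_m, b):
    """One time step: W_n acts on the input x, W_m on the previous output."""
    i = sigmoid(x @ W_n["i"] + h_prev @ W_m["i"] + b["i"])  # input gate
    f = sigmoid(x @ W_n["f"] + h_prev @ W_m["f"] + b["f"])  # forget gate
    o = sigmoid(x @ W_n["o"] + h_prev @ W_m["o"] + b["o"])  # output gate
    u = np.tanh(x @ W_n["U"] + h_prev @ W_m["U"] + b["U"])  # update signal
    c = f * c_prev + i * u                                   # new cell state
    h = o * np.tanh(c)                                       # cell output
    return h, c

# Run a short random input sequence through the cell (illustrative only).
n_in, n_hidden = 3, 4
W_n, W_m, b = init_params(n_in, n_hidden)
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in np.random.default_rng(1).standard_normal((5, n_in)):
    h, c = lstm_step(x, h, c, W_n, W_m, b)
```

Setting the input-gate bias to a large negative value and the forget-gate bias to a large positive value makes the step leave the cell state essentially unchanged, which matches the on/off behaviour of the gates described above.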
FIGURE 3. One point crossover operation. (Parent 1: 1 1 0 1 0 0 1 0; Parent 2: 0 1 0 0 0 0 0 0; Offspring 1: 1 1 0 1 0 0 0 0; Offspring 2: 0 1 0 0 0 0 1 0.)
B. GENETIC ALGORITHM
The GA is a common nonlinear optimization algorithm that solves constrained and unconstrained optimization problems and provides an optimal or near-optimal solution by searching a complex space. Introduced by Holland in 1975, it is an adaptive global optimization search based on an analogy with Darwinian natural selection and genetics [46], and it uses crossover and mutation probabilities to guide the search for an optimum solution (individual) of the fitness function. The GA is a population-based search in which a set of candidate solutions (individuals) of the fitness function is obtained after a series of iterative computations. One advantage of the GA is that it is less sensitive to initialization owing to the randomness of mutation and crossover; however, it is not the best method for online implementation because of its slow convergence in a complex space [46].
The individuals are composed of chromosomes, which encode candidate solutions, and they evolve according to the Darwinian principle of survival of the fittest. The fitness function determines the survival ability and quality of each individual throughout the evolutionary process of the GA.
There are three major operators in the evolutionary process of the GA: the crossover operator, the mutation operator, and the selection operator. These operators directly affect the search over fitness values and the quality of the solution found. Another strategy in the GA that promotes convergence of the fitness value to the optimum is elitism, which copies the best individual of a generation to the next generation [46]. In addition, the chromosome length and the crossover method, such as one-point crossover or two-point crossover, are important for finding the optimum value efficiently.
The crossover operation, which is the most important operation in the GA, is a random exchange of genes between two chromosomes that are encoded as binary strings (genotypes), using one of the crossover methods, as in Fig. 3. The mutation operation is the random alteration of one or more genes from 1 to 0 or vice versa. The selection operation is the process of choosing the individuals with the best fitness values from the population using a selection method, e.g., roulette wheel or tournament selection.
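The three operators can be sketched on binary chromosomes in plain Python. The function names are illustrative and this is not the DEAP code used in the paper. The parent bit strings are the ones shown in Fig. 3 as we read it, and the cut point after the sixth gene is inferred from the offspring in that figure. Fitness is assumed to be an error to be minimized, so the tournament winner is the contender with the lowest value.

```python
import random

def one_point_crossover(parent1, parent2, point):
    """Swap the tails of two equal-length chromosomes after `point` genes."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

def bit_flip_mutation(chromosome, p_mut, rng=random):
    """Flip each gene independently with probability p_mut."""
    return [1 - g if rng.random() < p_mut else g for g in chromosome]

def tournament_select(population, fitness, k, rng=random):
    """Pick k distinct contenders at random and return the fittest one.

    Assumption: lower fitness is better (the fitness is a prediction error).
    """
    contenders = rng.sample(range(len(population)), k)
    best = min(contenders, key=lambda idx: fitness[idx])
    return population[best]

# Bit strings read from Fig. 3; the cut point of 6 is inferred.
parent1 = [1, 1, 0, 1, 0, 0, 1, 0]
parent2 = [0, 1, 0, 0, 0, 0, 0, 0]
offspring1, offspring2 = one_point_crossover(parent1, parent2, point=6)
```

With these inputs the two offspring reproduce the second row of Fig. 3.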
Moreover, the population size and the number of generations are important factors that influence computational complexity. If the population size, i.e., the number of solutions in each generation, is too large, the GA incurs a large computational cost, but the probability of being trapped in a local optimum is low. If the population size is small, the computational cost is reduced, but the likelihood of falling into a local optimum is high.

FIGURE 4. The GA algorithm operation scheme. (Flowchart blocks: select GA parameters; create random population; LSTM predictive model and fitness value; crossover, mutation and selection; generate new population; stop check; output results and best child fitness value.)
The evolutionary process of the GA converges through iterative steps, where the termination criterion is pre-defined as the maximum number of iterations. Fig. 4 illustrates the GA iteration process, and the basic steps of the GA are as follows:
1) Generate initial population randomly.
2) Evaluate the fitness value of each individual in the population.
3) Perform the crossover operation.
4) Perform the mutation operation.
5) Perform the selection method.
6) Stop the GA algorithm if the termination criterion is satisfied; otherwise, return to step 2).
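The steps above can be sketched as a small, self-contained GA in plain Python. This is a hedged illustration rather than the paper's DEAP implementation. The default probabilities, population size and number of generations follow Table 2, while the tournament size of 3, the use of elitism, the fixed seed and the toy fitness function are assumptions made for the example. In the sketch, parents are selected first and then recombined and mutated, which is a common ordering and may differ from the order listed above.

```python
import random

def run_ga(fitness_fn, n_bits, pop_size=20, n_gen=20, p_cx=0.7,
           p_mut=0.015, tourn_size=3, seed=0):
    """Minimize fitness_fn over bit lists; returns (best, best_fit, history)."""
    rng = random.Random(seed)
    # Generate the initial population randomly.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []
    for _gen in range(n_gen):
        # Evaluate the fitness of each individual (lower is better).
        scores = [fitness_fn(ind) for ind in pop]
        best = min(range(pop_size), key=scores.__getitem__)
        history.append(scores[best])
        next_pop = [pop[best][:]]  # elitism: keep the best individual as is
        while len(next_pop) < pop_size:
            # Tournament selection of two parents (copied before editing).
            parents = []
            for _k in range(2):
                group = rng.sample(range(pop_size), tourn_size)
                winner = min(group, key=scores.__getitem__)
                parents.append(pop[winner][:])
            a, b = parents
            # One-point crossover with probability p_cx.
            if rng.random() < p_cx:
                cut = rng.randint(1, n_bits - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            for child in (a, b):
                for j in range(n_bits):
                    if rng.random() < p_mut:
                        child[j] = 1 - child[j]
            next_pop.extend([a, b])
        pop = next_pop[:pop_size]
    # Final evaluation after the last generation.
    scores = [fitness_fn(ind) for ind in pop]
    best = min(range(pop_size), key=scores.__getitem__)
    history.append(scores[best])
    return pop[best], scores[best], history

# Toy fitness: number of genes that differ from a hypothetical target.
# In the paper's setting this would be the test error of a trained LSTM.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

def toy_fitness(bits):
    return sum(b != t for b, t in zip(bits, TARGET))

best, best_fit, history = run_ga(toy_fitness, n_bits=len(TARGET))
```

Because the best individual is copied unchanged into each new population, the best fitness recorded in `history` never gets worse from one generation to the next.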
IV. DATASETS AND DESIGN MODELING
A. DATASETS
1) Residential Building
The public dataset of a single residential building is named Individual household electric power consumption [47]. The dataset consists of historical power consumption in kW from December 2006 to November 2010 at one-minute resolution. The model in this paper uses only the active power consumption of the household. The dataset contains more than 2 million time-steps. Fig. 5 (a) shows the variation of power consumption across seasons and days, and Fig. 5 (b) shows a heat map of the averaged daily power consumption for one month. The heat map shows that the residential building’s consumption is highly volatile from day to day within the month.
FIGURE 5. The daily average power consumption of residential building. (a) Line graph: daily active power consumption in the residential building, global active power (kW) versus time (days), 2007 to 2010. (b) Heat map: hourly averaged power consumption in the residential building for one month (January 2007), time (hours) versus time (days).

2) Commercial Building
The energy dataset of a single commercial building, which is a primary or secondary school in Denver, Colorado, USA, is randomly chosen from a list of publicly published commercial building datasets in [48], with the file name 213.csv. The data contain energy consumption values in kW/h for the year 2012 at five-minute resolution, giving 105408 time-steps. Fig. 6 (a) shows the line graph of daily averaged energy consumption and Fig. 6 (b) shows the heat map of averaged daily energy consumption for one month. From the heat map, the commercial building has consistently high consumption during working hours, whereas consumption is lowest on weekend days.
B. DESIGN MODELING
The proposed model is used to minimize the prediction error of the LSTM, as in Fig. 7. The hybrid GA-LSTM model is designed with up to three hidden layers, an optimizable number of hidden neurons and an optimizable window size. The optimization scheme of the GA-LSTM is shown in Fig. 8. The first step of the model is preprocessing the input dataset through normalization:

$$x'_i = \frac{x_i - \min}{\max - \min} \tag{11}$$

where $x_i$ is the original value of the input dataset, $x'_i$ is the normalized value scaled to the range [0, 1], max is the maximum value of the features, and min is the minimum value of the features. Normalizing the dataset features prevents features with large numeric ranges from dominating and helps the algorithm perform accurately.

FIGURE 6. The daily average energy consumption of commercial building. (a) Line graph: daily energy consumption in the commercial building (kW/h) versus time (days), 2012. (b) Heat map: hourly averaged energy consumption in the commercial building for one month (January 2012), time (hours) versus time (days).
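The min-max normalization in (11) and its inverse can be sketched as below. The function names are illustrative. The inverse is included because the evaluation metrics are reported in the original scale of the dataset, so predictions have to be mapped back. Whether the minimum and maximum are taken over the whole dataset or only over the training portion is not stated in the paper; the example simply uses the values it is given.

```python
import numpy as np

def fit_min_max(values):
    """Return the (min, max) pair used for scaling."""
    values = np.asarray(values, dtype=float)
    return float(values.min()), float(values.max())

def min_max_scale(x, lo, hi):
    """Map x to [0, 1] as in (11): (x - min) / (max - min)."""
    if hi == lo:
        raise ValueError("max and min are equal; scaling is undefined")
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def min_max_invert(x_scaled, lo, hi):
    """Map scaled values back to the original units."""
    return np.asarray(x_scaled, dtype=float) * (hi - lo) + lo

# Toy readings in kW (illustrative values, not taken from the datasets).
readings = [2.0, 4.0, 6.0]
lo, hi = fit_min_max(readings)
scaled = min_max_scale(readings, lo, hi)
restored = min_max_invert(scaled, lo, hi)
```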
The second step is to select the appropriate time lags, or window size, of the dataset observations and convert the data to a supervised learning form. The third step is to split the data chronologically into a training dataset and a testing dataset, made up of the first 70% and the last 30% of the dataset, respectively. To evaluate the performance of our proposed model properly, the training data is used only for the training process of the LSTM, and the testing data is used only for evaluating the predictive model. For instance, we used the first 33 months of residential building data at one-minute resolution for training the proposed model and 14 months of data for testing. Similarly, we used 73785 time-steps of commercial building data for training and the rest for testing.
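The chronological 70/30 split can be sketched in a few lines. The function name is illustrative, and truncating the training size to an integer is an assumption that happens to reproduce the 73785 training time-steps quoted for the commercial building with 105408 samples.

```python
def chronological_split(n_samples, train_fraction=0.7):
    """Return (train, test) slices: the first part trains, the rest tests."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    n_train = int(n_samples * train_fraction)  # truncate toward zero
    return slice(0, n_train), slice(n_train, n_samples)

# Commercial building: 105408 five-minute time-steps in 2012.
train_idx, test_idx = chronological_split(105408)
n_train = train_idx.stop - train_idx.start
n_test = test_idx.stop - test_idx.start
```

The slices can be applied directly to an array of windowed inputs and targets, which keeps the test period strictly after the training period.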
The fourth step is training the model with an initial window size and number of hidden neurons in the first hidden layer. The model is then evaluated on the testing set with the selected window size and number of hidden neurons to calculate the prediction error; the loss function is the mean squared error and the optimizer is stochastic gradient descent (SGD). All learning models are trained for 300 epochs, where one epoch is a complete pass through the training dataset. The LSTM hyper-parameters used in the hybrid with the GA are listed in Table 1. The window size and the number of hidden neurons are used to construct a fitness function as in equation (2). The GA stops when the ending condition is satisfied; otherwise, it proceeds to search for a better solution in the next generation. When the condition is satisfied for the first LSTM model with one hidden layer, the model may need to be improved by adding a second hidden layer, which gives the next LSTM model. The best window size and number of hidden neurons found for the first LSTM with one hidden layer are kept and carried over to the second LSTM model with two hidden layers. The GA is then run on the second LSTM model, optimizing only the number of hidden neurons in the second hidden layer.
The evolutionary operation, e.g., the GA as in Fig. 8, is a system that searches for better solutions using evolutionary concepts, including crossover, mutation and selection. New chromosomes encoding the window size and the number of hidden neurons are generated, which introduces new model behavior, strengthens the search dynamics and improves the prediction accuracy. One of the important features of chromosomes in the GA is the genotype, which is the binary coding of the features, whereas the phenotype refers to the parameters decoded into variable values that are fed back to the model. The parameters chosen in our experiment, e.g., crossover probability $P_{cx}$, mutation probability $P_M$, number of generations $M$, population size in each generation $N$, and chromosome length $l$, are presented in Table 2.
C. MODELING TOOLS
The hardware platform used in our modeling is an Intel Core i5 2.7 GHz CPU with an external NVIDIA GTX 1080 graphics card, running the macOS High Sierra operating system. The development environment is Python 2.7, where the DL models were implemented with the Keras deep learning framework [49], the GA model with the DEAP framework [50], and the ML models with the scikit-learn framework [51].
FIGURE 7. The evolutionary DL algorithm scheme. (Flowchart blocks: energy consumption estimation by LSTM; fitness calculation; optimization of LSTM by GA; stop check; optimal prediction.)
V. RESULTS AND DISCUSSIONS
Finding the optimal or near-optimal number of time lags and number of hidden neurons in each layer of the LSTM network is a non-deterministic polynomial (NP) problem, which is not easy to solve. The GA is a promising metaheuristic method that tends to find good solutions to such NP problems, sometimes near the global optimum, as found in studies of time series lags [52] and [53]. The number of time lags and the number of neurons are interdependent and together affect the prediction process, for example through model overfitting and computational complexity. The selected range of window size, or time lags, in this experiment is 1 to 64, and the range of the number of hidden neurons in each layer is 1 to 1024. The results in this section are solutions to the NP problem for each LSTM model.

TABLE 1. The LSTM model hyper-parameters.
Hyper-parameter | Selection
Number of hidden layers ($N_l$) | 1-3
Number of hidden neurons in each layer ($N_{np}$) | Optimizable with GA
Window size ($N_t$) | Optimizable with GA
Optimizer (opt) | SGD
Loss function | Mean squared error
Number of epochs ($N_{ep}$) | 300

TABLE 2. The GA model parameters.
Parameter | Selection
Crossover probability ($P_{cx}$) | 0.7
Mutation probability ($P_M$) | 0.015
Selection | Tournament selection
Population size ($N$) | 20
Number of generations ($M$) | 20
Fitness function | Root mean square error
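Decoding a binary genotype into the two searched hyper-parameters can be sketched as follows. The bit layout is an assumption made for illustration: 6 bits cover the window sizes 1 to 64 and 10 bits cover 1 to 1024 hidden neurons, giving a 16-bit chromosome. The paper does not spell out its encoding, so the actual layout may differ.

```python
def bits_to_int(bits):
    """Interpret a list of 0/1 genes as an unsigned integer (MSB first)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value

def decode(chromosome, window_bits=6, neuron_bits=10):
    """Map a genotype to the phenotype (window_size, n_neurons).

    Assumed layout: the first window_bits genes encode the window size
    and the remaining neuron_bits genes encode the number of neurons.
    Adding 1 shifts the ranges to 1..64 and 1..1024.
    """
    if len(chromosome) != window_bits + neuron_bits:
        raise ValueError("chromosome length does not match the bit layout")
    window_size = bits_to_int(chromosome[:window_bits]) + 1
    n_neurons = bits_to_int(chromosome[window_bits:]) + 1
    return window_size, n_neurons

# Hypothetical chromosome: window bits 000111, neuron bits 0000000001.
example = [0, 0, 0, 1, 1, 1] + [0] * 9 + [1]
window_size, n_neurons = decode(example)
```

Every 16-bit string decodes to a valid pair under this layout, so crossover and mutation never produce values outside the stated ranges.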
For the prediction models, several evaluation criteria from the literature are used to assess prediction performance. The first criterion is directly using the 30% testing dataset to examine the performance of the prediction model. The second criterion is the calculation of error metrics, where the conventional ones are the root-mean-squared error (RMSE), the coefficient of variation (CV) expressed as a percentage, and the mean absolute error (MAE), defined as follows:
$$RMSE = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (x_i - y_i)^2} \tag{12}$$

$$CV = \frac{RMSE}{\bar{y}} \times 100\% \tag{13}$$

$$MAE = \frac{1}{m} \sum_{i=1}^{m} |x_i - y_i| \tag{14}$$
where $m$ represents the total number of data points in the time series, $x_i$ is the real measured time series in the original scale of the dataset, $y_i$ is the predicted output of the time series, and $\bar{y}$ is the average of the actual values of energy consumption.

FIGURE 8. The GA-LSTM optimization architecture with three hidden layers. (Flowchart: the input dataset passes through data preprocessing into three successive GA loops, for one, two, and three hidden layers. Each loop cycles through population, genotype-to-phenotype conversion, window size selection, LSTM training and testing, fitness and accuracy evaluation, and genetic operation until its end condition is met. A decision block asks whether the model can be improved, with the selected window size and number of neurons kept for the current LSTM networks; the output is the optimized fitness function and the optimal prediction.)

The model is benchmarked against conventional prediction methods such as ARIMA, Decision Tree regression, and kNN. In addition, the model is compared with a hybrid prediction model, GA-ANN, which is used for tuning the neural network parameters. To evaluate the proposed approach against traditional DL models, the model is compared with MLP and LSTM networks that were designed with 10 hidden neurons in the first hidden layer, 5 in the second, and 2 in the third.
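Every model in this comparison is scored with the same three metrics, (12)-(14). A minimal sketch of those definitions in plain Python is given below; the function names and the toy series are illustrative assumptions, not values from the experiments.

```python
import math


def rmse(actual, predicted):
    """Root-mean-squared error, Eq. (12)."""
    m = len(actual)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(actual, predicted)) / m)


def cv_percent(actual, predicted):
    """Coefficient of variation in percent, Eq. (13): RMSE over the mean actual value."""
    mean_actual = sum(actual) / len(actual)
    return rmse(actual, predicted) / mean_actual * 100.0


def mae(actual, predicted):
    """Mean absolute error, Eq. (14)."""
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / len(actual)


# Toy data (hypothetical): only the last point is mispredicted, by 2.0.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 2.0]
print(rmse(actual, predicted), cv_percent(actual, predicted), mae(actual, predicted))
```

Because CV divides the RMSE by the mean of the actual series, it allows errors to be compared across buildings whose consumption levels differ, which RMSE and MAE in absolute units do not.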
The last criterion for examining the performance of the proposed model is cross-validation, which splits the dataset into k folds to estimate the general performance of the prediction model and gives insight into how well the model generalizes to independent data throughout the datasets. The method repeats the process of splitting the dataset into training and testing portions k times; the size of the testing data remains fixed while it moves through the original dataset, and the remainder is used as the training dataset in every fold, as in Fig. 9.

Applying this method to the proposed model produces a robust averaged estimate of the prediction error, because the observations in the dataset are used for both training and testing across the folds. We used 10-fold cross-validation in our experiment, with the best parameters of the proposed model in each case study of the residential and commercial buildings, using a time series cross-validator [51].
FIGURE 9. Cross-validation method with k folds. (Diagram: in each iteration, from the 1st to the 10th, the data are divided into a train portion and a test portion.)
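The fold sizes reported later for the residential case study (a training set that grows from 188668 to 1886599 samples with a fixed test set of 188659) are consistent with an expanding-window split. The sketch below reproduces those sizes in plain Python. It mirrors what we understand to be the sizing rule of a time series cross-validator such as scikit-learn's `TimeSeriesSplit`, and the sample count 2,075,258 is an assumption inferred from the last fold (1886599 + 188659), not a number stated in the text.

```python
def expanding_window_splits(n_samples, n_splits=10):
    """Yield (train_indices, test_indices) with a growing train set and a fixed-size test set."""
    test_size = n_samples // (n_splits + 1)
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        # Train on everything before the test block, test on the block itself.
        yield range(0, start), range(start, start + test_size)


# Assumed sample count for the residential series (inferred, see lead-in).
sizes = [(len(train), len(test)) for train, test in expanding_window_splits(2_075_258)]
for fold, (n_train, n_test) in enumerate(sizes, start=1):
    print(fold, n_train, n_test)
```

In this scheme the test block always lies after the training data in time, so the model is never evaluated on observations that precede its training window.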
A. PREDICTING RESIDENTIAL BUILDING POWER CONSUMPTION
Table 3 compares the performance of the proposed GA-LSTM model with that of the conventional prediction models for the first case study, residential building power consumption. The table includes different architectures of the regular DL models, e.g., MLP-1 with one hidden layer and MLP-2 with two hidden layers. The results show that the proposed model outperformed the other models on the evaluation metrics. From the table, we find that the MLP and LSTM models performed similarly to each other, whereas the proposed method outperformed both of them significantly. It is noted that the prediction accuracy gets worse when the
networks get deeper because of the dependencies among the network hyper-parameters. In addition, the statistical model ARIMA and kNN produced the largest prediction errors among the compared methods, whereas Decision Tree regression performed better than the other conventional models and obtained a prediction error close to that of the DL models. The conventional hybrid model GA-ANN performed better than all conventional methods and traditional DL methods for predicting residential energy consumption; however, the proposed approach, with GA-LSTM-1 as its best configuration, outperformed the conventional hybrid model.
Table 4 shows the optimal parameters of GA-LSTM-1, GA-LSTM-2 and GA-LSTM-3 and the percentage of RMSE reduction relative to the corresponding LSTM models. The window size is the same for all hidden layers because it is used as an input for the next hidden layer. The best reduction, 17.319 % in terms of RMSE, is obtained relative to the regular LSTM-1 model. The deeper networks also achieved good reductions in terms of RMSE.
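The text does not spell out how the percentage of reduction is computed. Assuming it is the relative drop in RMSE with respect to the benchmark LSTM of the same depth, the short check below reproduces the Table 4 percentages from the Table 3 RMSE values to within rounding; the function name is an illustrative choice.

```python
def rmse_reduction_percent(benchmark_rmse, proposed_rmse):
    """Relative RMSE reduction of the proposed model against a benchmark, in percent."""
    return (benchmark_rmse - proposed_rmse) / benchmark_rmse * 100.0


# (benchmark LSTM RMSE, GA-LSTM RMSE) in kW, taken from Table 3.
residential = {
    "LSTM-1": (0.235, 0.1943),
    "LSTM-2": (0.233, 0.217),
    "LSTM-3": (0.238, 0.225),
}
reductions = {name: rmse_reduction_percent(b, p) for name, (b, p) in residential.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.3f} %")
```

Small differences in the last digit are expected because the RMSE values in Table 3 are themselves rounded.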
Table 5 shows the 10-fold cross-validation results of the proposed model GA-LSTM-1, which achieved the best prediction in Table 3. The prediction errors differ from fold to fold because the training dataset (Dtr) grows and the testing dataset (Dts) covers a different part of the series in each fold; the final prediction error is averaged over the 10 folds. This validation process increases confidence in the prediction performance of the model because the test data in each fold are different and unseen during training.
TABLE 3. Comparison with conventional methods at one-minute resolution for the residential building.

| Method | RMSE (kW) | CV (%) | MAE (kW) |
|---|---|---|---|
| ARIMA | 0.264 | 24.170 | 0.095 |
| Decision Tree | 0.233 | 21.321 | 0.085 |
| kNN | 0.258 | 23.672 | 0.111 |
| GA-ANN | 0.223 | 20.158 | 0.072 |
| MLP-1 | 0.232 | 20.934 | 0.083 |
| MLP-2 | 0.231 | 20.844 | 0.081 |
| MLP-3 | 0.231 | 20.844 | 0.079 |
| LSTM-1 | 0.235 | 21.205 | 0.084 |
| LSTM-2 | 0.233 | 21.025 | 0.084 |
| LSTM-3 | 0.238 | 21.476 | 0.086 |
| GA-LSTM-1 | 0.1943 | 17.526 | 0.062 |
| GA-LSTM-2 | 0.217 | 19.581 | 0.071 |
| GA-LSTM-3 | 0.225 | 20.303 | 0.074 |
Fig. 10 compares the predictions of residential active power consumption for very short-term prediction. The comparison covers all prediction models given in Table 3. From the graph, we can note that the proposed model is superior to the other two DL models benchmarked in this study, i.e., MLP and LSTM. The GA-LSTM-1 prediction curve follows the original data curve most closely. It is worth noting that GA-ANN is a skillful model that comes next after the proposed approach. We can see that GA-LSTM outperforms the other models used to predict the consumed energy.

TABLE 4. The best parameters of the GA-LSTM models for the residential building and the percentage of reduction relative to the benchmark LSTM.

| Nl (hidden layers) | Nn (hidden neurons) | Nt (time lags) | Benchmark | RMSE reduction (%) |
|---|---|---|---|---|
| 1 | 139 | 23 | LSTM-1 | 17.319 |
| 2 | 139 & 43 | 23 | LSTM-2 | 6.866 |
| 3 | 139 & 43 & 64 | 23 | LSTM-3 | 5.462 |

TABLE 5. The 10-fold cross-validation results of GA-LSTM-1 for the first case study.

| Fold No. | Dtr | Dts | RMSE | CV (%) | MAE |
|---|---|---|---|---|---|
| 1 | 188668 | 188659 | 0.221 | 20.238 | 0.082 |
| 2 | 377327 | 188659 | 0.237 | 21.703 | 0.085 |
| 3 | 565986 | 188659 | 0.220 | 20.146 | 0.082 |
| 4 | 754645 | 188659 | 0.212 | 19.413 | 0.071 |
| 5 | 943304 | 188659 | 0.219 | 20.054 | 0.073 |
| 6 | 1131963 | 188659 | 0.213 | 19.505 | 0.071 |
| 7 | 1320622 | 188659 | 0.203 | 18.589 | 0.069 |
| 8 | 1509281 | 188659 | 0.212 | 19.413 | 0.071 |
| 9 | 1697940 | 188659 | 0.202 | 18.498 | 0.069 |
| 10 | 1886599 | 188659 | 0.197 | 18.936 | 0.066 |
| Mean | - | - | 0.213 | 19.560 | 0.074 |
| SD | - | - | 0.012 | 1.057 | 0.007 |
FIGURE 10. Prediction comparison between the proposed model and different conventional prediction models for very short-term prediction. (Plot title: "Energy consumption prediction for residential building over one minute resolution"; x-axis: Time (one minute); y-axis: Power consumption (kW); series: Original, ARIMA, Decision Tree, kNN, GA-ANN, MLP-1 to MLP-3, LSTM-1 to LSTM-3, GA_LSTM-1 to GA_LSTM-3.)
B. PREDICTING COMMERCIAL BUILDING ENERGY CONSUMPTION
The second case study is the prediction of commercial building energy consumption. Table 6 shows the effectiveness of the proposed GA-LSTM model in comparison with the conventional prediction models. The results in the table show that the proposed method outperformed the other methods in prediction accuracy, while the MLP and LSTM results are close to each other. It is noticeable that the prediction accuracy degraded with the deeper networks in
the conventional methods due to dependencies among the network hyper-parameters. As in the first case study, the statistical model ARIMA and kNN produced the largest prediction errors among the compared methods, and Decision Tree regression obtained a prediction error close to that of the DL models. Similarly, the conventional hybrid model GA-ANN obtained better predictions than the conventional models and the DL models for predicting commercial energy consumption; however, the proposed approach is superior to all compared methods.
The optimal parameters of GA-LSTM are given in Table 7, where the window size is fixed for all hidden layers because it is used as an input to the next hidden layer in the proposed method. The table also gives the percentage of RMSE reduction, and the best reduction is 10.669 % relative to LSTM-1. The two deeper networks achieved reductions close to each other.
The 10-fold cross-validation results of GA-LSTM-1, the best model in Table 6, are shown in Table 8. The prediction errors differ from fold to fold because the training and testing data are different in each fold. Because the test data in each fold are different and unseen during training, this validation technique increases confidence in the prediction performance of the proposed model.
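As a quick check of how the per-fold errors are summarized, the sketch below recomputes the mean and standard deviation of the ten RMSE values listed in Table 8. The text does not say whether the population or the sample standard deviation was used, so both are computed here; the variable names are illustrative.

```python
import statistics

# Per-fold RMSE of GA-LSTM-1 for the commercial building, copied from Table 8.
fold_rmse = [0.199, 0.194, 0.384, 0.460, 0.401, 0.647, 0.763, 0.617, 0.357, 0.291]

mean_rmse = statistics.mean(fold_rmse)
pop_sd = statistics.pstdev(fold_rmse)     # divides by the number of folds
sample_sd = statistics.stdev(fold_rmse)   # divides by the number of folds minus one

print(f"mean = {mean_rmse:.2f}, population SD = {pop_sd:.2f}, sample SD = {sample_sd:.2f}")
```

The mean rounds to the 0.43 reported in Table 8 and the population standard deviation rounds to the reported 0.18. The spread is large relative to the mean, which indicates that the error depends strongly on which part of the series is held out.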
TABLE 6. Comparison with conventional methods at five-minute resolution for the commercial building.

| Method | RMSE (kW/h) | CV (%) | MAE (kW/h) |
|---|---|---|---|
| ARIMA | 0.539 | 10.462 | 0.297 |
| Decision Tree | 0.482 | 9.353 | 0.273 |
| kNN | 0.544 | 10.561 | 0.326 |
| GA-ANN | 0.469 | 9.145 | 0.268 |
| MLP-1 | 0.495 | 9.615 | 0.305 |
| MLP-2 | 0.490 | 9.507 | 0.295 |
| MLP-3 | 0.478 | 9.271 | 0.271 |
| LSTM-1 | 0.478 | 9.283 | 0.276 |
| LSTM-2 | 0.486 | 9.430 | 0.286 |
| LSTM-3 | 0.480 | 9.312 | 0.276 |
| GA-LSTM-1 | 0.427 | 8.303 | 0.238 |
| GA-LSTM-2 | 0.451 | 8.755 | 0.256 |
| GA-LSTM-3 | 0.449 | 8.716 | 0.263 |
TABLE 7. The best parameters of the GA-LSTM models for the commercial building and the percentage of reduction relative to the benchmark LSTM.

| Nl (hidden layers) | Nn (hidden neurons) | Nt (time lags) | Benchmark | RMSE reduction (%) |
|---|---|---|---|---|
| 1 | 459 | 42 | LSTM-1 | 10.669 |
| 2 | 459 & 187 | 23 | LSTM-2 | 7.201 |
| 3 | 459 & 187 & 82 | 23 | LSTM-3 | 6.458 |
The prediction performance in Fig. 11 compares the proposed GA-LSTM with the conventional methods for the commercial building, for each prediction model. The graph shows that the proposed model performed better than the other models in this study and followed the original data for very short-term prediction. The proposed GA-LSTM proved its strength over the other compared methods.

TABLE 8. The 10-fold cross-validation results of GA-LSTM-1 for the second case study.

| Fold No. | Dtr | Dts | RMSE | CV (%) | MAE |
|---|---|---|---|---|---|
| 1 | 9586 | 9582 | 0.199 | 3.859 | 0.147 |
| 2 | 19168 | 9582 | 0.194 | 3.765 | 0.111 |
| 3 | 28750 | 9582 | 0.384 | 7.456 | 0.217 |
| 4 | 38332 | 9582 | 0.460 | 8.940 | 0.251 |
| 5 | 47914 | 9582 | 0.401 | 7.785 | 0.251 |
| 6 | 57496 | 9582 | 0.647 | 12.570 | 0.373 |
| 7 | 67078 | 9582 | 0.763 | 14.806 | 0.431 |
| 8 | 76660 | 9582 | 0.617 | 11.985 | 0.354 |
| 9 | 86242 | 9582 | 0.357 | 6.935 | 0.221 |
| 10 | 95824 | 9582 | 0.291 | 5.653 | 0.198 |
| Mean | - | - | 0.43 | 8.38 | 0.26 |
| SD | - | - | 0.18 | 3.53 | 0.10 |
FIGURE 11. Prediction comparison between the proposed model and different conventional prediction models for very short-term prediction.
C. OPTIMIZATION RESULTS DISCUSSIONS
Hybridizing LSTM with GA produced more accurate predictions, as seen from the tables and figures above. As an NP problem, it was not easy to find the best window size and number of hidden neurons in each layer, because the number of possible combinations of these parameters across the layers is huge.
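A back-of-the-envelope count gives a feel for the size of this search. It assumes that every integer in the stated ranges (1-64 time lags, 1-1024 neurons per layer) is admissible, that a single window size is shared by all layers, and that each individual is evaluated once per generation with the stated population size and number of generations (20 each). These are simplifying assumptions for illustration only.

$$|S_1| = 64 \times 1024 = 65\,536, \qquad |S_3| = 64 \times 1024^{3} \approx 6.9 \times 10^{10}$$

$$N \times M = 20 \times 20 = 400 \ \text{evaluations}, \qquad \frac{400}{65\,536} \approx 0.6\%$$

Under these assumptions, a single GA run trains well under one percent of the candidates available for even a one-layer network, which is why exhaustive search is impractical here.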
Fig. 12 (a) and (b) show scatter plots of the best (surviving) offspring in each generation of the GA optimization for residential energy prediction, plotting the number of hidden neurons and the window size against the CV score in percent. Fig. 12 (a) illustrates the performance of the GA-LSTM model while searching for the best number of hidden neurons, which is 139 with a CV of 17.5%. It is noticeable from the figure that the model converged with the number of neurons more than 100 and less than 150
neurons; larger numbers failed to produce precise predictions. Similarly, Fig. 12 (b) presents the search of the proposed model for the best window size, which is 23 time lags. From the figure, we can see that the model produced its best results between 20 and 40 time lags, in comparison with smaller and larger time lags. Therefore, the GA-LSTM model converged to optimum results with the number of neurons in the range 100-150 and the window size in the range 20-40 time lags.
FIGURE 12. Scatter plots of the window size and number of hidden neurons of individuals in the GA optimization process for the residential energy prediction model. (a) Number of hidden neurons vs. CV (%). (b) Window size or time lags vs. CV (%).
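To make the notion of window size concrete, the sketch below turns a univariate series into lagged input windows and next-step targets, which is the usual way a time-lag window is fed to a sequence model. The one-step-ahead target and the toy readings are assumptions for illustration; the text does not give the preprocessing code.

```python
def make_windows(series, window_size):
    """Split a series into (lagged inputs, next value) pairs for one-step-ahead prediction."""
    inputs, targets = [], []
    for end in range(window_size, len(series)):
        inputs.append(series[end - window_size:end])  # the previous `window_size` readings
        targets.append(series[end])                   # the reading to be predicted
    return inputs, targets


# Hypothetical power readings in kW, with a window of 3 lags instead of 23 for brevity.
series = [1.2, 1.3, 1.25, 1.4, 1.35, 1.3]
X, y = make_windows(series, window_size=3)
for window, target in zip(X, y):
    print(window, "->", target)
```

A larger window gives the network more history per sample but leaves fewer samples and makes each training step more expensive, which is one reason the window size interacts with the number of neurons.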
The scatter plots for the second case study, the commercial building, are given in Fig. 13 (a) and (b). The scatter plot of the number of neurons versus the CV in Fig. 13 (a) has a wider distribution than the corresponding plot for the residential building. There are a couple of locally optimal individuals in the figure, and the best offspring has 459 neurons with a CV of 8.3%. Fig. 13 (b) shows convergence between 40 and 50 time lags, while the smaller time lags give the worst prediction accuracy in the experiment. The best individual has 42 time lags with a CV of 8.3%. Thus, the proposed GA-LSTM model led to optimum values of the number of hidden neurons and the window size for commercial energy prediction.
FIGURE 13. Scatter plots of the GA-LSTM optimization process for the commercial energy prediction model. (a) Number of hidden neurons vs. CV (%). (b) Window size or time lags vs. CV (%).
VI. CONCLUSION
Energy prediction in buildings has recently become a vital problem for energy conservation and cost-effectiveness due to the global increase in energy consumption. There have been many attempts to predict energy consumption efficiently using physical models and statistical models. Among these attempts, DL methods obtained promising prediction results with deeper neural network architectures. This paper proposed an evolutionary-based development of DL prediction models in order to improve prediction accuracy and network architecture.
The proposed approach combines the GA with the LSTM method by evolving the window size and the number of hidden neurons and by examining networks with one to three hidden layers. The prediction system was applied to two public datasets of residential and commercial buildings. The proposed model showed better performance than the compared conventional prediction methods such as ARIMA,
Decision Tree, kNN, GA-ANN, MLP and LSTM. The best percentage of RMSE reduction relative to the regular LSTM is 17.319 % for the residential building case study and 10.669 % for the commercial building case study.
The reasoning behind the evolutionary learning concept is that, for DL algorithms, it is faster and more efficient to find the optimized window size and the optimized number of hidden neurons by evolutionary search than to find them based on the developer's knowledge and experimental trials. Although the evolutionary DL concept is more demanding in terms of computational requirements, it notably outperformed the best conventional prediction models.
Since the proposed approach is an optimization-based technique for energy consumption prediction using the GA and the LSTM, its computational complexity depends on several factors that affect computing time, including the number of input time lags in the LSTM, the number of hidden neurons and layers in the LSTM, the number of generations in the GA, and the population size in the GA. These factors can make the computation time of the approach an NP problem. For instance, if the first individual in the first generation has 14 input lags and 200 hidden neurons and the second individual has 14 input lags and 250 hidden neurons, the computation time of the second individual is higher than that of the first. To that end, a parallel computing technique such as MapReduce, implemented with a computational framework such as Apache Hadoop or Apache Spark, can reduce the computation time of the proposed model. In addition, applying parallel computation to the proposed model can provide a real-time prediction paradigm, e.g., real-time power forecasting, that trains on the historical input variables offline and updates and tests on the recent input variables online.
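The 200-versus-250-neuron example can be made concrete by counting trainable weights. The sketch below uses the standard parameter count of a single LSTM layer without peephole connections (four gates, each with input weights, recurrent weights, and a bias) and assumes one input feature per time step; both are assumptions for illustration, since the text does not give the exact layer configuration.

```python
def lstm_param_count(n_inputs, n_hidden):
    """Trainable parameters of one standard LSTM layer (4 gates, no peepholes)."""
    return 4 * (n_hidden * (n_inputs + n_hidden) + n_hidden)


params_200 = lstm_param_count(n_inputs=1, n_hidden=200)
params_250 = lstm_param_count(n_inputs=1, n_hidden=250)
print(params_200, params_250, round(params_250 / params_200, 2))
```

Because the recurrent weight matrix grows with the square of the number of hidden neurons, a 25% increase in neurons raises the parameter count by roughly 56% in this setting, and that cost is paid at every one of the 14 time steps of every training sample.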
In real-world applications, the energy consumption load in buildings is related to several underlying factors, such as temperature, humidity, working hours, holidays, and occupants. These factors can provide more information about the variability and uncertainty of energy consumption. Thus, the proposed approach is modeled to handle multiple input parameters and non-linear prediction on big data. If these factors are considered, the proposed model can yield better prediction accuracy. Future work will study the effectiveness of other DL methods such as GRU and CNN, which were not implemented in this study due to their high computational complexity.
REFERENCES
[1] K. Amarasinghe, D. Wijayasekara, H. Carey, M. Manic, D. He, and W. P.
Chen, “Artificial neural networks based thermal energy storage control
for buildings,” in IECON 2015 - 41st Annual Conference of the IEEE
Industrial Electronics Society, Nov 2015, pp. 005421–005426.
[2] A. I. Dounis, “Artificial intelligence for energy conservation in buildings,” Advances in Building Energy Research, vol. 4, no. 1, pp. 267–299, 2010.
[3] A. M. Khudhair and M. M. Farid, “A review on energy conservation
in building applications with thermal storage by latent heat using phase
change materials,” Energy conversion and management, vol. 45, no. 2, pp.
263–275, 2004.
[4] N. Amjady, “Short-term hourly load forecasting using time-series modeling with peak load estimation capability,” IEEE Transactions on Power Systems, vol. 16, no. 3, pp. 498–505, Aug 2001.
[5] M. T. Hagan and S. M. Behr, “The time series approach to short term load
forecasting,” IEEE Transactions on Power Systems, vol. 2, no. 3, pp. 785–
791, Aug 1987.
[6] J. Contreras, R. Espinola, F. J. Nogales, and A. J. Conejo, “Arima models
to predict next-day electricity prices,” IEEE Transactions on Power Sys-
tems, vol. 18, no. 3, pp. 1014–1020, Aug 2003.
[7] S. L. Wong, K. K. Wan, and T. N. Lam, “Artificial neural networks for energy analysis of office buildings with daylighting,” Applied Energy, vol. 87, no. 2, pp. 551–557, 2010.
[8] S. A. Kalogirou, “Artificial neural networks in energy applications in
buildings,” International Journal of Low-Carbon Technologies, vol. 1,
no. 3, pp. 201–216, 2006.
[9] C. Roldán-Blay, G. Escrivá-Escrivá, C. Álvarez-Bel, C. Roldán-Porta, and
J. Rodríguez-García, “Upgrade of an artificial neural network prediction
method for electrical consumption forecasting using an hourly temperature
curve model,” Energy and Buildings, vol. 60, pp. 38–46, 2013.
[10] J. G. Jetcheva, M. Majidpour, and W.-P. Chen, “Neural network model ensembles for building-level electricity load forecasts,” Energy and Buildings, vol. 84, pp. 214–223, 2014.
[11] M. De Felice and X. Yao, “Short-term load forecasting with neural network ensembles: A comparative study [application notes],” IEEE Computational Intelligence Magazine, vol. 6, no. 3, pp. 47–56, 2011.
[12] B. Dong, C. Cao, and S. E. Lee, “Applying support vector machines to predict building energy consumption in tropical region,” Energy and Buildings, vol. 37, no. 5, pp. 545–553, 2005.
[13] Q. Li, Q. Meng, J. Cai, H. Yoshino, and A. Mochida, “Applying support vector machine to predict hourly cooling load in the building,” Applied Energy, vol. 86, no. 10, pp. 2249–2256, 2009.
[14] L. Ghelardoni, A. Ghio, and D. Anguita, “Energy load forecasting us-
ing empirical mode decomposition and support vector regression,” IEEE
Transactions on Smart Grid, vol. 4, no. 1, pp. 549–556, 2013.
[15] B.-J. Chen, M.-W. Chang et al., “Load forecasting using support vector
machines: A study on eunite competition 2001,” IEEE transactions on
power systems, vol. 19, no. 4, pp. 1821–1830, 2004.
[16] Q. Ding, “Long-term load forecast using decision tree method,” in 2006
IEEE PES Power Systems Conference and Exposition, Oct 2006, pp.
1541–1543.
[17] M. A. Al-Gunaid, M. V. Shcherbakov, D. A. Skorobogatchenko, A. G.
Kravets, and V. A. Kamaev, “Forecasting energy consumption with the
data reliability estimatimation in the management of hybrid energy system
using fuzzy decision trees,” in 2016 7th International Conference on
Information, Intelligence, Systems Applications (IISA), July 2016, pp. 1–
8.
[18] Y. yuan Chen, Y. Lv, Z. Li, and F. Wang, “Long short-term memory model
for traffic congestion prediction with online open data,” in 2016 IEEE 19th
International Conference on Intelligent Transportation Systems (ITSC),
Nov 2016, pp. 132–137.
[19] R. Zhang, Y. Xu, Z. Y. Dong, W. Kong, and K. P. Wong, “A composite k-nearest neighbor model for day-ahead load forecasting with limited temperature forecasts,” in 2016 IEEE Power and Energy Society General Meeting (PESGM), July 2016, pp. 1–5.
[20] W. Kong, Z. Y. Dong, D. J. Hill, F. Luo, and Y. Xu, “Short-term residential
load forecasting based on resident behaviour learning,” IEEE Transactions
on Power Systems, vol. 33, no. 1, pp. 1087–1088, Jan 2018.
[21] S. Ding, H. Li, C. Su, J. Yu, and F. Jin, “Evolutionary artificial neural
networks: a review,” Artificial Intelligence Review, vol. 39, no. 3, pp. 251–
260, 2013.
[22] S. Karatasou, M. Santamouris, and V. Geros, “Modeling and predicting building’s energy use with artificial neural networks: Methods and results,” Energy and Buildings, vol. 38, no. 8, pp. 949–958, 2006. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0378778805002161
[23] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in 2016 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), June 2016, pp. 770–778.
[24] N. Majumder, S. Poria, A. Gelbukh, and E. Cambria, “Deep learning-based
document modeling for personality detection from text,” IEEE Intelligent
Systems, vol. 32, no. 2, pp. 74–79, Mar 2017.
[25] P. Jiang, C. Chen, and X. Liu, “Time series prediction for evolutions of
complex systems: A deep learning approach,” in 2016 IEEE International
Conference on Control and Robotics Engineering (ICCRE), April 2016,
pp. 1–6.
[26] D. L. Minh, A. Sadeghi-Niaraki, H. D. Huy, K. Min, and H. Moon, “Deep
learning approach for short-term stock trends prediction based on two-
stream gated recurrent unit network,” IEEE Access, vol. 6, pp. 55 392–
55 404, 2018.
[27] N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis, “Tem-
poral bag-of-features learning for predicting mid price movements using
high frequency limit order book data,” IEEE Transactions on Emerging
Topics in Computational Intelligence, pp. 1–12, 2018.
[28] H. Lee and B. Lee, “Bayesian deep learning-based confidence-aware
solar irradiance forecasting system,” in 2018 International Conference on
Information and Communication Technology Convergence (ICTC), Oct
2018, pp. 1233–1238.
[29] A. Alzahrani, P. Shamsi, M. Ferdowsi, and C. Dagli, “Solar irradiance
forecasting using deep recurrent neural networks,” in 2017 IEEE 6th
International Conference on Renewable Energy Research and Applications
(ICRERA), Nov 2017, pp. 988–994.
[30] M. Khodayar, J. Wang, and M. Manthouri, “Interval deep generative neural
network for wind speed forecasting,” IEEE Transactions on Smart Grid,
pp. 1–1, 2018.
[31] M. Khodayar and J. Wang, “Spatio-temporal graph deep neural network
for short-term wind speed forecasting,” IEEE Transactions on Sustainable
Energy, pp. 1–1, 2018.
[32] X. Dong, L. Qian, and L. Huang, “A cnn based bagging learning approach to short-term load forecasting in smart grid,” in 2017 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computed, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), Aug 2017, pp. 1–6.
[33] K. Amarasinghe, D. L. Marino, and M. Manic, “Deep neural networks for
energy load forecasting,” in 2017 IEEE 26th International Symposium on
Industrial Electronics (ISIE), June 2017, pp. 1483–1488.
[34] H. Shi, M. Xu, and R. Li, “Deep learning for household load forecasting - a novel pooling deep rnn,” IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 5271–5280, Sept 2018.
[35] F. M. Bianchi, E. Maiorino, M. C. Kampffmeyer, A. Rizzi, and R. Jenssen,
“An overview and comparative analysis of recurrent neural networks for
short term load forecasting,” arXiv preprint arXiv:1705.04378, 2017.
[36] D. Gan, Y. Wang, N. Zhang, and W. Zhu, “Enhancing short-term probabilistic residential load forecasting with quantile long-short-term memory,” The Journal of Engineering, vol. 2017, no. 14, pp. 2622–2627, 2017.
[37] D. L. Marino, K. Amarasinghe, and M. Manic, “Building energy load
forecasting using deep neural networks,” in Industrial Electronics Society,
IECON 2016-42nd Annual Conference of the IEEE. IEEE, 2016, pp.
7046–7051.
[38] A. Almalaq and G. Edwards, “A review of deep learning methods applied
on load forecasting,” in 2017 16th IEEE International Conference on
Machine Learning and Applications (ICMLA), Dec 2017, pp. 511–516.
[39] L. Song, H. Qing, Y. Ying-ying, and L. Hao-ning, “Prediction for chaotic
time series of optimized bp neural network based on modified pso,” in The
26th Chinese Control and Decision Conference (2014 CCDC), May 2014,
pp. 697–702.
[40] H. Chenglei, L. Kangji, L. Guohai, and P. Lei, “Forecasting building
energy consumption based on hybrid pso-ann prediction model,” in 2015
34th Chinese Control Conference (CCC), July 2015, pp. 8243–8247.
[41] A. Afram, F. Janabi-Sharifi, A. S. Fung, and K. Raahemifar, “Artificial neural network (ann) based model predictive control (mpc) and optimization of hvac systems: A state of the art review and case study of a residential hvac system,” Energy and Buildings, vol. 141, pp. 96–113, 2017. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0378778816310799
[42] K. Li, H. Su, and J. Chu, “Forecasting building energy consumption using neural networks and hybrid neuro-fuzzy system: A comparative study,” Energy and Buildings, vol. 43, no. 10, pp. 2893–2899, 2011. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0378778811003124
[43] M. D. Sulistiyo, R. N. Dayawati, and Nurlasmaya, “Evolution strate-
gies for weight optimization of artificial neural network in time series
prediction,” in 2013 International Conference on Robotics, Biomimetics,
Intelligent Computational Systems, Nov 2013, pp. 143–147.
[44] J. Zhang, Z. Zhan, Y. Lin, N. Chen, Y. Gong, J. Zhong, H. S. H. Chung, Y. Li, and Y. Shi, “Evolutionary computation meets machine learning: A survey,” IEEE Computational Intelligence Magazine, vol. 6, no. 4, pp. 68–75, Nov 2011.
[45] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press,
2016, http://www.deeplearningbook.org.
[46] O. Kramer, Machine learning for evolution strategies. Springer, 2016,
vol. 20.
[47] D. Dheeru and E. Karra Taniskidou, “UCI machine learning repository,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[48] “Buildings datasets,” 2012. [Online]. Available:
https://trynthink.github.io/buildingsdatasets/
[49] F. Chollet et al., “Keras,” https://github.com/fchollet/keras, 2015.
[50] F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, and C. Gagné, “DEAP: Evolutionary algorithms made easy,” Journal of Machine Learning Research, vol. 13, pp. 2171–2175, Jul 2012.
[51] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel,
V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, R. Layton, J. Van-
derPlas, A. Joly, B. Holt, and G. Varoquaux, “API design for machine
learning software: experiences from the scikit-learn project,” in ECML
PKDD Workshop: Languages for Data Mining and Machine Learning,
2013, pp. 108–122.
[52] Z.-L. Sun, D.-S. Huang, C.-H. Zheng, and L. Shang, “Optimal selection of time lags for tdsep based on genetic algorithm,” Neurocomputing, vol. 69, no. 7, pp. 884–887, 2006, new Issues in Neurocomputing: 13th European Symposium on Artificial Neural Networks. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0925231205002146
[53] K. Lukoseviciute and M. Ragulskis, “Evolutionary algorithms for the selection of time lags for time series forecasting by fuzzy inference systems,” Neurocomputing, vol. 73, no. 10, pp. 2077–2088, 2010, subspace Learning / Selected papers from the European Symposium on Time Series Prediction. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0925231210001554
ABDULAZIZ ALMALAQ received the B.S. degree in electrical engineering from the College of Engineering, University of Hail, Hail, Saudi Arabia, in 2011, and the M.S. degree in electrical engineering from the Electrical and Computer Engineering Department, University of Denver, Denver, Colorado, USA, in 2015. He is currently a Ph.D. candidate at the Electrical and Computer Engineering Department, University of Denver. His research interests include signal processing, machine learning, deep learning, and intelligent power system applications.

JUN JASON ZHANG received the B.E. and M.E. degrees in electrical engineering from Huazhong University of Science and Technology, Wuhan, China, in 2003 and 2005, respectively, and the Ph.D. degree in electrical engineering from Arizona State University, USA, in 2008. He is currently an associate professor of electrical and computer engineering at the University of Denver. He has authored or coauthored over 70 peer-reviewed publications and is the Technical Co-Chair for the 48th North American Power Symposium (NAPS2016). His research interests include sensing theory, signal processing and implementation, time-varying system modeling, and their applications in intelligent power and energy systems.
... LSTM is a popular improved model of artificial recurrent neural networks (RNN) that can model sequential or temporal aspects of data. Unlike the original RNN model, LSTM can solve the problem of gradient disappearance or explosion that often occurs when training continuous data [7,2,12,15,4]. LSTM networks-based models are proposed to capture periodicity in energy load consumption prediction. ...
... On the other hand, the output gate can control the state of the memory cell on whether it can alter the state of the other memory cell. In addition, the forget gate can choose to remember or forget its previous state [7,2]. Figure 4 shows an LSTM unit, which describes how the value of each gate is updated. ...
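The gate behaviour described in the two snippets above can be illustrated with a single LSTM time step. This is a minimal NumPy sketch of the standard LSTM equations, not code from the cited paper; the function name `lstm_step`, the stacked weight layout, and the demo dimensions are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step of a standard LSTM cell (illustrative layout).

    x: input vector of size D; h_prev, c_prev: previous hidden and cell
    state of size H. W has shape (4H, D), U has shape (4H, H), b has
    shape (4H,). The four row blocks are stacked in the assumed order
    [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate cell content
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much cell state to expose
    c = f * c_prev + i * g       # updated memory cell
    h = o * np.tanh(c)           # updated hidden state
    return h, c

# Demo with random weights (D = 3 inputs, H = 2 hidden units).
rng = np.random.default_rng(0)
D, H = 3, 2
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, U, b)
print(h.shape, c.shape)
```

The additive update `c = f * c_prev + i * g` is the part usually credited with easing the vanishing-gradient problem, because the cell state is carried forward through a gated sum rather than repeated matrix multiplication.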
Preprint
Hybrid energy systems, which integrate diverse energy sources including solar power plants, supercapacitors, UPS batteries, generators, hydrogen cells, and the grid, represent sophisticated yet highly promising approaches to enhancing energy efficiency, reducing operational costs, and supporting renewable and grid-independent initiatives. The inherent complexity of these systems necessitates an energy management strategy (EMS) capable of judiciously allocating resources in line with demand forecasts. A critical component of devising an effective task scheduling system within this framework is the ability to generate precise forecasts of energy production from renewable sources, solar power in this case. This paper showcases the deployment and comparative evaluation of two advanced deep learning models, Long Short-term Memory Recurrent Neural Networks (LSTMs) and Bidirectional Long Short-term Memory Networks (BiLSTMs), and our proposed Ensemble model, which averages the forecasts from LSTM and BiLSTM models, developed at our Laboratory for Energy Management (LabE). Our primary goal is to predict solar power output for three days at 15-minute intervals. Incorporating thirteen weather features, our findings reveal that the proposed models perform well in predicting energy production data, with the Ensemble predictions showing the best performance for 15-minute interval forecasts spanning three days.
... ) employs reinforcement learning (RL) and LSTM models to forecast the building energy consumption. The modelling method offered in (Almalaq et al. 2019) is established upon deep learning and GA for enhancing the prediction accuracy of LSTM. (Vijayan et al. 2022) compares the results of various machine learning and deep learning models, then establishes the dependencies between energy consumption and parameters such as temperature and wind speed. ...
... So genetic algorithm (GA) is introduced to solve the aforementioned problem. GA is a randomized search method based on the principles of biological evolution (Almalaq et al. 2019), known for its simplicity and strong scalability. It involves the following five main steps to solve the optimization described by (1)-(3) through GA (Luo et al. 2020). ...
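The snippet above elides its five steps, so the following is only a generic genetic-algorithm sketch (initialise, evaluate, select, crossover, mutate) and should not be read as the procedure of either cited paper. It searches over two integer settings in the spirit of the time-lag and hidden-neuron tuning mentioned in the main article's abstract. The function `genetic_search`, the bounds, and the stand-in objective `toy_fitness` are all hypothetical; in practice the fitness would be a validation error from a trained model.

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Minimise `fitness` over integer vectors within inclusive `bounds`."""
    rng = random.Random(seed)

    # Step 1: initialise a random population.
    population = [[rng.randint(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]

    for _ in range(generations):
        # Step 2: evaluate and rank (lower fitness is better).
        ranked = sorted(population, key=fitness)
        # Step 3: select the better half as parents; keep the best as elite.
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Step 4: uniform crossover, each gene from either parent.
            child = [rng.choice(pair) for pair in zip(a, b)]
            # Step 5: mutation, resampling a gene within its bounds.
            for k, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[k] = rng.randint(lo, hi)
            children.append(child)
        population = children

    return min(population, key=fitness)

def toy_fitness(individual):
    """Hypothetical stand-in for a validation error; best at lag=24, units=64."""
    lag, units = individual
    return (lag - 24) ** 2 + (units - 64) ** 2

# Assumed search space: time lag in [1, 48], hidden neurons in [8, 128].
search_bounds = [(1, 48), (8, 128)]
best = genetic_search(toy_fitness, search_bounds)
print(best, toy_fitness(best))
```

Keeping the best individual unchanged in each generation (elitism) guarantees that the best fitness never gets worse, which matters when each evaluation is as expensive as training a network.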
Article
The accurate prediction of building energy consumption is a crucial prerequisite for demand response (DR) and energy efficiency management of buildings. Nevertheless, the thermal inertia and probability distribution characteristics of energy consumption are frequently ignored by traditional prediction methods. This paper proposes a building energy consumption prediction method based on Bayesian regression and thermal inertia correction. The thermal inertia correction model is established by introducing an equivalent temperature variable to characterize the influence of thermal inertia on temperature. The equivalent temperature is described as a linear function of the actual temperature, and the key parameters of the function are optimized through a genetic algorithm (GA). Using historical energy usage, temperature, and date type as inputs and future building energy consumption as output, a Bayesian regression prediction model is established. Through Bayesian inference, combined with prior information on building energy usage data, the posterior probability distribution of building energy usage is inferred, thereby achieving an accurate forecast of building energy consumption. The case study is conducted using energy consumption data from a commercial building in Nanjing. The results of the case study indicate that the proposed thermal inertia correction method is effective in narrowing the distribution of temperature data from a range of 24.5°C to 36.5°C to a more concentrated range of 26.5°C to 34°C, thereby facilitating a more focused and advantageous data distribution for predictions. Upon applying the thermal inertia correction method, the relative errors of the Radial Basis Function (RBF) and Deep Belief Network (DBN) decrease by 2.0% and 3.1% respectively, reaching 10.9% and 7.0% correspondingly. Moreover, with the utilization of Bayesian regression, the relative error further decreases to 4.4%. Notably, the Bayesian regression method not only achieves reduced errors but also provides a probability distribution, demonstrating superiority over traditional methods.
... Based on a comprehensive review of the literature, we have identified these diverse ML models as effective tools for forecasting energy consumption. Nevertheless, neural network-based models were omitted due to their computational intensity, rendering them incompatible with Wrapper FS, particularly when dealing with ML models involving thousands of feature combinations [41]. ...
Article
The study presents a novel framework integrating feature selection (FS) and machine learning (ML) techniques to forecast inland national energy consumption (EC) in the United Kingdom across all energy sources. This innovative framework strategically combines three FS approaches with five interpretable ML models using Shapley Additive Explanations (SHAP), with the dual goal of enhancing accuracy and transparency in EC predictions. By meticulously selecting the most pertinent features from diverse features-including meteorological conditions, socioeconomic parameters, and historical consumption patterns of different primary fuels-the proposed framework enhances the robustness of the forecasting model. This is achieved through benchmarking three FS approaches: ensemble filter, wrapper, and a hybrid ensemble filter-wrapper. In addition, we introduce a novel ensemble filter FS, synthesizing outcomes from multiple base FS methods to make well-informed decisions about feature retention. Experimental results underscore the efficacy of integrating both wrapper and ensemble filter-wrapper FS approaches with interpretable ML models, ensuring the forecasting process remains comprehensible and interpretable while utilizing a manageable number of features (four to eight). In addition, experimental results indicate that different feature subsets are usually selected for each combined FS approach and ML model. This study not only demonstrates the framework's capability to provide accurate forecasts but also establishes it as a valuable tool for policymakers and energy analysts.
Article
Purpose: This proposal aims to forecast energy consumption in residential buildings based on the effect of opening and closing windows using a deep architecture approach. The developed model has three stages: (1) collection of data, (2) feature extraction and (3) prediction. Initially, the data for the closing and opening frequency of the window are taken from manually collected datasets. After that, weighted feature extraction is performed on the collected data. The resulting weighted features are then used to predict energy consumption. The prediction uses the efficient hybrid multi-scale convolution networks (EHMSCN), which combine two deep structured architectures: a deep temporal context network and a one-dimensional deep convolutional neural network. Here, parameter optimization is performed with a hybrid algorithm named jumping rate-based grasshopper lemur optimization (JR-GLO). The core aim of this energy consumption model is to predict the consumption of energy accurately based on the effect of opening and closing windows. The offered energy consumption prediction approach is analyzed over various measures and attains a more accurate performance rate than the conventional techniques. Design/methodology/approach: An EHMSCN-aided energy consumption prediction model is developed to forecast the amount of energy usage during the opening and closing of windows accurately. The emission of CO2 in indoor spaces is highly reduced. Findings: The MASE measure of the proposed model was 52.55, 43.83, 42.01 and 36.81% higher than that of ANN, CNN, DTCN and 1DCNN, respectively. Originality/value: The suggested model attained high-quality measures in residences, with high accuracy, precision and variance.
Preprint
With the increasing decentralization of energy supply, the need to generate and use electricity locally is growing. Energy management systems at building level can be used for this purpose. Thermal and electrical load forecasts are needed as a basis for this. The paper "Overview of the current state of research on load forecasts in the building sector" provides an introduction to the topic of load forecasts in the building sector. For this purpose, 80 scientific articles were quantitatively examined, and focal points were examined with regard to properties, data basis and methods. This current elaboration builds on the previous publication and provides chronological evaluations of the papers examined to show trends and developments for the period from 2014 onwards. This paper starts by briefly summarising the main findings of the previous paper. Subsequently, the respective focal points are examined and, insofar as temporal developments are recognizable, are presented. Thus, it becomes apparent that forecasts are increasingly being made for a specific form of energy and that research interest in forecasts for other forms of energy is declining. The investigation of the granularity of forecasts shows that the dominance of one-hour intervals is decreasing. At the same time, data sets used for research are becoming increasingly larger and are recorded over longer periods of time. This may be related to the growing research interest in methods from the field of machine learning. Especially in the area of artificial neural networks the research interest in recurrent neural networks and deep learning is increasing. Finally, the current state of research on load forecasting in the building sector is defined on the basis of identified focal points as well as trends and developments and emerging research questions are outlined.
Article
Financial news has been proven to be a crucial factor which causes fluctuations in stock prices. However, previous studies heavily relied on analyzing shallow features and ignored the structural relation among words in a sentence. Several sentiment analysis studies have tried to identify the relationship between investors’ reaction and news events. However, the sentiment dataset was usually constructed from a linguistic dataset unrelated to the financial sector, which led to poor performance. This paper proposes a novel framework to predict the directions of stock prices by using both financial news and a sentiment dictionary. The original contributions of this study include the proposal of a novel two-stream Gated Recurrent Unit Network and Stock2Vec, a sentiment word embedding trained on a financial news dataset and Harvard IV-4. Two main experiments are conducted: the first experiment predicts S&P 500 index stock price directions using the historical S&P 500 prices and the articles crawled from Reuters and Bloomberg; the second experiment forecasts the price trends of VN-index using VietStock news and stock prices from cophieu68. Results show that (1) Two-stream GRU outperforms state-of-the-art models; (2) Stock2Vec is more efficient in dealing with financial datasets; (3) in a simulation scenario, applying the model proves effective for the stock sector.
Book
The key component in forecasting demand and consumption of resources in a supply network is an accurate prediction of real-valued time series. Indeed, both service interruptions and resource waste can be reduced with the implementation of an effective forecasting system. Significant research has thus been devoted to the design and development of methodologies for short term load forecasting over the past decades. A class of mathematical models, called Recurrent Neural Networks, are nowadays gaining renewed interest among researchers and they are replacing many practical implementations of the forecasting systems, previously based on static methods. Despite the undeniable expressive power of these architectures, their recurrent nature complicates their understanding and poses challenges in the training procedures. Recently, new important families of recurrent architectures have emerged and their applicability in the context of load forecasting has not been investigated completely yet. This work performs a comparative study on the problem of Short-Term Load Forecast, by using different classes of state-of-the-art Recurrent Neural Networks. The authors test the reviewed models first on controlled synthetic tasks and then on different real datasets, covering important practical cases of study. The text also provides a general overview of the most important architectures and defines guidelines for configuring the recurrent networks to predict real-valued time series.
Article
In the study of load forecasting, short-term load forecasting at the level of individual consumers is prone to exhibit non-stationary and stochastic features compared with predicting the aggregated loads. Hence better methodologies should be proposed to forecast short-term residual loads more accurately, and a refined representation of forecasting results should be considered to make the prediction more reliable. This paper offers a format of short-term probabilistic forecasting results in terms of quantiles, which can better describe the uncertainty of residual loads, and a deep-learning based method, quantile long-short-term-memory (Q-LSTM), to implement probabilistic residual load forecasting. Experiments are conducted on an open dataset. Results show that the proposed method outperforms traditional methods significantly in terms of average quantile score (AQS).
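Quantile forecasts such as those described above are commonly scored with the pinball (quantile) loss. The sketch below assumes that the average quantile score is the pinball loss averaged over all quantile levels, which the abstract does not state explicitly; the function names and the dictionary input format are illustrative choices, not taken from the cited paper.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss of predictions `y_pred` for quantile level q in (0, 1)."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # Under-prediction (diff > 0) costs q per unit; over-prediction costs 1 - q.
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

def average_quantile_score(y_true, quantile_preds):
    """Average the pinball loss over a dict mapping quantile level -> predictions."""
    losses = [pinball_loss(y_true, pred, q) for q, pred in quantile_preds.items()]
    return float(np.mean(losses))

# Example: three load observations and forecasts for the 10%, 50% and 90% quantiles.
observed = [1.0, 2.0, 3.0]
forecasts = {
    0.1: [0.5, 1.5, 2.0],
    0.5: [1.0, 2.0, 3.0],
    0.9: [1.5, 2.5, 4.0],
}
print(average_quantile_score(observed, forecasts))
```

The asymmetric weighting is what makes the loss suitable for quantiles: for a high quantile such as 0.9, falling below the observation is penalised nine times as heavily as exceeding it by the same amount.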
Article
Time-series forecasting has various applications in a wide range of domains, e.g., forecasting stock markets using limit order book data. Limit order book data provide much richer information about the behavior of stocks than its price alone, but also bear several challenges, such as dealing with multiple price depths and processing very large amounts of data of high dimensionality, velocity, and variety. A well-known approach for efficiently handling large amounts of high-dimensional data is the bag-of-features (BoF) model. However, the BoF method was designed to handle multimedia data such as images. In this paper, a novel temporal-aware neural BoF model is proposed tailored to the needs of time-series forecasting using high frequency limit order book data. Two separate sets of radial basis function and accumulation layers are used in the temporal BoF to capture both the short-term behavior and the long-term dynamics of time series. This allows for modeling complex temporal phenomena that occur in time-series data and further increase the forecasting ability of the model. Any other neural layer, such as feature transformation layers, or classifiers, such as multilayer perceptrons, can be combined with the proposed deep learning approach, which can be trained end-to-end using the back-propagation algorithm. The effectiveness of the proposed method is validated using a large-scale limit order book dataset, containing over 4.5 million limit orders, and it is demonstrated that it greatly outperforms all the other evaluated methods.
Article
In recent years, wind speed forecasting is considered as a challenging task required for the prediction of wind energy resources. As a highly varying data, wind speed time series requires highly nonlinear temporal features for the prediction tasks. However, most forecasting approaches apply shallow supervised features extracted using architectures with few nonlinear hidden layers. Moreover, the exact features captured in such methodologies cannot decrease the wind data uncertainties. In this paper, an interval probability distribution learning (IPDL) model is proposed based on Restricted Boltzmann Machines and Rough Set Theory to capture unsupervised temporal features from the wind speed data. The proposed model contains a set of interval latent variables tuned to capture the probability distribution of wind speed time series data using contrastive divergence with Gibbs sampling. A real-valued interval deep belief network (IDBN) is further designed employing a stack of IPDLs with a fuzzy type II inference system (FT2IS) for the supervised regression of future wind speed values. In order to automatically learn meaningful unsupervised features from the underlying wind speed data, real-valued input units are designed inside IDBN to better approximate the wind speed probability distribution function compared to classic DBNs. The high generalization capability of our unsupervised feature learning model incorporated with the robustness of IPDLs and FT2IS leads to accurate predictions. Simulation results on the Western Wind Dataset reveal significant performance improvement in 1-hr up to 24-hr ahead predictions compared to single-model approaches including both shallow and deep architectures, as well as recently proposed hybrid methodologies.
Article
Wind speed forecasting is still a challenge due to the stochastic and highly varying characteristics of wind. In this paper, a graph deep learning model is proposed to learn the powerful spatiotemporal features from the wind speed and wind direction data in neighboring wind farms. The underlying wind farms are modeled by an undirected graph where each node corresponds to a wind site. For each node, temporal features are extracted using a Long Short-Term Memory (LSTM) Network. A scalable graph convolutional deep learning architecture (GCDLA) motivated by the localized first-order approximation of spectral graph convolutions, leverages the extracted temporal features to forecast the wind speed time series of the whole graph nodes. The proposed GCDLA captures spatial wind features as well as deep temporal features of the wind data at each wind site. To further improve the prediction accuracy and capture robust latent representations, Rough Set Theory is incorporated with the proposed graph deep network by introducing upper and lower bound parameter approximations in the model. Simulation results show the advantages of capturing deep spatial and temporal interval features in the proposed framework compared to the state-of-the-art deep learning models as well as shallow architectures in the recent literature.
Conference Paper
The utility industry has invested widely in the smart grid (SG) over the past decade, regarding it as the future electrical grid in which information and electricity are delivered in a two-way flow. Many Artificial Intelligence (AI) techniques are applied in the SG, such as Artificial Neural Networks (ANN), Machine Learning (ML) and Deep Learning (DL). Recently, DL has been a hot topic for AI applications in many fields such as time series load forecasting. This paper introduces the common DL algorithms in the literature applied to load forecasting problems in the SG and power systems. The intention of this survey is to explore the different applications of DL that are used in power systems and smart grid load forecasting. In addition, it compares the accuracy of the reviewed applications in terms of RMSE and MAE and shows that using a convolutional neural network (CNN) with the k-means algorithm yielded a large percentage reduction in RMSE.
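For reference, the two accuracy measures named in the abstract above, RMSE and MAE, and a percentage reduction between two error values can be computed as follows. This is a minimal sketch using the standard definitions; the helper `percent_reduction` and the sample numbers are illustrative assumptions and do not reproduce any result from the survey.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more heavily."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the errors."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

def percent_reduction(baseline_error, new_error):
    """Relative improvement of `new_error` over `baseline_error`, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error

# Made-up load values and forecasts, for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 6.0]
print(rmse(actual, predicted), mae(actual, predicted))
```

Because RMSE squares the errors before averaging, a single large miss raises it much more than it raises MAE, so the two measures can rank forecasting methods differently.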