IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING
IEEJ Trans 2020
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI:10.1002/tee.23088
Paper
Electricity Consumption Prediction Based on LSTM with
Attention Mechanism
Zhifeng Lin*, Non-member
Lianglun Cheng**, Non-member
Guoheng Huang**a, Non-member
Power data analysis in power systems, such as electricity consumption prediction, has long been the basis for the power department to adjust electricity prices, regulate substations, predict total load, and manage peak avoidance. In this paper,
a short-term time-phased electricity consumption prediction model based on Long Short-Term Memory (LSTM) with an attention
mechanism is proposed. First, the attention mechanism is used to assign weight coefficients to the input sequence data. Then,
the output value of every cell of LSTM is calculated according to the forward propagation method, and the error between the
real value and the predicted value is calculated using the back-propagation method. The gradient of each weight is calculated
according to the corresponding error term, and the weight of the model is updated by the gradient descent direction to make
the error smaller. In modeling and prediction experiments on different types of electricity consumption, the results show that the prediction accuracy of the proposed model increased by 6.5% compared to the state-of-the-art model. The model has a good effect on electricity consumption prediction: not only is it numerically close to actual results, but it can also better predict the development trend of the data. ©2020 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
Keywords: electricity consumption prediction; attention mechanism; LSTM; error optimization
Received 12 April 2019; Revised 18 September 2019
1. Introduction
Electricity consumption prediction is one of the core technolo-
gies in the construction of smart grid, also known as grid 2.0. It
also plays important roles in electricity development planning and
business planning. Currently, the electricity industry is developing
rapidly based on the electricity transmission and public electric-
ity resources provided by the State Grid. The electricity generated
by a smart grid can be used not only for electricity supply in
the region but also for transmission to other regions through the
national grid line. In addition, some studies have shown that elec-
tricity prediction can also help improve the efficiency of electricity
distribution in smart grids, especially in electricity stations. As
the electricity transmission cost from the electricity station to the
substation or the user is very high in the electricity grid, unneces-
sary costs can be reduced through electricity consumption analysis
and prediction before the planned construction of the power trans-
mission network infrastructure [1]. Therefore, accurately analyzing
and predicting electricity consumption is not only the key to ensuring the smooth operation of national and regional social and economic systems but is also required to ensure the development of the electricity industry.
In the field of time series, there are still many shortcomings
in the current study of electricity consumption prediction, such
as the prediction of different types of electricity consumption.
Many uncertain impact factors make the prediction difficult and
complex, such as irregular data fluctuations, measurement error,
and so on [2,3]. Therefore, a new method should be proposed
a Correspondence to: Guoheng Huang. E-mail: kevinwong@gdut.edu.cn
*Laboratory of Cyber-Physical System, Department of Computer Science and Technology, School of Computers, Guangdong University of Technology, Guangzhou, China
**School of Computers, Guangdong University of Technology, Guangzhou, China
to solve such problems. Currently, there are some traditional
statistical-based models, such as Holt-Winters model [4,5] and
the Auto Regressive Integrated Moving Average (ARIMA) model
[6,7]. The Holt-Winters model is used to predict the electricity
consumption of the data centers, which can remarkably increase
the energy efficiency of data centers [8]. The ARIMA model is
used to predict the electricity consumption of medical institutions,
but smoothing of the data is needed at the beginning [9]. These models impose requirements on the smoothness and quantity of the data; if these requirements are not met, their prediction performance is relatively poor. Moreover, for different types of
data, it is always necessary to manually adjust the parameters,
which makes it difficult to generalize the model. In order to
solve the problem of generalization, some machine learning-based
prediction methods have been proposed, such as Support Vector
Machine (SVM) [10,11], neural network [12,13] and so on. These
methods have different variants depending on the application
scenario. Among them, Long Short-Term Memory (LSTM) and
Gated Recurrent Unit (GRU) are two different variants of recurrent
neural networks (RNNs), which have better predictive effects on
time series prediction. In Ref. [14], an LSTM network is used that takes a sequence of past consumption profiles to perform a month-ahead electricity consumption prediction as a sequence. In Ref. [15], a multilayer GRU is used to construct a model to predict electricity consumption. However, these two methods do not extract the features of the training data accurately enough to obtain the best prediction results.
Due to the particularity and variability of electricity consump-
tion data, if the model cannot purposefully learn from key data,
the prediction accuracy of the model will be relatively poor. Elec-
tricity consumption data can be divided into different categories
according to the type of electricity consumption, such as residen-
tial electricity, commercial electricity, large industrial electricity,
agricultural electricity, and so on. Electricity consumption of vari-
ous types has different trends and features. For example, regarding
Fig. 1. Smart grid framework
residential electricity, electricity consumption is relatively low and
is affected by the region, the season, and so on [16]. Regarding
business electricity consumption, it varies from region to region
as business practices in different regions have different character-
istics. It is also the same for industrial and agricultural electricity.
Thus, how to accurately extract the specific features of electricity
consumption sequence in different types is the key to improv-
ing the prediction effect. However, the methods above perform
indiscriminate learning on the time series, which results in poor
prediction.
In order to perform feature extraction on electricity consumption data more efficiently, we propose a model based on an LSTM network with an attention mechanism for electricity consumption prediction. The contributions of our study are as follows:
1. We apply LSTM as a basic model to the field of electricity consumption prediction and achieve better results.
2. The attention mechanism is used to assign weight coefficients to the input sequence data so that the specific features can be accurately extracted.
3. Effectively improve the accuracy of electricity consumption
prediction based on real-world datasets and have reference
values in the construction of smart grid. The location of our
proposed scheme in the smart grid construction framework
is shown in Fig. 1.
The content of this paper is divided into four sections. The second section introduces the overall structure of the electricity consumption prediction model, the third section introduces the experimental process and results, and the last section presents the conclusion and future work.
2. Electricity Consumption Prediction
with Attention-LSTM
The overall architecture of the model is shown in Fig. 2, includ-
ing input and output modules, sequence attention mechanism,
LSTM network and weight optimization module.
The input data are the electricity consumption sequence data for
a period of time, and the output data are the predicted electricity
consumption sequence data for a period of time after that. The
attention mechanism, which will be introduced in detail in Section
2.1, is used to weight the input training data to make it easier
to learn. Layers in the LSTM network include an input layer, an
output layer and a hidden layer. The values of the input sequence
form the input layer, which is the first layer, and the output layer is
the final layer that contains the predicted result. The hidden layer
exists between the input and output layers. To reduce learning
errors of the weights of the attention mechanism and LSTM, the
errors from the previous iteration are fed back into the network, and
the weights are optimized; more training details will be introduced
in Section 2.2.
2.1. Data weighting with the attention mechanism
The attention mechanism stems from the study of human vision.
Fig. 2. LSTM with attention mechanism
Fig. 3. Graphical illustration of attention mechanism
Attention was first used in the field of machine translation: when computing the attention probability distribution, it assigns a probability to each word in the input sentence [17]. Then, the soft
attention model and hard attention model are proposed. The soft
attention model is a fully differentiable deterministic mechanism
that spreads to other parts of the network while propagating
through the attention mechanism. The hard attention model is
a stochastic process in which the system randomly samples an
implicit state instead of using all implicit states for decoding [18].
As the gradient can be directly calculated rather than estimated
by a random process and can be effectively integrated with the
prediction algorithm, we choose soft attention as the attention
mechanism.
In order to make rational use of limited visual information
processing resources, humans need to select a specific part of
the visual area and then focus on it. Likewise, in order to allow
the model to focus on sequence segments that can represent key
features of the whole sequence, the soft attention mechanism
is used to improve the system performance of the electricity
consumption sequence learning task. The graphical illustration of
the attention proposed is shown in Fig. 3.
In the attention mechanism, the weights a_t^k of the input sequence are computed from the previous hidden state h_{t-1} and the previous cell state c_{t-1}, and the weighted input X̃_t is then fed into the LSTM unit.
In the process of electricity consumption prediction, data
need to be preprocessed to meet the input requirements of
the attention mechanism. Given an electricity consumption
sequence Seq = {s_1, s_2, s_3, ..., s_N}, we divide it into a training sequence Seq_train = {s_1, s_2, s_3, ..., s_M} and a testing sequence Seq_test = {s_{M+1}, s_{M+2}, s_{M+3}, ..., s_N}, where N is the length of the sequence and M is the length of the training sequence. Then, the training sequence is divided into n sequence segments. The value of n can be calculated using the following formula:

n = (M - T) / k + 1    (1)
Fig. 4. LSTM network
where T is the length of each sequence segment (and also the number of LSTM cells), and k is the step size by which the window moves backward each time the data are segmented.
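The segmentation of Seq_train into n overlapping windows by (1) can be sketched as follows (an illustrative sketch; the function name and arguments are ours, not from the paper):

```python
# Sliding-window segmentation: n = (M - T) / k + 1 windows of length T,
# advancing k steps each time. Assumes (M - T) is divisible by k.
def segment_sequence(seq_train, T, k=1):
    M = len(seq_train)
    n = (M - T) // k + 1
    return [seq_train[i * k : i * k + T] for i in range(n)]

# Example: a length-10 training sequence, T = 4, k = 2 gives n = 4 segments.
segments = segment_sequence(list(range(10)), T=4, k=2)
```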
The set of sequence segments is X_o = Seq_train = {x_1, x_2, x_3, ..., x_n}, where x_2 represents the second sequence segment, x_3 is the third sequence segment, and so on. x_t^1, an element of X_t = {x_t^1, x_t^2, x_t^3, ..., x_t^n}, t = {1, 2, 3, ..., T}, represents the value of the first sequence segment at time t. The attention mechanism can be constructed from an input X_o by referring to the previous hidden state h_{t-1} and the cell state c_{t-1} in the LSTM unit with:

e_t^k = V_e tanh(W_e [h_{t-1}; c_{t-1}] + U_e x_t^k + B_e)    (2)

and

a_t^k = exp(e_t^k) / Σ_{i=1}^{n} exp(e_t^i)    (3)

where V_e, W_e, U_e are parameters to learn; B_e is the bias term; and a_t^k, k = {1, 2, 3, ..., n}, is the attention weight measuring the importance of the input electricity consumption sequence at time t. The softmax function is applied to e_t^k, k = {1, 2, 3, ..., n}, to ensure that all the attention weights sum to 1. The training sequence segments are weighted by (2)-(3), and a segment that has a greater influence on the prediction is given a greater weight. In electricity consumption prediction, the model with an attention mechanism therefore focuses more on periods that include electricity peaks and sudden changes rather than treating all time periods equally. The attention mechanism is a feedforward network that can be trained jointly with the other components of the LSTM. With these attention weights, the sequence can be adaptively extracted with:

X̃_t = (a_t^1 x_t^1, a_t^2 x_t^2, a_t^3 x_t^3, ..., a_t^n x_t^n)    (4)
2.2. Prediction with LSTM
LSTM is an improved RNN that is good at exploiting nonlinear relationships in time series data. It replaces the hidden-layer cells of an RNN with LSTM cells, which add a state for long-term memory, as shown in Fig. 4.
Compared with the RNN cell, the LSTM cell has one more state c, which gives the LSTM long-term memory. In electricity consumption prediction, the LSTM network consists of T LSTM cells connected in order; each LSTM cell is constructed from the hidden state h_{t-1} and the cell state c_{t-1} of the previous cell and the input sequence X̃_t, which is the output of the attention mechanism.
LSTM enhances the control of data weights by introducing the
concept of a gate to control long-term state c. The forget gate
controls the hidden state of the upper layer, the input gate controls
the input data, and the output gate controls the output data of the
layer. Detailed architecture of LSTM cell is shown in Fig. 5.
Fig. 5. LSTM cell structure

Given an electricity consumption sequence weighted by the attention mechanism, X̃_t = (a_t^1 x_t^1, a_t^2 x_t^2, a_t^3 x_t^3, ..., a_t^n x_t^n), maximum-minimum normalization is used to process the data as follows:

X̃'_t = (X̃_t − min(X̃_t)) / (max(X̃_t) − min(X̃_t))    (5)
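The normalization (5) and its inverse (used later to denormalize predictions) can be sketched in plain Python. The function names mirror Max_Min and de_Max_Min in Algorithm 1, but the implementation is an assumption of ours:

```python
def max_min_normalize(x):
    """Maximum-minimum normalization (Eq. (5)): maps values into [0, 1].
    Returns the scaled list plus (min, max) for later denormalization."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x], lo, hi

def de_max_min(p, lo, hi):
    """Inverse of Eq. (5): maps [0, 1] values back to the original scale."""
    return [v * (hi - lo) + lo for v in p]
```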
The output of the forget gate is calculated using the following formula:

f_t = σ(w_xf a_t^k x_t^k + w_hf h_{t-1} + b_f)    (6)

where x_t^k is an element of X̃_t, k = {1, 2, 3, ..., n}; w_xf is the weight coefficient matrix of the input x to the forget gate f; w_hf is the weight coefficient matrix of the previous hidden state h_{t-1} to the forget gate f; b_f is the bias of the forget gate; and σ is the sigmoid activation function.

The cell state c_t can be updated with:

i_t = σ(w_xi a_t^k x_t^k + w_hi h_{t-1} + b_i)
c_t = f_t ∘ c_{t-1} + i_t ∘ tanh(w_xc a_t^k x_t^k + w_hc h_{t-1} + b_c)    (7)

where i_t is the output of the input gate; w_xi is the weight coefficient matrix of the input x to the input gate i; w_hi is the weight coefficient matrix of the previous hidden state h_{t-1} to the input gate i; w_xc is the weight coefficient matrix of the input x to the candidate cell state c; w_hc is the weight coefficient matrix of the previous hidden state h_{t-1} to the candidate cell state c; and ∘ and tanh are elementwise multiplication and the hyperbolic tangent activation function, respectively.

The hidden state h_t can be updated with:

o_t = σ(w_xo a_t^k x_t^k + w_ho h_{t-1} + b_o)
h_t = o_t ∘ tanh(c_t)    (8)

where o_t is the output of the output gate, w_xo is the weight coefficient matrix of the input x to the output gate o, and w_ho is the weight coefficient matrix of the previous hidden state h_{t-1} to the output gate o.

The predicted value of the model can be calculated with:

p_t = σ(V h_t + c_t)    (9)

In the process of electricity prediction, only the value of the last LSTM cell is output at each step.

The root mean squared error (RMSE) is used as the error measure between the predicted value and the true value. The loss function is defined as:

loss = √( (1/n) Σ_{i=1}^{n} (p_i − y_i)² )    (10)
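A minimal NumPy sketch of one cell step following Eqs. (6)-(8) and the RMSE loss of (10). It assumes a scalar weighted input a_t^k x_t^k and a hypothetical dictionary layout for the gate weights; it is an illustration of the equations, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One LSTM cell step for a scalar weighted input x.
    W holds gate weights keyed 'xf', 'hf', 'xi', 'hi', 'xc', 'hc', 'xo', 'ho';
    b holds biases keyed 'f', 'i', 'c', 'o' (illustrative layout)."""
    f = sigmoid(W['xf'] * x + W['hf'] @ h_prev + b['f'])   # forget gate, Eq. (6)
    i = sigmoid(W['xi'] * x + W['hi'] @ h_prev + b['i'])   # input gate
    c = f * c_prev + i * np.tanh(W['xc'] * x + W['hc'] @ h_prev + b['c'])  # Eq. (7)
    o = sigmoid(W['xo'] * x + W['ho'] @ h_prev + b['o'])   # output gate
    h = o * np.tanh(c)                                     # Eq. (8)
    return h, c

def rmse(pred, true):
    """Root mean squared error, the loss of Eq. (10)."""
    pred, true = np.asarray(pred), np.asarray(true)
    return np.sqrt(np.mean((pred - true) ** 2))
```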
where p_i is the predicted data and y_i is the true data.
Given the network initialization random seed, the learning rate η, and the number of training iterations steps, the Back-Propagation Through Time (BPTT) method is used to minimize the error, and the weights of the attention mechanism and the LSTM network are iteratively optimized. BPTT is a time-based back-propagation algorithm: first, the data are predicted by forward computation; then, the error between the actual value and the predicted value is propagated backward. Unlike conventional back-propagation, the error terms are propagated backward through time; the gradient of each weight is calculated from the corresponding error term, and the weights of the model are updated in the direction of gradient descent.
Algorithm 1
Input: The input sequence Seq; the number of training data M; the length of each sequence segment T; the LSTM parameters S_state, seed, steps
Output: Predicted sequence P'_o; evaluation parameter R²

1:  N = Len(Seq)
2:  Get Seq_train, Seq_test from Seq by M
3:  Generate X_o from Seq_train by T
4:  For each t ∈ [1, T]
5:      Get X_t from X_o at time t
6:      X'_t = Max_Min(X_t)
7:      X̃_t = Attention_Mechanism(X'_t)
8:      Append X̃ with X̃_t
9:  End
10: Create LSTM_cell by S_state(c, h)
11: Connect LSTM_net by LSTM_cell
12: Initialize LSTM_net by seed
13: For each step ∈ [1, steps]
14:     P = LSTM_net(X̃)
15:     Update weights by BPTT with Loss and η
16: End
17: Get LSTM*_net
18: Get Te_1 from Seq_train
19: For each i ∈ [1, (N − M)]
20:     p_i = LSTM*_net(Te_i)
21:     Get Te_{i+1} from Te_i and p_i
22:     Append P_o with p_i
23: End
24: Output P'_o = de_Max_Min(P_o), R²(P'_o, Seq_test)
In the testing session, the trained LSTM network with an attention mechanism is denoted LSTM*_net. Each iteration of the network predicts the data at the next point in time: given the initial input sequence Te_1 = {s_{M−T+1}, s_{M−T+2}, s_{M−T+3}, ..., s_M}, i.e. the last T elements of the training sequence, the prediction for the next moment is p_1 = LSTM*_net(Te_1); then p_2 = LSTM*_net(Te_2), where Te_2 is composed of {s_{M−T+2}, s_{M−T+3}, ..., s_M, p_1}. In this way, we obtain a predicted sequence P_o = {p_1, p_2, p_3, ..., p_{N−M}}.
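This iterative testing procedure — sliding the window Te_i forward by appending each prediction — can be sketched as follows, with predict_next standing in for the trained LSTM*_net (an illustrative stand-in, not the actual network):

```python
def rollout(predict_next, warmup, steps):
    """Iterative one-step-ahead prediction: each prediction is fed back
    into the input window, as with Te_1, Te_2, ... in the testing session.
    predict_next: callable mapping a length-T window to the next value.
    warmup: the last T training values; steps: number of test points (N - M)."""
    window = list(warmup)
    preds = []
    for _ in range(steps):
        p = predict_next(window)
        preds.append(p)
        window = window[1:] + [p]   # slide: drop the oldest, append the prediction
    return preds
```

For instance, with a toy predictor `lambda w: w[-1] + 1` and warm-up window `[1, 2, 3]`, three rollout steps yield `[4, 5, 6]`.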
Then, the prediction sequence is denormalized by inverting (5):

P'_o = P_o × [max(X̃_t) − min(X̃_t)] + min(X̃_t)    (11)
Finally, given Seq_test = {s_{M+1}, s_{M+2}, s_{M+3}, ..., s_N}, the electricity consumption prediction model is evaluated by calculating the coefficient of determination R² between the real testing sequence and the predicted values:

SS_tot = Σ_i (s_i − s̄)²
SS_res = Σ_i (s_i − p_i)²
R² = 1 − SS_res / SS_tot    (12)

where SS_tot is the sum of squared errors between the real data and their average, p_i is the predicted data, and s̄ is the average of the real data; SS_res is the sum of squared residuals between the real data and the predicted data, and s_i is the real data. The closer the predicted data are to the true data, the closer R² is to 1.
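Eq. (12) can be computed directly (a plain-Python sketch; the function name is ours):

```python
def r2_score(true, pred):
    """Coefficient of determination (Eq. (12)): 1 - SS_res / SS_tot."""
    mean = sum(true) / len(true)
    ss_tot = sum((s - mean) ** 2 for s in true)          # spread of real data
    ss_res = sum((s - p) ** 2 for s, p in zip(true, pred))  # prediction residuals
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives R² = 1; predicting the constant mean of the real data gives R² = 0, and worse predictors can be negative (as for some Holt-Winters settings in Table II).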
Algorithm 1 summarizes the model training and evaluation procedure.
3. Experiment
This experiment combines the actual situation of China Southern
Power Grid and applies the method proposed to predict electricity
consumption.
The running environment of the experiment is Python 3.6 on Linux 14.04; the CPU is an Intel Core i5-7300HQ, and the GPU is an Nvidia GeForce GTX 1050.
The experimental datasets are numerical and use the data of the first power supply station of China Southern Power Grid from May 20 to June 20, 32 days in total. The electricity collected belongs to four categories: 'Residential', 'Large Industrial electricity', 'Business' and 'Agricultural'. Each electricity type contains 768 electricity consumption records. The range of the training set for each electricity type is 1 ∼ M, M = 768 × 0.8, and the range of the testing set is (M + 1) ∼ N, N = 768. The Adam optimizer is used to train the model; the input layer T is 8, the output layer is 1, the hidden layer contains four neurons, the random seed is 1, the number of iterations steps is 500, and the learning rate η = 0.01.
3.1. Experiments for four different types of electricity
In order to verify the generalization of the proposed method, we
performed four sets of experiments with different electricity types.
The conditions of each set of experiments are the same except
for the training data. For example, the normalization process and
evaluation model are set the same in each set.
The power consumption prediction results for each type of electricity are shown in Fig. 6. The x-axis represents time in hours, and the y-axis represents hourly electricity consumption in kWh. There are two curves in each figure: one represents the real data, and the other represents the forecast data. The experimental results for residential, large industrial, business, and agricultural electricity consumption show that the curve predicted using the proposed method fits the real curve closely. Not only can it accurately predict the electricity consumption peaks, but it also predicts the trends. The model is evaluated by (12), and the evaluation results are shown in Table I.
The model performs better for large industrial electricity and
business electricity consumption prediction, which benefits from
the use of attention mechanisms to focus on the rules of learning
data. However, the accuracy of residential and agricultural electric-
ity consumption prediction is low. The reason for this phenomenon
might be the high mutation rate of these two types of electricity
consumption. Thus, the electricity consumption trend might not be
well learned from the training data for a whole month.
3.2. Effect of the attention mechanism
To verify the validity of the attention mechanism, we performed two experiments; the LSTM model with the attention mechanism is used to
Fig. 6. The prediction results for different types of electricity: (a) residential, (b) large industrial electricity, (c) business and (d) agricultural
Table I. R² score of electricity consumption for different types of electricity consumption

Electricity type                  R² score
Residential                       0.87
Large industrial electricity      0.99
Business                          0.98
Agricultural                      0.88
predict electricity consumption in the first experiment, and the LSTM method without an attention mechanism [14] is used in the other experiment. The result is shown in Fig. 7.
It can be clearly seen in Fig. 7 that, in the sudden change
of electricity consumption, the yellow curve (predict without
attention) cannot accurately predict the arrival of the sudden
change point, but the green curve (predict with attention) can
accurately predict this.
The introduction of the attention mechanism helps the model to
learn the salient features of the sequence by giving weight to the
sequence segments, reducing the interference factors and obtaining
better prediction results.
Fig. 7. Predicted results of LSTM with attention and LSTM without attention

3.3. Comparative experiment
First, traditional statistics-based methods are used to predict the large industrial electricity consumption data: Holt-Winters [8] and ARIMA [9]. The experimental result is shown in Fig. 8. Different seasonal period (SP) values are set in the Holt-Winters method. There is a big gap between the prediction results of Holt-Winters with different SPs: when SP is set to 24, the result is good, but with other values it is very poor. The generalization of Holt-Winters is thus very poor; it takes a long time to adjust the parameters for different data, and much prior knowledge is needed in the parameter adjustment process. When ARIMA is used, the time series data should be stationary, or stable after differencing; if this requirement cannot be met, the prediction effect will be poor.
Then, machine learning-based methods are used to predict the large industrial electricity consumption data: SVM [11] and neural networks. The kernel function used by the SVM method is the Radial Basis Function; the neural networks are configured with 3, 4 and 5 fully connected layers, with ReLU as the activation function. The experimental result is shown in Fig. 9.

The prediction effects of the machine learning-based methods are better than those of the traditional methods. Because the correlation of the data over time is not considered, the prediction effects of neural networks with different numbers of hidden layers differ little.
The results of the experiments in Sections 3.2 and 3.3 are summarized in Table II, which shows that the proposed LSTM method with the attention mechanism has the highest prediction
Fig. 8. Predicted result of traditional methods
Fig. 9. Predicted result of machine learning-based methods
Table II. R² score of electricity consumption for large industrial electricity consumption between different methods

Method                                 R² score
Holt-Winters [8] (SP = 22)             −5.6
Holt-Winters [8] (SP = 23)             −0.44
Holt-Winters [8] (SP = 24)             0.98
ARIMA [9]                              0.53
SVM [11] (RBF kernel)                  0.69
Neural Network (3 hidden layers)       0.70
Neural Network (4 hidden layers)       0.72
Neural Network (5 hidden layers)       0.71
LSTM [14] (without attention)          0.91
LSTM (with attention, proposed)        0.99
accuracy. The prediction accuracy increased by 6.5% compared to the state-of-the-art model (LSTM without an attention mechanism).
4. Conclusion
In this paper, we propose an LSTM network with an attention
mechanism to predict the electricity consumption data. First, the
attention mechanism is used to process the training data, so that
the LSTM training can focus on the correct sequence segment,
and then, the weight coefficient of the attention mechanism and
the LSTM are updated by back-propagation and gradient descent
to minimize the RMSE. Finally, we use four sets of data to
evaluate the predicted effect of the proposed method and compare
it with other methods. In comparison with several state-of-the-art algorithms, the proposed method achieves the best prediction effect. It not only learns the actual pattern of change in electricity consumption more accurately but also improves the accuracy of the prediction model.
In the future, we will focus on long-sequence predictions and
incorporate more power-influencing factors into the model.
Acknowledgments
This work was supported by National Natural Science Foundation of
China Youth Science Fund Project (Research on Service Composition
Optimization Model and Optimization Algorithm for Manufacturing IoT
Collaborative Perception, No. 61502110).
References
(1) Yeliz Y, Önen A, Muyeen SM, Vasilakos AV, Alan I. Enhancing smart grid with microgrids: Challenges and opportunities. Renewable and Sustainable Energy Reviews 2017; 72:205–214.
(2) Colak I, Sagiroglu S, Fulli G, Yesilbudak M, Covrig CF. A survey
on the critical issues in smart grid technologies. Renewable and
Sustainable Energy Reviews 2016; 54:396 – 405.
(3) Bouzid AM, Guerrero JM, Cheriti A, Bouhamida M, Sicard P,
Benghanem M. A survey on control of electric power distributed
generation systems for microgrid applications. Renewable and Sus-
tainable Energy Reviews 2015; 44:751 – 766.
(4) Yang Y-M, Yu H, Sun Z. Aircraft failure rate forecasting method
based on Holt-Winters seasonal model. 2017 IEEE 2nd International
Conference on Cloud Computing and Big Data Analysis (ICCCBDA),
2017; 520– 524.
(5) Zheng T, Zhang Y, Fan C. Research on hospital operation index
prediction method based on PSO-Holt-winters model. Proceedings
of the 2nd International Conference on Computer Science and
Application Engineering, 23, 2018.
(6) Kumar SV, Vanajakshi L. Short-term traffic flow prediction using
seasonal ARIMA model with limited input data. European Transport
Research Review 2015; 7(3):21.
(7) Guarnaccia C, Mastorakis NE, Quartieri J, Tepedino C, Kaminaris
SD. Development of seasonal ARIMA models for traffic noise
forecasting. MATEC Web of Conferences, 05013, 2017.
(8) Rossi M, Brunelli D. Forecasting data centers power consumption
with the Holt-Winters method. 2015 IEEE Workshop on Environ-
mental, Energy, and Structural Monitoring Systems (EESMS) Pro-
ceedings, 2015; 210 – 214.
(9) Kaur H, Ahuja S. Time series analysis and prediction of electricity
consumption of health care institution using ARIMA model. Pro-
ceedings of Sixth International Conference on Soft Computing for
Problem Solving, 347– 358(2017)
(10) Magoulès F, Piliougine M, Elizondo D. Support vector regression for electricity consumption prediction in a building in Japan. 2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), 2016; 189–196.
(11) Fu Y, Li Z, Zhang H, Xu P. Using support vector machine to predict
next day electricity load of public buildings with sub-metering
devices. Procedia Engineering 2015; 121:1016 – 1022.
(12) Zhang Y, Guo L, Li Q, Li J. Electricity consumption forecasting
method based on MPSO-BP neural network model. arXiv preprint
arXiv:1810.08886, 2018.
(13) Hu W, Tao Z, Guo D, Pan Z. Natural gas prediction model based on
wavelet transform and BP neural network. 2018 33rd Youth Academic
Annual Conference of Chinese Association of Automation (YAC),
2018; 952– 955.
(14) Kim N, Kim M, Choi JK. LSTM based short-term electricity
consumption forecast with daily load profile sequences. 2018 IEEE
7th Global Conference on Consumer Electronics (GCCE), 2018;
136– 137.
(15) Ke K, Hongbin S, Chengkang Z, Brown C. Short-term electrical load forecasting method based on stacked auto-encoding and GRU neural network. Evolutionary Intelligence 2019; 12:385.
(16) Li P, Sun BY, Li ZM, Jiang JS. Investigation and analysis on
electrical load of residence. Building Electricity 2014; 33(7):13–18.
(17) Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly
learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
(18) Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Bengio
Y. Show, attend and tell: Neural image caption generation with
visual attention. International Conference on Machine Learning,
2015; 2048– 2057.
Zhifeng Lin (Non-member) received a B.S. degree in automation
from Guangdong University of Technology
(GDUT), China, in 2017. Currently, he is
pursuing an M.S. degree in computer science
and technology at GDUT. His research inter-
ests include time series pattern mining, deep
neural networks, and computer networks.
Lianglun Cheng (Non-member) is currently a professor at Guangdong University of Technology, dean of its computer school, a doctoral tutor, an excellent teacher of Nanyue, and a cross-century talent of Guangdong Province. He is an executive director of the Robotics Professional Committee of the China Automation Association, a member of the China Computer Federation, and vice chairman of the Guangdong Automation Association. His main research interests include knowledge graphs, knowledge automation, and cyber-physical fusion systems.
Guoheng Huang (Non-member) is currently a talented person in the 'Hundred Talents Program' of Guangdong University of Technology, an assistant professor of computer science, and a master's tutor. He received his B.S. (Mathematics and Applied Mathematics) and M.E. (Computer Science) degrees from South China Normal University in 2008 and 2012, respectively, and his Ph.D. (Software Engineering) from the University of Macau in 2017. His research interests
include computer vision, pattern recognition and artificial intelli-
gence. He has hosted and undertaken a number of national and
provincial-level scientific research projects, including the National
Natural Science Foundation and National Key Research and Devel-
opment Plan.