IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING
IEEJ Trans 2020
Published online in Wiley Online Library (wileyonlinelibrary.com). DOI:10.1002/tee.23088
Paper
Electricity Consumption Prediction Based on LSTM with
Attention Mechanism
Zhifeng Lin*, Non-member
Lianglun Cheng**, Non-member
Guoheng Huang**a, Non-member
Power data analysis in power systems, such as electricity consumption prediction, has long been the basis for the power department to adjust electricity prices, regulate substations, predict total load, and manage peak avoidance. In this paper,
a short-term time-phased electricity consumption prediction model based on Long Short-Term Memory (LSTM) with an attention
mechanism is proposed. First, the attention mechanism is used to assign weight coefficients to the input sequence data. Then,
the output value of every cell of LSTM is calculated according to the forward propagation method, and the error between the
real value and the predicted value is calculated using the back-propagation method. The gradient of each weight is calculated
according to the corresponding error term, and the weight of the model is updated by the gradient descent direction to make
the error smaller. In modeling and prediction experiments on different types of electricity consumption, the results show that the prediction accuracy of the proposed model increased by 6.5% compared to the state-of-the-art model. The model has a good effect on electricity consumption prediction: not only is it numerically close to actual results, but it can also better predict the development trend of the data. ©2020 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
Keywords: electricity consumption prediction; attention mechanism; LSTM; error optimization
Received 12 April 2019; Revised 18 September 2019
1. Introduction
Electricity consumption prediction is one of the core technolo-
gies in the construction of smart grid, also known as grid 2.0. It
also plays important roles in electricity development planning and
business planning. Currently, the electricity industry is developing
rapidly based on the electricity transmission and public electric-
ity resources provided by the State Grid. The electricity generated
by a smart grid can be used not only for electricity supply in
the region but also for transmission to other regions through the
national grid line. In addition, some studies have shown that elec-
tricity prediction can also help improve the efficiency of electricity
distribution in smart grids, especially in electricity stations. As
the electricity transmission cost from the electricity station to the
substation or the user is very high in the electricity grid, unneces-
sary costs can be reduced through electricity consumption analysis
and prediction before the planned construction of the power trans-
mission network infrastructure [1]. Therefore, accurately analyzing
and predicting electricity consumption is not only the key to ensuring the smooth operation of national and regional social and economic systems but is also required to ensure the development of the electricity industry.
In the field of time series, there are still many shortcomings
in the current study of electricity consumption prediction, such
as the prediction of different types of electricity consumption.
Many uncertain impact factors make the prediction difficult and
complex, such as irregular data fluctuations, measurement error,
and so on [2,3]. Therefore, a new method should be proposed
a Correspondence to: Guoheng Huang. E-mail: kevinwong@gdut.edu.cn
*Laboratory of Cyber-Physical System, Department of Computer Science and Technology, School of Computers, Guangdong University of Technology, Guangzhou, China
**School of Computers, Guangdong University of Technology, Guangzhou, China
to solve such problems. Currently, there are some traditional
statistical-based models, such as Holt-Winters model [4,5] and
the Auto Regressive Integrated Moving Average (ARIMA) model
[6,7]. The Holt-Winters model is used to predict the electricity
consumption of the data centers, which can remarkably increase
the energy efficiency of data centers [8]. The ARIMA model is
used to predict the electricity consumption of medical institutions,
but smoothing of the data is needed at the beginning [9]. These models impose requirements on the smoothness and quantity of the data; if these requirements are not met, their prediction performance is relatively poor. Moreover, for different types of
data, it is always necessary to manually adjust the parameters,
which makes it difficult to generalize the model. In order to
solve the problem of generalization, some machine learning-based
prediction methods have been proposed, such as Support Vector
Machine (SVM) [10,11], neural network [12,13] and so on. These
methods have different variants depending on the application
scenario. Among them, Long Short-Term Memory (LSTM) and
Gated Recurrent Unit (GRU) are two different variants of recurrent
neural networks (RNNs), which have better predictive effects on
time series prediction. In Ref. [14], an LSTM network is used that takes a sequence of past consumption profiles to perform a month-ahead electricity consumption prediction as a sequence. In Ref. [15], a multilayer GRU is used to construct a model to predict electricity consumption. However, these two methods do not extract the features of the training data accurately enough to obtain the best prediction results.
Due to the particularity and variability of electricity consump-
tion data, if the model cannot purposefully learn from key data,
the prediction accuracy of the model will be relatively poor. Elec-
tricity consumption data can be divided into different categories
according to the type of electricity consumption, such as residen-
tial electricity, commercial electricity, large industrial electricity,
agricultural electricity, and so on. Electricity consumption of vari-
ous types has different trends and features. For example, regarding
Fig. 1. Smart grid framework
residential electricity, electricity consumption is relatively low and
is affected by the region, the season, and so on [16]. Regarding
business electricity consumption, it varies from region to region
as business practices in different regions have different character-
istics. It is also the same for industrial and agricultural electricity.
Thus, how to accurately extract the specific features of electricity
consumption sequence in different types is the key to improv-
ing the prediction effect. However, the methods above perform
indiscriminate learning on the time series, which results in poor
prediction.
In order to perform feature extraction on electricity consumption data more efficiently, we propose a model based on an LSTM network with an attention mechanism for electricity consumption prediction. The contributions of our study are as follows:
1. We apply LSTM as a basic model to the field of electricity consumption prediction and achieve better results.
2. The attention mechanism is used to assign weight coefficients to the input sequence data so that the specific features can be accurately extracted.
3. Effectively improve the accuracy of electricity consumption
prediction based on real-world datasets and have reference
values in the construction of smart grid. The location of our
proposed scheme in the smart grid construction framework
is shown in Fig. 1.
The content of this paper is divided into four sections. The second section introduces the overall structure of the electricity consumption prediction model, the third section introduces the experimental process and results, and the last section presents the conclusion and future work.
2. Electricity Consumption Prediction
with Attention-LSTM
The overall architecture of the model is shown in Fig. 2, includ-
ing input and output modules, sequence attention mechanism,
LSTM network and weight optimization module.
The input data are the electricity consumption sequence data for
a period of time, and the output data are the predicted electricity
consumption sequence data for a period of time after that. The
attention mechanism, which will be introduced in detail in Section
2.1, is used to weight the input training data to make it easier
to learn. Layers in the LSTM network include an input layer, an
output layer and a hidden layer. The values of the input sequence
form the input layer, which is the first layer, and the output layer is
the final layer that contains the predicted result. The hidden layer
exists between the input and output layers. To reduce learning
errors of the weights of the attention mechanism and LSTM, the
errors from the previous iteration are fed back into the network, and
the weights are optimized; more training details will be introduced
in Section 2.2.
2.1. Data weighting with the attention mechanism
The attention mechanism stems from the study of human vision.
Fig. 2. LSTM with attention mechanism
Fig. 3. Graphical illustration of attention mechanism
Attention was first used in the field of machine translation: when computing the attention probability distribution, it assigns a probability to each word in the input sentence [17]. Then, the soft
attention model and hard attention model are proposed. The soft
attention model is a fully differentiable deterministic mechanism
that spreads to other parts of the network while propagating
through the attention mechanism. The hard attention model is
a stochastic process in which the system randomly samples an
implicit state instead of using all implicit states for decoding [18].
As the gradient can be directly calculated rather than estimated
by a random process and can be effectively integrated with the
prediction algorithm, we choose soft attention as the attention
mechanism.
In order to make rational use of limited visual information
processing resources, humans need to select a specific part of
the visual area and then focus on it. Likewise, in order to allow
the model to focus on sequence segments that can represent key
features of the whole sequence, the soft attention mechanism
is used to improve the system performance of the electricity
consumption sequence learning task. The graphical illustration of
the attention proposed is shown in Fig. 3.
In the attention mechanism, the weights a_t^k of the input sequence are computed from the previous hidden state h_{t-1} and the previous cell state c_{t-1}, and the weighted input X̃_t is then fed into the LSTM unit.
In the process of electricity consumption prediction, data
need to be preprocessed to meet the input requirements of
the attention mechanism. Given an electricity consumption
sequence Seq = {s_1, s_2, s_3, ..., s_N}, we divide it into a training sequence Seq_train = {s_1, s_2, s_3, ..., s_M} and a testing sequence Seq_test = {s_{M+1}, s_{M+2}, s_{M+3}, ..., s_N}, where N is the length of the sequence and M is the length of the training sequence. Then, the training sequence is divided into n sequence segments. The value of n can be calculated using the following formula:

n = (M - T) / k + 1    (1)
Fig. 4. LSTM network
where T is the length of each sequence segment (and also the number of LSTM cells), and k is the step size by which the window moves backward each time the data are segmented.
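The segmentation of Seq_train into n overlapping windows by (1) can be sketched as follows (an illustrative sketch; the function name and arguments are ours, not from the paper):

```python
# Sliding-window segmentation: n = (M - T) / k + 1 windows of length T,
# advancing k steps each time. Assumes (M - T) is divisible by k.
def segment_sequence(seq_train, T, k=1):
    M = len(seq_train)
    n = (M - T) // k + 1
    return [seq_train[i * k : i * k + T] for i in range(n)]

# Example: a length-10 training sequence, T = 4, k = 2 gives n = 4 segments.
segments = segment_sequence(list(range(10)), T=4, k=2)
```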
The set of sequence segments is X_o = Seq_train = {x_1, x_2, x_3, ..., x_n}, where x_2 represents the second sequence segment, x_3 is the third sequence segment, and so on. x_t^1, an element of X_t = {x_t^1, x_t^2, x_t^3, ..., x_t^n}, t = {1, 2, 3, ..., T}, represents the value of the first sequence segment at time t. The attention mechanism can be constructed from an input X_o by referring to the previous hidden state h_{t-1} and the cell state c_{t-1} in the LSTM unit with:

e_t^k = V_e tanh(W_e [h_{t-1}; c_{t-1}] + U_e x_t^k + B_e)    (2)

and

a_t^k = exp(e_t^k) / Σ_{i=1}^{n} exp(e_t^i)    (3)

where V_e, W_e, U_e are parameters to learn; B_e is the bias term; and a_t^k, k = {1, 2, 3, ..., n}, is the attention weight measuring the importance of the input electricity consumption sequence at time t. The softmax function is applied to e_t^k, k = {1, 2, 3, ..., n}, to ensure that all the attention weights sum to 1. The training sequence segments are weighted by (2)-(3), and a segment that has a greater influence on the prediction is given a greater weight. In electricity consumption prediction, the model with an attention mechanism therefore focuses more on periods that include electricity peaks and sudden changes rather than treating all time periods equally. The attention mechanism is a feedforward network that can be trained jointly with the other components of the LSTM. With these attention weights, the sequence can be adaptively extracted with:

X̃_t = (a_t^1 x_t^1, a_t^2 x_t^2, a_t^3 x_t^3, ..., a_t^n x_t^n)    (4)
2.2. Prediction with LSTM
LSTM is an improved RNN that is good at exploiting nonlinear relationships in time series data. It replaces the hidden-layer cells of an RNN with LSTM cells, which add a state for long-term memory, as shown in Fig. 4.
Compared with the RNN cell, the LSTM cell has one more state c, which gives the LSTM long-term memory. In electricity consumption prediction, the LSTM network consists of T LSTM cells connected in order; each LSTM cell is constructed from the hidden state h_{t-1} and the cell state c_{t-1} of the previous cell and the input sequence X̃_t, which is the output of the attention mechanism.
LSTM enhances the control of data weights by introducing the
concept of a gate to control long-term state c. The forget gate
controls the hidden state of the upper layer, the input gate controls
the input data, and the output gate controls the output data of the
layer. Detailed architecture of LSTM cell is shown in Fig. 5.
Fig. 5. LSTM cell structure

Given an electricity consumption sequence weighted by the attention mechanism, X̃_t = (a_t^1 x_t^1, a_t^2 x_t^2, a_t^3 x_t^3, ..., a_t^n x_t^n), maximum-minimum normalization is used to process the data as follows:

X̃'_t = (X̃_t − min(X̃_t)) / (max(X̃_t) − min(X̃_t))    (5)
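The normalization (5) and its inverse (used later to denormalize predictions) can be sketched in plain Python. The function names mirror Max_Min and de_Max_Min in Algorithm 1, but the implementation is an assumption of ours:

```python
def max_min_normalize(x):
    """Maximum-minimum normalization (Eq. (5)): maps values into [0, 1].
    Returns the scaled list plus (min, max) for later denormalization."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x], lo, hi

def de_max_min(p, lo, hi):
    """Inverse of Eq. (5): maps [0, 1] values back to the original scale."""
    return [v * (hi - lo) + lo for v in p]
```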
The output of the forget gate is calculated using the following formula:

f_t = σ(w_xf a_t^k x_t^k + w_hf h_{t-1} + b_f)    (6)

where x_t^k is an element of X̃_t, k = {1, 2, 3, ..., n}; w_xf is the weight coefficient matrix of the input x to the forget gate f; w_hf is the weight coefficient matrix of the previous hidden state h_{t-1} to the forget gate f; b_f is the bias of the forget gate; and σ is the sigmoid activation function.

The cell state c_t can be updated with:

i_t = σ(w_xi a_t^k x_t^k + w_hi h_{t-1} + b_i)
c_t = f_t ∘ c_{t-1} + i_t ∘ tanh(w_xc a_t^k x_t^k + w_hc h_{t-1} + b_c)    (7)

where i_t is the output of the input gate; w_xi is the weight coefficient matrix of the input x to the input gate i; w_hi is the weight coefficient matrix of the previous hidden state h_{t-1} to the input gate i; w_xc is the weight coefficient matrix of the input x to the candidate cell state c; w_hc is the weight coefficient matrix of the previous hidden state h_{t-1} to the candidate cell state c; and ∘ and tanh are elementwise multiplication and the hyperbolic tangent activation function, respectively.

The hidden state h_t can be updated with:

o_t = σ(w_xo a_t^k x_t^k + w_ho h_{t-1} + b_o)
h_t = o_t ∘ tanh(c_t)    (8)

where o_t is the output of the output gate, w_xo is the weight coefficient matrix of the input x to the output gate o, and w_ho is the weight coefficient matrix of the previous hidden state h_{t-1} to the output gate o.

The predicted value of the model can be calculated with:

p_t = σ(V h_t + c_t)    (9)

In the process of electricity prediction, only the value of the last LSTM cell is output at each step.

The root mean squared error (RMSE) is used as the error measure between the predicted value and the true value. The loss function is defined as:

loss = √( (1/n) Σ_{i=1}^{n} (p_i − y_i)² )    (10)
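A minimal NumPy sketch of one cell step following Eqs. (6)-(8) and the RMSE loss of (10). It assumes a scalar weighted input a_t^k x_t^k and a hypothetical dictionary layout for the gate weights; it is an illustration of the equations, not the authors' implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, W, b):
    """One LSTM cell step for a scalar weighted input x.
    W holds gate weights keyed 'xf', 'hf', 'xi', 'hi', 'xc', 'hc', 'xo', 'ho';
    b holds biases keyed 'f', 'i', 'c', 'o' (illustrative layout)."""
    f = sigmoid(W['xf'] * x + W['hf'] @ h_prev + b['f'])   # forget gate, Eq. (6)
    i = sigmoid(W['xi'] * x + W['hi'] @ h_prev + b['i'])   # input gate
    c = f * c_prev + i * np.tanh(W['xc'] * x + W['hc'] @ h_prev + b['c'])  # Eq. (7)
    o = sigmoid(W['xo'] * x + W['ho'] @ h_prev + b['o'])   # output gate
    h = o * np.tanh(c)                                     # Eq. (8)
    return h, c

def rmse(pred, true):
    """Root mean squared error, the loss of Eq. (10)."""
    pred, true = np.asarray(pred), np.asarray(true)
    return np.sqrt(np.mean((pred - true) ** 2))
```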
where p_i is the predicted data and y_i is the true data.
Given the network initialization random seed, the learning rate η, and the number of training iterations steps, the Back-Propagation Through Time (BPTT) method is used to minimize the error, and the weights of the attention mechanism and the LSTM network are iteratively optimized. BPTT is a time-based back-propagation algorithm: first, the data are predicted by forward computation; then, the error between the actual value and the predicted value is propagated backward. Unlike conventional back-propagation, the error terms are propagated backward through time; the gradient of each weight is calculated from the corresponding error term, and the weights of the model are updated in the direction of gradient descent.
Algorithm 1
Input: The input sequence Seq; the number of training data M; the length of each sequence segment T; the LSTM parameters S_state, seed, steps
Output: Predicted sequence P'_o; evaluation parameter R²

1:  N = Len(Seq)
2:  Get Seq_train, Seq_test from Seq by M
3:  Generate X_o from Seq_train by T
4:  For each t ∈ [1, T]
5:      Get X_t from X_o at time t
6:      X'_t = Max_Min(X_t)
7:      X̃_t = Attention_Mechanism(X'_t)
8:      Append X̃ with X̃_t
9:  End
10: Create LSTM_cell by S_state(c, h)
11: Connect LSTM_net by LSTM_cell
12: Initialize LSTM_net by seed
13: For each step ∈ [1, steps]
14:     P = LSTM_net(X̃)
15:     Update weights by BPTT with Loss and η
16: End
17: Get LSTM*_net
18: Get Te_1 from Seq_train
19: For each i ∈ [1, (N − M)]
20:     p_i = LSTM*_net(Te_i)
21:     Get Te_{i+1} from Te_i and p_i
22:     Append P_o with p_i
23: End
24: Output P'_o = de_Max_Min(P_o), R²(P'_o, Seq_test)
In the testing session, the trained LSTM network with an attention mechanism is denoted LSTM*_net. Each iteration of the network predicts the data at the next point in time: given the initial input sequence Te_1 = {s_{M−T+1}, s_{M−T+2}, s_{M−T+3}, ..., s_M}, i.e. the last T elements of the training sequence, the prediction for the next moment is p_1 = LSTM*_net(Te_1); then p_2 = LSTM*_net(Te_2), where Te_2 is composed of {s_{M−T+2}, s_{M−T+3}, ..., s_M, p_1}. In this way, we obtain a predicted sequence P_o = {p_1, p_2, p_3, ..., p_{N−M}}.
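This iterative testing procedure — sliding the window Te_i forward by appending each prediction — can be sketched as follows, with predict_next standing in for the trained LSTM*_net (an illustrative stand-in, not the actual network):

```python
def rollout(predict_next, warmup, steps):
    """Iterative one-step-ahead prediction: each prediction is fed back
    into the input window, as with Te_1, Te_2, ... in the testing session.
    predict_next: callable mapping a length-T window to the next value.
    warmup: the last T training values; steps: number of test points (N - M)."""
    window = list(warmup)
    preds = []
    for _ in range(steps):
        p = predict_next(window)
        preds.append(p)
        window = window[1:] + [p]   # slide: drop the oldest, append the prediction
    return preds
```

For instance, with a toy predictor `lambda w: w[-1] + 1` and warm-up window `[1, 2, 3]`, three rollout steps yield `[4, 5, 6]`.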
Then, the prediction sequence is denormalized by inverting (5):

P'_o = P_o × [max(X̃_t) − min(X̃_t)] + min(X̃_t)    (11)
Finally, given Seq_test = {s_{M+1}, s_{M+2}, s_{M+3}, ..., s_N}, the electricity consumption prediction model is evaluated by calculating the coefficient of determination R² between the real testing sequence and the predicted values:

SS_tot = Σ_i (s_i − s̄)²
SS_res = Σ_i (s_i − p_i)²
R² = 1 − SS_res / SS_tot    (12)

where SS_tot is the sum of squared errors between the real data and their average, p_i is the predicted data, and s̄ is the average of the real data; SS_res is the sum of squared residuals between the real data and the predicted data, and s_i is the real data. The closer the predicted data are to the true data, the closer R² is to 1.
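Eq. (12) can be computed directly (a plain-Python sketch; the function name is ours):

```python
def r2_score(true, pred):
    """Coefficient of determination (Eq. (12)): 1 - SS_res / SS_tot."""
    mean = sum(true) / len(true)
    ss_tot = sum((s - mean) ** 2 for s in true)          # spread of real data
    ss_res = sum((s - p) ** 2 for s, p in zip(true, pred))  # prediction residuals
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives R² = 1; predicting the constant mean of the real data gives R² = 0, and worse predictors can be negative (as for some Holt-Winters settings in Table II).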
Algorithm 1 summarizes the model training and evaluation procedure.
3. Experiment
This experiment combines the actual situation of China Southern
Power Grid and applies the method proposed to predict electricity
consumption.
The running environment of the experiment is Python 3.6 on Linux 14.04; the CPU is an Intel Core i5-7300HQ, and the GPU is an Nvidia GeForce GTX 1050.
The experimental datasets are numerical and use the data of the first power supply station of China Southern Power Grid from May 20 to June 20, 32 days in total. The electricity collected belongs to four categories: 'Residential', 'Large Industrial electricity', 'Business' and 'Agricultural'. Each electricity type contains 768 electricity consumption records. The range of the training set for each electricity type is 1 ∼ M, M = 768 × 0.8, and the range of the testing set is (M + 1) ∼ N, N = 768. The Adam optimizer is used to train the model; the input layer T is 8, the output layer is 1, the hidden layer contains four neurons, the random seed is 1, the number of iterations steps is 500, and the learning rate η = 0.01.
3.1. Experiments for four different types of electricity
In order to verify the generalization of the proposed method, we
performed four sets of experiments with different electricity types.
The conditions of each set of experiments are the same except
for the training data. For example, the normalization process and
evaluation model are set the same in each set.
The power consumption prediction results for each type of electricity are shown in Fig. 6. The x-axis represents time in hours, and the y-axis represents hourly electricity consumption in kWh. There are two curves in each figure: one represents the real data, and the other represents the forecast data. The experimental results for residential, large industrial, business, and agricultural electricity consumption show that the curve predicted using the proposed method fits the real curve closely. Not only can it accurately predict the electricity consumption peaks, but it also predicts the trends. The model is evaluated by (12), and the evaluation results are shown in Table I.
The model performs better for large industrial electricity and
business electricity consumption prediction, which benefits from
the use of attention mechanisms to focus on the rules of learning
data. However, the accuracy of residential and agricultural electric-
ity consumption prediction is low. The reason for this phenomenon
might be the high mutation rate of these two types of electricity
consumption. Thus, the electricity consumption trend might not be
well learned from the training data for a whole month.
3.2. Effect of the attention mechanism
To verify the validity of the attention mechanism, we performed two experiments; the LSTM model with the attention mechanism is used to
Fig. 6. The prediction results for different types of electricity: (a) residential, (b) large industrial electricity, (c) business and (d) agricultural
Table I. R² score of electricity consumption for different types of electricity consumption

Electricity type                  R² score
Residential                       0.87
Large industrial electricity      0.99
Business                          0.98
Agricultural                      0.88
predict electricity consumption in the first experiment, and the LSTM method without an attention mechanism [14] is used in the other experiment. The result is shown in Fig. 7.
It can be clearly seen in Fig. 7 that, in the sudden change
of electricity consumption, the yellow curve (predict without
attention) cannot accurately predict the arrival of the sudden
change point, but the green curve (predict with attention) can
accurately predict this.
The introduction of the attention mechanism helps the model to
learn the salient features of the sequence by giving weight to the
sequence segments, reducing the interference factors and obtaining
better prediction results.
Fig. 7. Predicted results of LSTM with attention and LSTM without attention

3.3. Comparative experiment
First, traditional statistics-based methods are used to predict the large industrial electricity consumption data: Holt-Winters [8] and ARIMA [9]. The experimental result is shown in Fig. 8. Different seasonal period (SP) values are set in the Holt-Winters method. There is a big gap between the prediction results of Holt-Winters with different SPs: when SP is set to 24, the result is good, but with other values it is very poor. The generalization of Holt-Winters is thus very poor; it takes a long time to adjust the parameters for different data, and much prior knowledge is needed in the parameter adjustment process. When ARIMA is used, the time series data should be stationary, or stable after differencing; if this requirement cannot be met, the prediction effect will be poor.
Then, machine learning-based methods are used to predict the large industrial electricity consumption data: SVM [11] and neural networks. The kernel function used by the SVM method is the Radial Basis Function; the neural networks are configured with 3, 4 and 5 fully connected layers, with ReLU as the activation function. The experimental result is shown in Fig. 9.

The prediction effects of the machine learning-based methods are better than those of the traditional methods. Because the correlation of the data over time is not considered, the prediction effects of neural networks with different numbers of hidden layers differ little.
The results of the experiments in Sections 3.2 and 3.3 are summarized in Table II, which shows that the proposed LSTM method with the attention mechanism has the highest prediction
Fig. 8. Predicted result of traditional methods
Fig. 9. Predicted result of machine learning-based methods
Table II. R² score of electricity consumption for large industrial electricity consumption between different methods

Method                                 R² score
Holt-Winters [8] (SP = 22)             −5.6
Holt-Winters [8] (SP = 23)             −0.44
Holt-Winters [8] (SP = 24)             0.98
ARIMA [9]                              0.53
SVM [11] (RBF kernel)                  0.69
Neural Network (3 hidden layers)       0.70
Neural Network (4 hidden layers)       0.72
Neural Network (5 hidden layers)       0.71
LSTM [14] (without attention)          0.91
LSTM (with attention, proposed)        0.99
accuracy. The prediction accuracy increased by 6.5% compared to the state-of-the-art model (LSTM without an attention mechanism).
4. Conclusion
In this paper, we propose an LSTM network with an attention
mechanism to predict the electricity consumption data. First, the
attention mechanism is used to process the training data, so that
the LSTM training can focus on the correct sequence segment,
and then, the weight coefficient of the attention mechanism and
the LSTM are updated by back-propagation and gradient descent
to minimize the RMSE. Finally, we use four sets of data to
evaluate the predicted effect of the proposed method and compare
it with other methods. In comparison with several state-of-the-art algorithms, the proposed method achieves the best prediction effect. It not only learns the actual pattern of change in electricity consumption more accurately but also improves the accuracy of the prediction model.
In the future, we will focus on long-sequence predictions and
incorporate more power-influencing factors into the model.
Acknowledgments
This work was supported by National Natural Science Foundation of
China Youth Science Fund Project (Research on Service Composition
Optimization Model and Optimization Algorithm for Manufacturing IoT
Collaborative Perception, No. 61502110).
References
(1) Yeliz Y, Önen A, Muyeen SM, Vasilakos AV, Alan I. Enhancing smart grid with microgrids: Challenges and opportunities. Renewable and Sustainable Energy Reviews 2017; 72:205–214.
(2) Colak I, Sagiroglu S, Fulli G, Yesilbudak M, Covrig CF. A survey
on the critical issues in smart grid technologies. Renewable and
Sustainable Energy Reviews 2016; 54:396 – 405.
(3) Bouzid AM, Guerrero JM, Cheriti A, Bouhamida M, Sicard P,
Benghanem M. A survey on control of electric power distributed
generation systems for microgrid applications. Renewable and Sus-
tainable Energy Reviews 2015; 44:751 – 766.
(4) Yang Y-M, Yu H, Sun Z. Aircraft failure rate forecasting method
based on Holt-Winters seasonal model. 2017 IEEE 2nd International
Conference on Cloud Computing and Big Data Analysis (ICCCBDA),
2017; 520– 524.
(5) Zheng T, Zhang Y, Fan C. Research on hospital operation index
prediction method based on PSO-Holt-winters model. Proceedings
of the 2nd International Conference on Computer Science and
Application Engineering, 23, 2018.
(6) Kumar SV, Vanajakshi L. Short-term traffic flow prediction using
seasonal ARIMA model with limited input data. European Transport
Research Review 2015; 7(3):21.
(7) Guarnaccia C, Mastorakis NE, Quartieri J, Tepedino C, Kaminaris
SD. Development of seasonal ARIMA models for traffic noise
forecasting. MATEC Web of Conferences, 05013, 2017.
(8) Rossi M, Brunelli D. Forecasting data centers power consumption
with the Holt-Winters method. 2015 IEEE Workshop on Environ-
mental, Energy, and Structural Monitoring Systems (EESMS) Pro-
ceedings, 2015; 210 – 214.
(9) Kaur H, Ahuja S. Time series analysis and prediction of electricity
consumption of health care institution using ARIMA model. Pro-
ceedings of Sixth International Conference on Soft Computing for
Problem Solving, 347– 358(2017)
(10) Magoulès F, Piliougine M, Elizondo D. Support vector regression for electricity consumption prediction in a building in Japan. 2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES), 2016; 189–196.
(11) Fu Y, Li Z, Zhang H, Xu P. Using support vector machine to predict
next day electricity load of public buildings with sub-metering
devices. Procedia Engineering 2015; 121:1016 – 1022.
(12) Zhang Y, Guo L, Li Q, Li J. Electricity consumption forecasting
method based on MPSO-BP neural network model. arXiv preprint
arXiv:1810.08886, 2018.
(13) Hu W, Tao Z, Guo D, Pan Z. Natural gas prediction model based on
wavelet transform and BP neural network. 2018 33rd Youth Academic
Annual Conference of Chinese Association of Automation (YAC),
2018; 952– 955.
(14) Kim N, Kim M, Choi JK. LSTM based short-term electricity
consumption forecast with daily load profile sequences. 2018 IEEE
7th Global Conference on Consumer Electronics (GCCE), 2018;
136– 137.
(15) Ke K, Hongbin S, Chengkang Z, Brown C. Short-term electrical load forecasting method based on stacked auto-encoding and GRU neural network. Evolutionary Intelligence 2019; 12:385.
(16) Li P, Sun BY, Li ZM, Jiang JS. Investigation and analysis on
electrical load of residence. Building Electricity 2014; 33(7):13–18.
(17) Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly
learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
(18) Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Bengio
Y. Show, attend and tell: Neural image caption generation with
visual attention. International Conference on Machine Learning,
2015; 2048– 2057.
Zhifeng Lin (Non-member) received a B.S. degree in automation
from Guangdong University of Technology
(GDUT), China, in 2017. Currently, he is
pursuing an M.S. degree in computer science
and technology at GDUT. His research inter-
ests include time series pattern mining, deep
neural networks, and computer networks.
Lianglun Cheng (Non-member) is currently a professor at Guangdong University of Technology, dean of its computer school, a doctoral tutor, an excellent teacher of Nanyue, and a cross-century talent of Guangdong Province. He is an executive director of the Robotics Professional Committee of the China Automation Association, a member of the China Computer Federation, and vice chairman of the Guangdong Automation Association. His main research interests include knowledge graphs, knowledge automation, and cyber-physical fusion systems.
Guoheng Huang (Non-member) is currently a talented person in the 'Hundred Talents Program' of Guangdong University of Technology, an assistant professor of computer science, and a master's tutor. He received his B.S. (Mathematics and Applied Mathematics) and M.E. (Computer Science) degrees from South China Normal University in 2008 and 2012, respectively, and his Ph.D. (Software Engineering) from the University of Macau in 2017. His research interests
include computer vision, pattern recognition and artificial intelli-
gence. He has hosted and undertaken a number of national and
provincial-level scientific research projects, including the National
Natural Science Foundation and National Key Research and Devel-
opment Plan.