Multi-Source and Temporal Attention Network for Probabilistic Wind Power Prediction

Authors:

Abstract

The temporal dependencies of wind power are important to model in short-term wind power forecasting. However, different time series inputs contribute differently to the forecasting performance, which makes it challenging to select the relevant driving information. In this paper, a Multi-Source and Temporal Attention Network (MSTAN) is proposed for short-term probabilistic wind power prediction. The MSTAN model introduces multi-source Numerical Weather Prediction (NWP) and makes three specific designs to improve prediction performance. Firstly, a novel multi-source variable attention module is proposed to select the driving variables of NWP. Secondly, a temporal attention module is used to capture the implicit temporal dependency hidden in the historical measurements and the multi-source NWP sequence. Thirdly, a residual module is wrapped in MSTAN to skip some unnecessary nonlinear transformations and provide adaptive complexity to the entire model. After training, MSTAN yields multi-horizon density forecasts for the next 48 hours. MSTAN is compared with state-of-the-art machine learning schemes in a wind power forecasting system using the operation data from 3 wind farms. The results show that MSTAN outperforms its counterparts on both deterministic and probabilistic prediction, which supports the effectiveness of its structural design.
Index Terms: wind power probabilistic prediction, multi-step
prediction, multi-source NWP, variable attention, attention
mechanism, residual connection, mixture density.
I. INTRODUCTION
With the integration of high-penetration wind energy, wind
power uncertainty has become a major concern for reliable and
economical power system operation and planning. Wind power
probabilistic forecasting (WPPF) provides detailed uncertainty
information, allowing system operators and electricity traders
to make better decisions in reserve setting, unit commitment,
electricity trading, and so on [1]. However, accurate short-term
WPPF is still challenging due to the inherent randomness of the
wind resource [2].
Up to now, the realization of high accuracy WPPF mainly
relies on the following two critical technical routes:
(1) Better Data Inputs and Feature Engineering. Some
novel features and efficient data preprocessing methods will
reduce the difficulty of modeling and improve prediction
accuracy. Therefore, the introduction, construction, and
selection of target-related features have been widely adopted in
WPF research. (i) Some novel features are introduced into the
WPF model to reduce the prediction uncertainty. For instance,
multi-sites Numerical Weather Prediction (NWP) [3] and
ensemble NWP [4] have been used to enrich the input
information of the WPPF model. Historical measurements and
NWP data [5-7] are used together as model inputs to improve
very-short-term and short-term forecasts. However, how to
dynamically balance the relative importance of historical
observations and NWP values at different prediction horizons is
seldom discussed, which leads to underutilization of the input
data. Furthermore, off-site information [8,9] and geospatial
information [10] are introduced to provide more spatial features.
(ii) Constructing driving features that are highly related to the
target is also an effective way to improve prediction accuracy,
and many feature construction schemes have been proposed to
enrich the inputs of wind power forecasting. Frequently used
manual features include the multi-time-step average feature,
polynomial features [11], clustering features [12], wavelet
decomposition features [13], dimension-reducing features [14],
and unsupervised features [15]. (iii) When the features are
redundant and noisy, feature selection is needed to reduce the
influence of irrelevant features and noise. Feature selection is
often used to reduce the model complexity and avoid the curse of
dimensionality. The classical feature selection methods include
Filter, Wrapper, and Embedded methods. For example, Filter
selection based on mutual information and Embedded selection
based on tree methods are proposed in [16, 17]. An Embedded
selection based on Automatic Relevance Determination is
presented in [7]. However, traditional feature selection methods
pick out important features only globally, so the selected
features are not necessarily suitable for every time slot.
This work was supported by the National Natural Science Foundation of China (U1765104), and
North China Electric Power University International Joint Training Graduate Program.
Hao ZHANG is with School of Renewable Energy, North China Electric Power University,
Beijing, China. (E-mail: zhanghaoncepu@163.com)
(2) Accurate and Flexible Probabilistic Prediction Models.
Generally, short-term wind farm power prediction can be
divided into physical models [18,19], statistic models [20, 21]
and Machine Learning models. In recent years, Machine
Learning models, which could efficiently provide interval,
quantile, probability density, and scenario prediction results,
have gradually become the mainstream of short-term WPPF.
Machine Learning models have two categories: conventional
Machine Learning models and Deep Learning models. (i) Many
conventional Machine Learning models have been proposed for
short-term WPPF, such as K-Nearest Neighbors (KNN) [22],
Support Vector Machine (SVM) [23], Gaussian Process (GP) [17],
Tree-based models [24], Bayes Learning [5], Autoregressive-based
models [25-27], shallow Artificial Neural Networks (ANN) [28-30],
and Ensemble models [31,32]. Since conventional ML models cannot
automatically extract deep-level features, achieving
high-accuracy wind power prediction often requires detailed and
specialized feature engineering. (ii) Deep Learning (DL) models
have strong nonlinear fitting capabilities and flexible network
structures, which makes them competitive data-driven solutions
for WPF. The DL models used for short-term WPF can be divided
into the following categories: Dense, RNN, CNN, and GCN. Early DL
methods used for WPF are mostly densely connected networks, such
as the Deep Belief Network [33], the Deep Boltzmann Machine [34],
and DAE [35]. All three models involve unsupervised training
followed by a fine-tuning process. Due to the limitation of the
network structure, Dense models have some defects in modeling the
dependence of spatiotemporal data. Recurrent Neural Networks
(RNN), including LSTM [36] and GRU [37], have gradually been
applied to multi-step wind power prediction and learn the
temporal dependence of wind power better. Convolutional Neural
Network (CNN) models, including 1D-CNN, 2D-CNN, and TCN models,
can also learn local temporal dependencies [38-40]. For instance,
a 2D CNN is used to establish the spatial dependence of regular
grid data [41]. The combination of CNN and LSTM is used to
capture the spatiotemporal relationship in wind farms or wind
farm clusters [42]. GCN extends the convolution operation to the
non-Euclidean domain. It shows better adaptability and higher
efficiency than CNN in the wind power prediction task [9].
Hao ZHANG, Jie YAN*, Member, IEEE, Yongqian LIU, Yongqi GAO, Shuang HAN, Li LI
Jie YAN and Yongqian Liu are the corresponding authors, with School of Renewable Energy,
North China Electric Power University, Beijing, China. (E-mail: yanjie@ncepu.edu.cn)
Yongqi GAO is with the Nansen Environmental and Remote Sensing Center, the University of
Bergen, Bergen, Norway.
However, several problems have not been carefully
addressed in earlier studies due to the limitations of the
prediction model structure and the variety of input data. (I)
Single-source ensemble NWP provided by a weather forecast
institution with different initial conditions and parameterization
schemes has been widely used before. However, the multi-
source NWP from diverse weather forecast providers has rarely
been considered in short-term WPF. Due to the limitations of
observations available for assimilation, computing resources
and engineering experience, one weather forecast provider
cannot guarantee that single-source NWP is accurate enough in
all regions and weather conditions. It is necessary to consider a
multi-source NWP scheme to reduce the risk of wind farm
power prediction [43]. (II) The relative importance of different
NWP features changes dynamically. However, traditional feature
selection methods cannot pick out the dominant features step by
step, so the input features of some time slots are not optimal.
For instance, traditional feature selection methods select one
globally optimal feature set from the entire NWP dataset,
although a different feature set may be more suitable for a
specific time slot. (III) Most existing CNN- and RNN-based WPPF
models have difficulties in modeling the long-term temporal
dependency hidden in the observed sequence and the NWP sequence.
When the temporal dependency of interest spans a long time
window, RNN models suffer from the vanishing gradient problem
and are difficult to parallelize. CNN models focus more on local
patterns and need more layers and specific layer designs to
capture long-term temporal dependencies. (IV) Temporal Deep
Learning models require time sequences as inputs. Thus, the
short-term WPF training set is generally small, limited by how
often the NWP data is accessed. If a short-term WPF system
obtains the NWP data once a day, each daily NWP sequence provides
only one sample for model training. Deep learning models have
strong fitting capabilities but are also prone to overfitting,
especially when the data set is small. How to avoid overfitting
on data sets of different sizes is rarely discussed in
short-term WPPF.
In this paper, a Multi-Source and Temporal Attention
Network (MSTAN) is proposed for multi-step short-term
WPPF. The MSTAN uses the multi-source NWP from diverse
weather forecast providers and the historical observations as the
model inputs, the density forecasts for the next 48 hours as the
model outputs. Compared with previous short-term power
prediction studies, this paper has the following contributions:
(1) Multi-source NWP is used in WPPF, and its long-term
temporal error pattern is discussed.
(2) A novel multi-source variable attention module/layer is
designed to dynamically extract important variables from
multi-source NWP. Compared with the general feature selection
schemes [15,16,17], the multi-source variable attention module
makes a specific selection at every single time step.
(3) The temporal dependency in the wind power sequence is
learned by a novel temporal attention layer. Compared with RNN
[36-37] and CNN [38-40] models, the proposed temporal attention
module is more effective in capturing long-term dependencies.
(4) To avoid overfitting, a residual module constructed from a
skip connection, a gating mechanism, and layer normalization is
used to control the extent of nonlinear transformation and
reduce unnecessary nonlinear transformations. Compared with some
DL-based WPPF models proposed in the literature [33-42], the
residual module could make the MSTAN model more stable and
adaptive.
This paper is organized as follows: Section II describes the
advantage of multi-source NWP and discusses the temporal
error pattern hidden in multi-source NWP. Section III defines
the multi-horizon wind power probabilistic prediction problem
and formally introduces the proposed MSTAN model. A case
study on data from three wind farms is presented in Section IV.
Section V gives the conclusions and future work.
II. MULTI-SOURCE NWP AND ITS TEMPORAL ERROR PATTERN
A. Multi-source NWP
NWP models adopted by weather forecast providers can
differ in many aspects, such as spatial and temporal resolution,
observations available for assimilation and the specific
assimilation scheme, parameterization of physical process, and
other factors [44]. When the observational data for assimilation,
computing resources, and engineering experiences are limited,
single-source NWP products are likely to perform poorly in
some regions and weather conditions, which brings significant
risks to short-term wind power forecasts. Table 1 reports the
annual Root Mean Square Error (RMSE) of the multi-source NWP
wind speed for 10 wind farms. No single NWP source achieves the
lowest error in all ten wind farms. Even the best NWP source
achieves the smallest RMSE in only 6 of the 10 wind farms. If a
single NWP source is used in a WPF system that serves many wind
farms, it will bring prediction risks to some of the wind farms
it serves.
On the contrary, multi-source NWP comes from different
weather forecast providers with varying settings of prediction.
Each forecast provider has their advantage and disadvantage in
NWP models, observations, parameterization schemes and
computing resources, etc. Therefore, multi-source NWP, which
integrates multiple advantages of different NWP, is more likely
to achieve low prediction risk and reliable accuracy. Significant
benefits will be achieved by using the multi-source NWP
scheme in real WPF projects. In some regions of China, the
accuracy of the WPF will directly affect the wind power
integration priority and the benefits of wind farms. Wind farms
at the top of the prediction accuracy ranking are rewarded, and
those at the bottom are penalized. Take a wind farm located in
North China as an example. The electricity price is about
0.078 USD/kWh, the regional wind curtailment rate in 2019 was
about 7.1%, the wind farm capacity is 100 MW (100000 kW), and the
running hours are about 250 per month. If the wind farm is
penalized with the average curtailment rate, the lost revenue
will be $100000 \times 250 \times 0.078 \times 7.1\% = 1.3845 \times 10^{5}$
USD per month. Such a penalty far exceeds the cost of purchasing
multi-source NWPs, which gives wind energy companies a strong
incentive to use multi-source NWP to improve the WPF accuracy
and the grid integration priority.
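The lost-revenue estimate above can be checked with a few lines of arithmetic. The sketch below only restates the figures quoted in the text; the variable names are illustrative and not part of the paper.

```python
# Figures quoted in the North China example (variable names are illustrative).
capacity_kw = 100_000        # 100 MW wind farm, expressed in kW
hours_per_month = 250        # approximate running hours per month
price_usd_per_kwh = 0.078    # electricity price
curtailment_rate = 0.071     # regional wind curtailment rate in 2019

# Energy that would be produced at full output during the running hours.
energy_kwh = capacity_kw * hours_per_month

# Revenue lost when the average curtailment rate is applied as a penalty.
lost_revenue_usd = energy_kwh * price_usd_per_kwh * curtailment_rate

print(f"Lost revenue: {lost_revenue_usd:.0f} USD per month")
```

The estimate assumes full-capacity output during the running hours, as the example in the text does, so it is an upper-end figure for the penalty.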
TABLE 1 THE ANNUAL WIND SPEED RMSE OF 4 NWP SOURCES IN 10 WIND FARMS

Wind Farm   Source1   Source2   Source3 (ensemble)   Source4
1           2.69      3.06      2.76                 3.10
2           2.31      2.66      2.05                 3.18
3           1.91      2.36      1.68                 3.19
4           1.95      2.67      1.69                 2.87
5           3.03      2.94      2.93                 2.74
6           2.23      2.24      1.96                 2.61
7           2.00      2.37      1.83                 2.21
8           2.13      2.30      -                    2.33
9           2.00      2.37      1.83                 2.21
10          2.06      2.23      2.11                 2.44

The annual wind speed RMSE of the 4 NWP sources is computed from one year of data.
The data of the 3rd NWP source at wind farm 8 covers far less than one year. Therefore, the
error of NWP source 3 at wind farm 8 is omitted.
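The claim that even the best source wins in only 6 of 10 wind farms can be verified directly from Table 1. The sketch below is a minimal check, assuming the table values as transcribed above; `None` marks the omitted entry for source 3 at wind farm 8, and the helper name `best_source` is illustrative.

```python
from collections import Counter

# RMSE values copied from Table 1: wind farm -> [Source1, Source2, Source3, Source4].
# None marks the omitted entry (source 3 at wind farm 8).
rmse = {
    1: [2.69, 3.06, 2.76, 3.10],
    2: [2.31, 2.66, 2.05, 3.18],
    3: [1.91, 2.36, 1.68, 3.19],
    4: [1.95, 2.67, 1.69, 2.87],
    5: [3.03, 2.94, 2.93, 2.74],
    6: [2.23, 2.24, 1.96, 2.61],
    7: [2.00, 2.37, 1.83, 2.21],
    8: [2.13, 2.30, None, 2.33],
    9: [2.00, 2.37, 1.83, 2.21],
    10: [2.06, 2.23, 2.11, 2.44],
}

def best_source(row):
    """Return the 1-based index of the source with the lowest available RMSE."""
    valid = [(value, idx + 1) for idx, value in enumerate(row) if value is not None]
    return min(valid)[1]

# Count how many wind farms each source wins.
wins = Counter(best_source(row) for row in rmse.values())
for source in sorted(wins):
    print(f"Source{source} has the lowest RMSE in {wins[source]} wind farm(s)")
```

Under this count, source 3 is the most accurate in 6 wind farms, source 1 in 3, and source 4 in 1, which matches the statement in the text.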
B. Temporal error pattern of multi-source NWP
The wind speed prediction errors show that the multi-source
NWP wind speed has a specific Temporal Error Pattern (TEP),
which is illustrated in Figure 1. This figure shows the annual
average wind speed RMSE at the 0th to 47th hour for four NWP
sources and two multi-step time series methods (Persistence and
Seq2Seq [45]). Three major characteristics are shown in this
figure.
Firstly, the multi-step wind speed predictions are more
accurate than the NWP wind speed from the 0th to the 6th hour,
whereas the NWP wind speeds are better than the multi-step
predictions from the 6th to the 48th hour. Similar conclusions
are also reported in the literature [46,47]. It means that
historical information (measurements) and future information
(NWP) both have significant value for highly accurate short-term
wind power forecasts: forecasts at near horizons should focus on
the historical measurements, and forecasts at far horizons
should focus on the weather forecasts.
Secondly, the hourly wind speed errors of each NWP source
show a 24-hour cyclical trend. Within a day, the NWP wind speed
error first decreases and then increases. Moreover, the NWP
wind speed errors from the 24th to the 48th hour are slightly
higher than those from the 0th to the 24th hour. This trend
shows that the TEP of multi-source NWP spans a long time window.
Thirdly, around the 12th and the 36th hour, the wind speed
errors of the four NWP sources are close to each other, while
the errors at other time slots differ considerably. This trend
indicates that the relative wind speed prediction accuracy of
the four NWP sources changes dynamically. In other words, the
prediction model should dynamically adjust its attention to the
four NWP sources.
Fig.1. The Temporal Error Pattern of multi-source NWP wind speed (wind farm
9). Four NWP sources and two kinds of time series prediction methods are
studied in this figure. A one-year dataset is used.
III. MULTI-SOURCE AND TEMPORAL ATTENTION
NETWORK
Introducing multi-source NWP and considering the TEP
hidden in the multi-source NWP is a promising way to improve
the accuracy of the WPPF models. Therefore, a deep learning
based WPPF model, called Multi-Source and Temporal
Attention Network (MSTAN), is proposed in this paper. Four
critical modules/layers are used in MSTAN.
(1) Multi-source variable attention module. It is designed
to dynamically extract the driving variables of multi-source
NWP. The collinearity problem and the harmful effects of
irrelevant variables and noise are reduced.
(2) Residual module. It skips some unnecessary nonlinear
transformations and adaptively controls the complexity of the
model to reduce the overfitting risk.
(3) Temporal attention module. It dynamically selects
historical and future information and extracts the long-term
temporal dependency.
(4) Mixture density module. It outputs the joint probability
density of multi-horizon wind power forecasts.
In this section, the used probabilistic prediction framework is
introduced first. Then four designed modules and the loss
function used in MSTAN are presented. Finally, the overall
structure and the relationship between all modules are clarified.
A. The Probabilistic Prediction Framework
Short-term WPPF aims to establish a multi-horizon
prediction function, which takes historical information (such as
wind speed and power measurements) and future information
(NWP) as inputs, and future wind power distribution as outputs.
More formally, let $x_{T_0+1:T_0+\tau} = (x_{T_0+1}, \ldots, x_{T_0+\tau})$ be the
time-varying covariates that could be deemed as known feature
values, such as NWP and relative time index (hour of the day),
where $x_t \in \mathbb{R}^{d_x}$ and $d_x$ is the dimension of $x_t$. Let the
measured wind power at time $t$ be $y_t$, where
$y_{1:T_0} = (y_1, y_2, \ldots, y_{T_0})$ and $y_t \in \mathbb{R}$. Similarly, let the
measured wind speed at time $t$ be $w_t$, where
$w_{1:T_0} = (w_1, w_2, \ldots, w_{T_0})$ and $w_t \in \mathbb{R}$. $T_0$ is the length of
historical measurements, and $\tau$ is the maximum prediction
horizon.
As shown in Figure 2, given the past wind power and wind
speed measurements $y_{1:T_0}$, $w_{1:T_0}$, the time-varying covariates
$x_{T_0+1:T_0+\tau}$, and model parameters $\Theta$, the short-term wind
power probabilistic forecasting problem is described as
$$p\left(y_{T_0+1:T_0+\tau} \mid y_{1:T_0}, w_{1:T_0}, x_{T_0+1:T_0+\tau}; \Theta\right) \tag{1}$$
Estimating a joint probability density function of future
multi-step wind power values could be considered as a
supervised learning problem, which is solved by minimizing the
discrepancy or maximizing the likelihood between
measurements and model forecasts.
Fig. 2. The structure of MSTAN
B. Multi-source variable attention module
A multi-source variable attention module/layer is designed
for multi-source NWP by combining prior knowledge and the
attention mechanism. In terms of prior knowledge, the
theoretical relationship $P = \frac{1}{2} C_p \rho A v^3$ can be easily derived,
where $C_p$ is the coefficient of performance, $\rho$ is the air density,
$A$ is the rotor swept area, and $v$ is the wind velocity. It can be
inferred from this equation that (1) wind speed is the most
important variable in wind power forecasting, and (2) the other
variables (such as pressure, temperature, humidity, and wind
direction) indirectly affect the wind power output.
To reflect the importance of the wind speed variables and the
other related variables, the multi-source attention module is
constructed from two sub-modules: 1) the multi-source wind
speed variable attention sub-module, and 2) the other variable
attention sub-module. The output of the multi-source variable
attention module is the concatenation of the outputs of the two
sub-modules.
Let the multi-source NWP at time $t$ be
$x_t = [x^{w}_{t,1}, \ldots, x^{w}_{t,n_1}, x^{o}_{t,1}, \ldots, x^{o}_{t,n_2}]$, where $n_1$ is the number of wind
speed variables and $n_2$ is the number of other variables.
$x^{w}_{t,i}$ and $x^{o}_{t,j}$ are both scalars. Each wind speed variable $x^{w}_{t,i}$ and
each other variable $x^{o}_{t,j}$ is transformed into a vector by a
nonlinear Dense layer.
$$\xi^{w}_{t,i} = f\left(W^{w}_{i} x^{w}_{t,i} + b^{w}_{i}\right), \quad 1 \le i \le n_1 \tag{2}$$
$$\xi^{o}_{t,j} = f\left(W^{o}_{j} x^{o}_{t,j} + b^{o}_{j}\right), \quad 1 \le j \le n_2 \tag{3}$$
where $f$ is the activation function, $W^{w}_{i}$ and $W^{o}_{j}$ are the weight
parameters, $b^{w}_{i}$ and $b^{o}_{j}$ are the bias parameters, and
$\xi^{w}_{t,i}$, $\xi^{o}_{t,j}$ are the transformed vectors.
The multi-source wind speed variable attention sub-module
performs a weighted summation on all transformed wind speed
vectors $\xi^{w}_{t,i}$. The other variable attention sub-module
performs a weighted summation on all other transformed
vectors $\xi^{o}_{t,j}$. The attention weights of each transformed
vector are determined by $x_t$ and the Softmax function. The
output of the multi-source variable attention module is the
concatenation of $\tilde{x}^{w}_{t}$ and $\tilde{x}^{o}_{t}$.
$$\lambda^{w}_{t} = \mathrm{Softmax}\left(f\left(W_{s1} x_t + b_{s1}\right)\right) \tag{4}$$
$$\tilde{x}^{w}_{t} = \sum_{i=1}^{n_1} \lambda^{w}_{t,i}\, \xi^{w}_{t,i} \tag{5}$$
$$\lambda^{o}_{t} = \mathrm{Softmax}\left(f\left(W_{s2} x_t + b_{s2}\right)\right) \tag{6}$$
$$\tilde{x}^{o}_{t} = \sum_{j=1}^{n_2} \lambda^{o}_{t,j}\, \xi^{o}_{t,j} \tag{7}$$
$$\tilde{x}_{t} = \left[\tilde{x}^{w}_{t}, \tilde{x}^{o}_{t}\right] \tag{8}$$
where $\lambda^{w}_{t} \in \mathbb{R}^{n_1}$ and $\lambda^{o}_{t} \in \mathbb{R}^{n_2}$ are the selection
weights of the two sub-modules at time $t$, $\tilde{x}^{w}_{t}$ and $\tilde{x}^{o}_{t}$ are the
outputs of the two sub-modules, and $\tilde{x}_{t}$ is their concatenation.
$W_{s1}$, $b_{s1}$, $W_{s2}$, $b_{s2}$ are the weight and bias parameters of the
nonlinear Dense layers before the Softmax. The structure of the
multi-source variable attention module is shown in Figure 3.
The inputs $x_t$ determine the attention weights at time step $t$.
Therefore, the strengthened important variables and the
weakened irrelevant variables at each time step are different.
Such a dynamic attention mechanism can take the temporal
error pattern of multi-source NWP into account.
Fig. 3. The structure of the multi-source variable attention module.
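The two sub-modules described above can be sketched for a single time step in NumPy. This is a minimal illustration, not the authors' implementation: the tanh activation, the random parameters, the sizes, and the choice to feed all NWP variables to the selection layer are assumptions, and the names `variable_attention` and `init_params` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

def variable_attention(values, context, W_embed, b_embed, W_sel, b_sel):
    """Weighted sum of per-variable embeddings for one time step.

    values:  (n,)  scalar variables handled by this sub-module
    context: (m,)  inputs that drive the selection weights (assumed: all NWP variables)
    W_embed, b_embed: (n, d)  one scalar-to-vector Dense layer per variable
    W_sel: (n, m), b_sel: (n,)  Dense layer placed before the Softmax
    """
    xi = np.tanh(values[:, None] * W_embed + b_embed)     # (n, d) transformed vectors
    weights = softmax(np.tanh(W_sel @ context + b_sel))   # (n,) selection weights
    return weights @ xi, weights                          # (d,) output, (n,) weights

rng = np.random.default_rng(0)
n1, n2, d = 4, 6, 8                 # hypothetical numbers of variables and embedding size
x_t = rng.normal(size=n1 + n2)      # NWP variables at one time step, wind speeds first

def init_params(n, m):
    """Random illustrative parameters for one sub-module."""
    return rng.normal(size=(n, d)), np.zeros((n, d)), rng.normal(size=(n, m)), np.zeros(n)

out_w, lam_w = variable_attention(x_t[:n1], x_t, *init_params(n1, n1 + n2))
out_o, lam_o = variable_attention(x_t[n1:], x_t, *init_params(n2, n1 + n2))
x_tilde = np.concatenate([out_w, out_o])   # module output, shape (2 * d,)
print(x_tilde.shape, lam_w.round(3), lam_o.round(3))
```

Because the selection weights are recomputed from the inputs at every time step, a different mix of NWP sources can dominate at different horizons, which is the behavior the temporal error pattern calls for.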
C. Residual module
In real situations, we cannot know in advance which input
variables have a strong correlation with the supervised target. It
is also challenging to determine whether a nonlinear
transformation is required, especially when the training data set
is small and noisy.
The residual module/layer adaptively controls the model
complexity and reduces unnecessary nonlinear transformation
to prevent over-fitting. As shown in Figure 4, the residual
module consists of three operations. The Skip Connection
operation [48] retains the original information. The Gating
Mechanism operation [49] controls the degree of nonlinear
transformation. The Layer Normalization operation [50]
stabilizes the output distribution and accelerates the
convergence of the model.
Since the residual module needs to be used together with
other nonlinear layers/modules, the output of the residual
module is affected by the linear path and the nonlinear path.
When the model needs low complexity, the residual module
skips the nonlinear path and is simplified to a linear mapping or
identity mapping. When the model needs high complexity, the
residual module retains most information from the nonlinear
path and outputs the summation from the linear and nonlinear
paths.
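Formally, let the residual module receive a tensor $a$ as input,
where $a_t$ is the vector of $a$ at time step $t$. The input signal $a_t$
passes through two paths, the nonlinear gated path and the
linear/identity mapping path. In the nonlinear gated path, $a_t$
is transformed to $\eta_{1,t}$ by the nonlinear transformation
function $F$. In MSTAN, the LSTM and the self-attention module
are used as $F$. Then $\eta_{1,t}$ is controlled by the Gated Linear
Units (GLU) [48] to output $\eta_{2,t}$. In the linear/identity mapping
path, when $\eta_{2,t}$ has the same shape as $a_t$, $\eta_{3,t}$ equals $a_t$.
When the shapes of $a_t$ and $\eta_{2,t}$ are different, $a_t$ is transformed
by a linear Dense layer to $\eta_{3,t}$, so that $\eta_{3,t}$ has the same shape
as $\eta_{2,t}$. Finally, the Layer Normalization module takes the sum
of $\eta_{2,t}$ and $\eta_{3,t}$ as input and outputs $\eta_{4,t}$.
The nonlinear gated path:
$$\eta_{1,t} = F(a_t) \tag{9}$$
$$\eta_{2,t} = \mathrm{GLU}(\eta_{1,t}) = \sigma\left(W_1 \eta_{1,t} + b_1\right) \odot \left(W_2 \eta_{1,t} + b_2\right) \tag{10}$$
The linear or identity mapping path:
$$\eta_{3,t} = \begin{cases} W_3 a_t + b_3, & \text{if the shapes of } a_t \text{ and } \eta_{2,t} \text{ differ} \\ a_t, & \text{otherwise} \end{cases} \tag{11}$$
The outputs of residual modules:
$$\eta_{4,t} = \mathrm{LayerNorm}\left(\eta_{2,t} + \eta_{3,t}\right) \tag{12}$$
where $W_1$, $W_2$, $W_3$ are the weight parameters, $b_1$, $b_2$, $b_3$ are
the bias parameters, $\sigma$ is the sigmoid function, and $\odot$ is the
elementwise Hadamard product.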
Fig. 4. The structure of the residual module. Left: the residual module with the
linear mapping path. Right: the residual module with the identity mapping path.
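The interaction of the gated path, the skip path, and Layer Normalization can be sketched in NumPy. This is a minimal illustration under stated assumptions: the wrapped transformation `F` is a toy tanh layer rather than an LSTM or self-attention, Layer Normalization is used without learnable gain and bias, and all parameter values are random placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_norm(z, eps=1e-6):
    """Normalize over the feature axis (no learnable gain/bias in this sketch)."""
    mean = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mean) / np.sqrt(var + eps)

def glu(eta, W1, b1, W2, b2):
    """Gated Linear Unit: a sigmoid gate multiplied elementwise with a linear map."""
    return sigmoid(eta @ W1 + b1) * (eta @ W2 + b2)

def residual_module(a, F, glu_params, proj=None):
    """a: (T, d_in). F: nonlinear transformation. proj: optional (W3, b3) linear path."""
    eta1 = F(a)                                            # nonlinear path
    eta2 = glu(eta1, *glu_params)                          # gate controls how much passes
    eta3 = a if proj is None else a @ proj[0] + proj[1]    # identity or linear mapping
    return layer_norm(eta2 + eta3)

rng = np.random.default_rng(0)
T, d = 5, 8                                   # hypothetical sequence length and width
a = rng.normal(size=(T, d))
Wf = rng.normal(size=(d, d))
F = lambda z: np.tanh(z @ Wf)                 # toy stand-in for LSTM / self-attention

open_gate = (rng.normal(size=(d, d)), np.zeros(d), rng.normal(size=(d, d)), np.zeros(d))
# A strongly negative gate bias drives the sigmoid to ~0 and closes the nonlinear path.
closed_gate = (np.zeros((d, d)), np.full(d, -50.0), open_gate[2], open_gate[3])

out_open = residual_module(a, F, open_gate)
out_closed = residual_module(a, F, closed_gate)
print(np.allclose(out_closed, layer_norm(a), atol=1e-6))
```

With the gate closed, the module reduces to a normalized identity mapping of its input, which is the low-complexity behavior described in the text; with the gate open, the nonlinear path contributes to the sum.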
D. Temporal Attention module
For capturing the temporal dependency across all the time
steps, a temporal attention module constructed by Encoder-
Decoder, Positional Encoding (PE), and self-attention, is
employed [51].
(1) The Encoder-Decoder module receives the historical
measurements and the outputs of the multi-source variable
attention module. To enhance the expression of temporal
dependency, two LSTMs are used as the Encoder-Decoder module.
Let the historical wind speed and wind power measurement
sequences be $w_{1:T_0}$ and $y_{1:T_0}$. The encoder takes $w_{1:T_0}$ and
$y_{1:T_0}$ as inputs. The decoder takes $\tilde{x}_{T_0+1:T_0+\tau}$, the outputs of
the multi-source variable attention module, as inputs. At the
same time, the residual module is wrapped around the
Encoder-Decoder.
$$h_t = \begin{cases} \mathrm{LSTM}_{enc}\left(h_{t-1}, [y_t, w_t]\right), & 1 \le t \le T_0 \\ \mathrm{LSTM}_{dec}\left(h_{t-1}, \tilde{x}_t\right), & T_0+1 \le t \le T_0+\tau \end{cases} \tag{13}$$
$$\phi_t = \begin{cases} \mathrm{LN}\left(\mathrm{Dense}([y_t, w_t]) + \mathrm{GLU}(h_t)\right), & 1 \le t \le T_0 \\ \mathrm{LN}\left(\tilde{x}_t + \mathrm{GLU}(h_t)\right), & T_0+1 \le t \le T_0+\tau \end{cases} \tag{14}$$
where $h_0$ is the initial state, LN is the Layer Normalization,
$T_0$ is the encoder length, and $\tau$ is the maximum prediction
horizon. $\phi_t$ is the output of the Encoder-Decoder with a residual
module. For simplicity and convenience, the output dimension
of $\phi_t$ is set to $d_{model}$, $\phi_t \in \mathbb{R}^{d_{model}}$.
(2) Positional encoding (PE) generates the position
information of each time step. Since self-attention is a global
attention mechanism, the relative position information cannot
be considered when calculating the similarity. Therefore,
position information is added to the input tensor $\phi_t$. The
positional information is defined as:
$$\mathrm{PE}(t, 2i) = \sin\left(\frac{t}{10000^{2i/d_{model}}}\right) \tag{15}$$
$$\mathrm{PE}(t, 2i+1) = \cos\left(\frac{t}{10000^{2i/d_{model}}}\right) \tag{16}$$
The inner product between $\mathrm{PE}(t, :)$ and $\mathrm{PE}(t+k, :)$ tends to
decrease as $k$ increases, so PE indirectly represents the
relative distance between different time steps. The input tensor
of the self-attention layer could be expressed as:
$$\psi_t = \phi_t + \mathrm{PE}(t, :) \tag{17}$$
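The sinusoidal encoding and its distance property can be illustrated with a short NumPy sketch. The sequence length and model dimension below are hypothetical, and the function assumes an even model dimension.

```python
import numpy as np

def positional_encoding(length, d_model):
    """Sinusoidal positional encoding; d_model is assumed to be even."""
    pos = np.arange(length)[:, None]              # time step t, shape (length, 1)
    two_i = np.arange(0, d_model, 2)[None, :]     # even feature indices 2i
    angle = pos / np.power(10000.0, two_i / d_model)
    pe = np.zeros((length, d_model))
    pe[:, 0::2] = np.sin(angle)                   # even dimensions use sine
    pe[:, 1::2] = np.cos(angle)                   # odd dimensions use cosine
    return pe

pe = positional_encoding(length=72, d_model=64)   # hypothetical sizes

# The inner product depends only on the offset k between two positions,
# and it is larger for nearby steps than for distant ones.
near = pe[10] @ pe[11]    # offset k = 1
far = pe[10] @ pe[40]     # offset k = 30
print(near > far)
```

The decrease is a general tendency rather than a strictly monotonic one, since the inner product is a sum of cosines of the offset at different frequencies.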
(3) In simple terms, the essence of the attention mechanism
is to select, from a sequence, the information that is similar
to the query information. Self-attention maps a query
representation $Q$ to a new representation by the weighted sum of
the value representation $V$. The attention weights are determined
by the scaled dot-product of the query representation $Q$ and the
key representation $K$.
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{Q K^{T}}{\sqrt{d_{attn}}}\right) V \tag{18}$$
In order to strengthen the expressive ability of the attention
mechanism, the Q, K, and V tensors are usually transformed
linearly by $W_Q$, $W_K$, $W_V \in \mathbb{R}^{d_{model} \times d_{attn}}$, where $d_{attn}$ is the
attention dimension.
$$H(Q, K, V) = \mathrm{Attention}\left(Q W_Q, K W_K, V W_V\right) \tag{19}$$
In the practice of multi-horizon wind power prediction,
the outputs of the self-attention layer are expressed as:
$$\delta_{1:T_0+\tau} = H\left(\psi_{1:T_0+\tau}, \psi_{1:T_0+\tau}, \psi_{1:T_0+\tau}\right) \tag{20}$$
where $\psi_{1:T_0+\tau}$ is the position-encoded input tensor from (17).
Let $d_{attn} = d_{model}$; the output of the temporal attention
module wrapped by the residual module is
$$\tilde{\phi}_t = \mathrm{LN}\left(\psi_t + \mathrm{GLU}(\delta_t)\right), \quad T_0+1 \le t \le T_0+\tau \tag{21}$$
where $\delta_t \in \mathbb{R}^{d_{model}}$ and $\tilde{\phi}_t \in \mathbb{R}^{d_{model}}$.
E. Mixture density module
Similar to our previous work [52], the mixture density
network (MDN) module is used to model the probability
density of short-term wind power. The MDN approximates the
distribution of normalized forecasts by weighting multiple Beta
distributions. As shown in Figure 5, the mixture density module
is constructed from two Dense layers and one Softmax layer.
$\mathrm{Dense}_\alpha$ outputs the shape parameters $\alpha_t$, $\mathrm{Dense}_\beta$ outputs the
shape parameters $\beta_t$, and the Softmax layer outputs the mixing
coefficients $\pi_t$. The activation function of $\mathrm{Dense}_\alpha$ and $\mathrm{Dense}_\beta$
is ReLU. The mixture density module outputs $\alpha_t$, $\beta_t$, $\pi_t$ for
each time step in a parameter-sharing manner.
$$\alpha_t = \mathrm{Dense}_\alpha(\tilde{\phi}_t) = \mathrm{ReLU}\left(W_\alpha \tilde{\phi}_t + b_\alpha\right) \tag{22}$$
$$\beta_t = \mathrm{Dense}_\beta(\tilde{\phi}_t) = \mathrm{ReLU}\left(W_\beta \tilde{\phi}_t + b_\beta\right) \tag{23}$$
$$\pi_t = \mathrm{Softmax}\left(W_\pi \tilde{\phi}_t + b_\pi\right) \tag{24}$$
$$p(y_t) = \sum_{k=1}^{K} \pi_{t,k}\, \mathrm{Beta}\left(y_t \mid \alpha_{t,k}, \beta_{t,k}\right) \tag{25}$$
where $\tilde{\phi}_t$ is the output of the temporal attention module, $K$ is
the number of components in the MDN, $\pi_{t,k}$ is the $k$-th mixing
coefficient at time step $t$, and $\alpha_{t,k}$, $\beta_{t,k}$ are the shape
parameters of the $k$-th component at time step $t$, with
$\alpha_t, \beta_t, \pi_t \in \mathbb{R}^{K}$. The sum of the mixing coefficients
is equal to 1, $\sum_{k=1}^{K} \pi_{t,k} = 1$. $W_\alpha$, $W_\beta$, $W_\pi$ are the
weight parameters, and $b_\alpha$, $b_\beta$, $b_\pi$ are the bias parameters.
Since the domain of the Beta distribution is [0,1], the domain
of the mixture density is also [0,1]. The wind power forecast
density can be obtained by scaling the mixture distribution by
the wind power capacity.
Fig. 5. The structure of the mixture density module.
F. Loss Function and Training Method
MSTAN is an end-to-end deep learning model. All
modules/layers of MSTAN can be jointly trained by the
Back-Propagation algorithm and Gradient Descent. Therefore,
the negative log-likelihood function is used as the loss function,
and Adam is used as the optimizer. The negative log-likelihood
function is as follows:
$$\mathcal{L}\left(\pi_t, \alpha_t, \beta_t, y_t\right) = -\log\left(\sum_{k=1}^{K} \pi_{t,k}\, \mathrm{Beta}\left(y_t \mid \alpha_{t,k}, \beta_{t,k}\right)\right) \tag{26}$$
The trained model outputs the mixture distribution parameters, and
the deterministic forecasts can be calculated by the mean value or
median value equation.
The mean value equation of the mixture distribution:
$$\hat{y}_t = \sum_{k=1}^{K} \pi_{t,k}\, \frac{\alpha_{t,k}}{\alpha_{t,k} + \beta_{t,k}} \tag{27}$$
The median value equation of the mixture distribution:
$$\hat{y}_t \approx \sum_{k=1}^{K} \pi_{t,k}\, \frac{\alpha_{t,k} - 1/3}{\alpha_{t,k} + \beta_{t,k} - 2/3} \tag{28}$$
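The Beta mixture density, its negative log-likelihood, and the mean forecast can be sketched with the Python standard library. The parameter values and the 100 MW capacity in the usage lines are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
import math

def beta_pdf(y, a, b):
    """Beta density on (0, 1), evaluated in log space for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def mixture_pdf(y, pi, alpha, beta):
    """Weighted sum of Beta densities; pi must sum to 1."""
    return sum(p * beta_pdf(y, a, b) for p, a, b in zip(pi, alpha, beta))

def mixture_nll(y, pi, alpha, beta):
    """Negative log-likelihood of one normalized observation y in (0, 1)."""
    return -math.log(mixture_pdf(y, pi, alpha, beta))

def mixture_mean(pi, alpha, beta):
    """Mean of the mixture: weighted sum of the component means a / (a + b)."""
    return sum(p * a / (a + b) for p, a, b in zip(pi, alpha, beta))

# Hypothetical two-component output for one time step.
pi, alpha, beta = [0.5, 0.5], [2.0, 2.0], [2.0, 6.0]
capacity_mw = 100.0                        # hypothetical wind farm capacity

point_forecast_mw = capacity_mw * mixture_mean(pi, alpha, beta)
loss = mixture_nll(0.3, pi, alpha, beta)   # observed power is 30% of capacity
print(point_forecast_mw, loss)
```

The observation is normalized by the capacity before the loss is evaluated, and the point forecast is scaled back by the capacity afterwards, which mirrors how the mixture on [0,1] is mapped to wind power in the text.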
G. The Relation between the Modules
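In order to state the relationship between the modules,
the tensor operations in the forward process of MSTAN are
shown in Figure 6. MSTAN takes 3D tensors as the model inputs
and outputs. The shape of the past inputs $[y_{1:T_0}, w_{1:T_0}]$, the
measured wind power and wind speed, is $[B, T_0, 2]$. The shape of
the future inputs $x_{T_0+1:T_0+\tau}$, the multi-source NWP, is
$[B, \tau, n_1 + n_2]$. $B$ is the batch size, $T_0$ is the length of
historical measurements, $\tau$ is the maximum prediction horizon,
and $n_1$ and $n_2$ are the numbers of wind speed variables and other
variables.
As shown in Figure 6, the multi-source variable attention
module/layer takes $x_{T_0+1:T_0+\tau}$ as inputs and $\tilde{x}_{T_0+1:T_0+\tau}$ as
outputs. The LSTM Encoder wrapped by the residual module
takes $[y_{1:T_0}, w_{1:T_0}]$ as inputs and $\phi_{1:T_0}$ as outputs. The LSTM
Decoder wrapped by the residual module takes $\tilde{x}_{T_0+1:T_0+\tau}$ as
inputs and $\phi_{T_0+1:T_0+\tau}$ as outputs. Then the position encoding is
added to $\phi_{1:T_0+\tau}$ to obtain $\psi_{1:T_0+\tau}$. The self-attention module
wrapped by the residual module takes $\psi_{1:T_0+\tau}$ as inputs and
$\tilde{\phi}_{T_0+1:T_0+\tau}$ as outputs. The mixture density module takes
$\tilde{\phi}_{T_0+1:T_0+\tau}$ as inputs and the mixture parameters
$\alpha_{T_0+1:T_0+\tau}$, $\beta_{T_0+1:T_0+\tau}$, $\pi_{T_0+1:T_0+\tau}$ as outputs.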
Fig. 6. The relation between the used modules.
IV. APPLICATIONS IN THREE WIND FARMS
A. Dataset
Three wind farms located in North China are studied. For each wind farm, four NWP sources are used. The third NWP source is an ensemble NWP. Each NWP source contains one or two NWP sites. Case 1, Case 2 and Case 3 correspond to WF 6, WF 9, and WF 2 in Table 1. The NWP data at each site contain several features: wind speed (WS), wind direction (WD), relative humidity (RH), temperature (TMP) and air pressure (PRE). The multi-source NWP data are accessed once a day: before 0:00 every day, the multi-source NWP data for the next 48 hours are received. The measured data include the wind speed at the wind tower and the power output at the booster station. The dataset covers one year (2019-01-01 to 2019-12-31), and the time resolution is one hour. The wind power prediction dataset comes from a real regional wind power forecasting project. Due to the confidentiality agreement with the wind farm operator and the NWP provider, the data cannot be made openly accessible yet.
The datasets are divided into training sets and testing sets by date. The data from the 1st to the 24th day of each month form the training set. The data from the 25th day to the end of each month form the testing set. 20% of the samples in the training set are randomly selected as the validation set. This division scheme ensures that MSTAN is tested on data from every month of the year.
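The division scheme can be sketched as below, under the assumption that one sample corresponds to one daily forecast issue date. The function name and the random seed are hypothetical.

```python
import random
from datetime import date, timedelta

def split_by_day_of_month(days, val_fraction=0.2, seed=0):
    """Days 1-24 of each month go to training, day 25 onward to testing.

    A random val_fraction of the training days is then held out for validation.
    """
    train_pool = [d for d in days if d.day <= 24]
    test = [d for d in days if d.day >= 25]
    rng = random.Random(seed)
    n_val = int(round(val_fraction * len(train_pool)))
    val = set(rng.sample(train_pool, n_val))
    train = [d for d in train_pool if d not in val]
    return train, sorted(val), test

# One issue date per day of 2019, matching the period of the dataset.
days = [date(2019, 1, 1) + timedelta(days=i) for i in range(365)]
train, val, test = split_by_day_of_month(days)
print(len(train), len(val), len(test))
```

Under this assumption the training pool holds 24 x 12 = 288 days before the validation hold-out, which agrees with the statement that one year yields fewer than 300 training samples.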
B. Benchmarks
Two classic technical routes are used to compare with the
proposed model. These two technical routes are widely used by
machine learning competitions and commercial applications.
The first technical route is feature engineering + regressor, and
the second technical route is deep learning.
1) Technical route 1: Feature Engineering + Regressor
The purpose of feature engineering is to alleviate the curse of dimensionality and reduce the difficulty of the learning task. Following the winning feature engineering schemes in GEFCom2012 and GEFCom2014 [10, 11, 53], a feature engineering scheme comprising feature construction, feature selection, normalization, category encoding, and target transformation is adopted.
Four algorithms are employed as regressors: Ridge regression, Support Vector Regression (SVR), K-Nearest Neighbor Regression (KNNR), and LightGBM. LightGBM can be regarded as an advanced implementation of the GBM and GBDT used in GEFCom2014 [53]. The feature engineering is applied to each regressor. Therefore, four wind power forecasting pipelines are built by combining the feature engineering with each regressor.
Different strategies are used to obtain the probabilistic results. (1) LightGBM can directly provide quantile outputs by using the quantile loss function. Therefore, 19 models are built to obtain the quantile forecasts for the quantiles 0.05:0.05:0.95. (2) Ridge, SVR and KNNR cannot directly produce probabilistic predictions. Therefore, we estimate the prediction error distribution of the training data in different power prediction intervals by Kernel Density Estimation (KDE). At test time, the quantile predictions are obtained from the deterministic forecast and the corresponding error distribution.
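As a small illustration of the quantile grid 0.05:0.05:0.95 and the quantile (pinball) loss mentioned above, a minimal sketch follows. The function names are hypothetical, and the loss is the standard textbook definition, not code taken from the benchmark implementations.

```python
def quantile_levels():
    """The 19 quantile levels 0.05, 0.10, ..., 0.95."""
    return [round(0.05 * i, 2) for i in range(1, 20)]

def pinball_loss(y_true, y_pred, q):
    """Quantile loss of one prediction y_pred for quantile level q."""
    diff = y_true - y_pred
    return max(q * diff, (q - 1.0) * diff)

def mean_pinball_loss(y_true, quantile_preds):
    """Average loss over all levels; quantile_preds maps level -> prediction."""
    return sum(pinball_loss(y_true, p, q) for q, p in quantile_preds.items()) / len(quantile_preds)
```

For a high quantile level such as 0.9, under-prediction is penalized nine times more heavily than over-prediction of the same size, which is what pushes each of the 19 models toward its own quantile.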
2) Technical route 2: Deep Learning
Three classical deep learning models for multi-horizon wind power forecasting are selected as the benchmarks: Seq2Seq [45], DeepAR [54] and MQRNN [55]. These models are integrated into Amazon's time series prediction library GluonTS [56].
Seq2Seq uses two LSTMs as the Encoder and the Decoder. The Encoder LSTM takes the historical measurements as inputs. The Decoder takes the multi-source NWP and the encoder context as inputs. Seq2Seq gives the quantile forecasts by optimizing the quantile loss function. The considered quantiles are set as 0.05:0.05:0.95 (19 quantiles).
DeepAR parameterizes the probability density distribution of the target variable by the outputs of an LSTM. The Gaussian likelihood of the predicted distribution is maximized during the training process. The multi-step prediction results are obtained step by step through sampling during the forecasting process. The inputs of DeepAR also include the historical measurements and the multi-source NWP.
MQRNN encodes the historical measurements with an LSTM and then decodes the encoded information and the future inputs through a global MLP branch and a local MLP branch. The multi-source NWP is used as the future input of the global MLP and the local MLP. MQRNN also outputs the quantile values by optimizing the quantile loss function. The considered quantiles are set as 0.05:0.05:0.95 (19 quantiles).
C. Training Settings and Software
Hyper-parameters of MSTAN are selected by grid search. The best hyper-parameter settings are presented in Table 2. In this paper, the Adam optimizer is used. For the Adam optimizer, the learning rate (LR) and the batch size are the most important hyper-parameters. The LR of Adam is usually in the range of [0.0001, 0.1], and the commonly recommended default LR is 0.001. In order to find the optimal LR, we set the grid of LR to [0.01, 0.003, 0.001, 0.0003, 0.0001]. The batch size should be neither too small nor too large. It is usually recommended to select it from $2^n$, n = 3, 4, 5, 6, etc. Since one year of data yields fewer than 300 training samples, the batch size should be much smaller than 300. Therefore, we set the grid of batch size to [8, 16, 32, 64]. d_model and d_attention are model hyper-parameters that determine the capacity of the model. Since the training dataset is small, a small model capacity is sufficient. The grids of d_model and d_attention are both set to [8, 16, 32, 64].
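The grids listed above can be enumerated as in the following sketch. It assumes a full Cartesian product over the four grids, which the text does not state explicitly. The `evaluate` callback is a hypothetical placeholder for training a model and returning its validation loss.

```python
from itertools import product

GRID = {
    "learning_rate": [0.01, 0.003, 0.001, 0.0003, 0.0001],
    "batch_size": [8, 16, 32, 64],
    "d_model": [8, 16, 32, 64],
    "d_attention": [8, 16, 32, 64],
}

def iter_configs(grid):
    """Yield every combination of the grid values as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

def grid_search(grid, evaluate):
    """Return the configuration with the lowest score and that score."""
    best_cfg, best_score = None, float("inf")
    for cfg in iter_configs(grid):
        score = evaluate(cfg)  # e.g. validation negative log-likelihood
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

With these grids a full search covers 5 x 4 x 4 x 4 = 320 configurations, which is feasible because the dataset and the model are small.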
All the deep learning schemes used in this paper are implemented in PyTorch [57]. The feature engineering + regressor schemes are implemented with scikit-learn [58] and LightGBM [59].
TABLE 2 THE HYPER-PARAMETERS OF MSTAN IN CASE TWO
| Item | Setting |
| --- | --- |
| Hyper Parameters | d_model=16, LSTM_layer=2, LSTM_hidden_dim=d_model=16, d_attention=16, m=3, T0=48, forecast horizon=48 |
| Optimizer | Adam, learning rate=0.001, batch=32 |
| Computing Resource | Apple M1 |
D. Evaluation criterion
The deterministic and probabilistic forecasting performance is evaluated by several evaluation criteria. Normalized Root Mean Square Error (NRMSE) and Normalized Mean Absolute Error (NMAE) are used for deterministic forecasting evaluation. For probabilistic forecasting evaluation, the standard Quantile Loss (QL) index [53], the Continuous Ranked Probability Score (CRPS) [60] and the Average Coverage Error (ACE) [61] are used.
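A minimal sketch of three of these criteria is given below. Normalizing by the wind farm capacity is an assumption, since the text does not spell out the normalization. ACE is taken as the empirical coverage of a prediction interval minus its nominal coverage. The function names and the toy numbers are hypothetical.

```python
import math

def nrmse(y_true, y_pred, capacity):
    """Root mean square error divided by the installed capacity."""
    mse = sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / capacity

def nmae(y_true, y_pred, capacity):
    """Mean absolute error divided by the installed capacity."""
    mae = sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)
    return mae / capacity

def ace(y_true, lower, upper, nominal):
    """Empirical interval coverage minus the nominal coverage."""
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true) - nominal

# Toy values in MW for a hypothetical 100 MW wind farm.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
print(nrmse(y_true, y_pred, 100.0), nmae(y_true, y_pred, 100.0))
```

A negative ACE means the intervals cover fewer observations than their nominal level, so they are too narrow.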
E. Results and Discussions
1) Deterministic results
The NRMSE and NMAE results of the three cases are presented in Table 3. The NRMSE and NMAE of MSTAN are lower than those of all counterparts. Compared with the best model using the first technical route, the NRMSE of the proposed model in the three cases is reduced by 0.8, 1.1, and 0.6 percentage points, and the NMAE is reduced by 0.6, 1.0, and 0.5 percentage points, respectively. Among the three algorithms using the second technical route, the Seq2Seq algorithm performs best. Compared with the Seq2Seq algorithm, the proposed algorithm reduces the NRMSE by 1.0, 1.7, and 1.0 percentage points and the NMAE by 0.6, 1.2, and 0.6 percentage points.
TABLE 3 NRMSE AND NMAE OF THREE WIND FARMS
| Methods | Case 1 NRMSE | Case 1 NMAE | Case 2 NRMSE | Case 2 NMAE | Case 3 NRMSE | Case 3 NMAE |
| --- | --- | --- | --- | --- | --- | --- |
| Persistence | 0.331 | 0.246 | 0.373 | 0.279 | 0.323 | 0.249 |
| Ridge | 0.174 | 0.126 | 0.171 | 0.124 | 0.163 | 0.119 |
| KNN | 0.182 | 0.131 | 0.184 | 0.131 | 0.173 | 0.125 |
| SVM | 0.175 | 0.126 | 0.17 | 0.122 | 0.160 | 0.116 |
| lightGBM | 0.174 | 0.125 | 0.168 | 0.118 | 0.162 | 0.117 |
| DeepAR | 0.187 | 0.143 | 0.198 | 0.156 | 0.189 | 0.146 |
| Seq2Seq | 0.176 | 0.125 | 0.176 | 0.124 | 0.164 | 0.116 |
| MQRNN | 0.179 | 0.139 | 0.176 | 0.135 | 0.166 | 0.125 |
| Proposed | 0.166 | 0.119 | 0.159 | 0.112 | 0.154 | 0.110 |
Figure 7 shows the 48-hour NRMSE and NMAE, averaged across all test samples, for the proposed model and the DL benchmarks. The 48-hour NRMSE and NMAE of MSTAN are below those of the other three deep learning methods. At a small number of lead times the NRMSE and NMAE of MSTAN are not the lowest, but the overall error trend of the MSTAN model is better.
Fig. 7. The NRMSE and NMAE of MSTAN from 0th to 47th hours for Case 2.
The results are the averaged values across all testing samples.
2) Probabilistic results
The QL and CRPS results of the three cases are presented in Table 4. The QL and CRPS of MSTAN are lower than those of all counterparts. Compared with the best model using the first technical route, the QL of the proposed model is reduced by 5.0%, 7.2%, and 3.0% for the three cases, and the CRPS is reduced by 14.4%, 10.8%, and 10.3%. Among the three algorithms using the second technical route, the Seq2Seq algorithm performs best. Compared with the Seq2Seq algorithm, the proposed algorithm reduces the QL by 4.4%, 12.5%, and 0.7%, and the CRPS by 15.6%, 13.3%, and 6.9%.
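TABLE 4 QL AND CRPS OF THREE WIND FARMS
| Methods | Case 1 QL | Case 1 CRPS | Case 2 QL | Case 2 CRPS | Case 3 QL | Case 3 CRPS |
| --- | --- | --- | --- | --- | --- | --- |
| Persistence | - | - | - | - | - | - |
| Ridge | 0.382 | 0.103 | 0.3 | 0.094 | 0.307 | 0.096 |
| KNN | 0.397 | 0.106 | 0.317 | 0.101 | 0.322 | 0.101 |
| SVM | 0.38 | 0.104 | 0.294 | 0.094 | 0.301 | 0.096 |
| lightGBM | 0.378 | 0.107 | 0.285 | 0.092 | 0.302 | 0.100 |
| DeepAR | 0.432 | 0.114 | 0.372 | 0.115 | 0.368 | 0.112 |
| Seq2Seq | 0.376 | 0.104 | 0.297 | 0.094 | 0.295 | 0.093 |
| MQRNN | 0.421 | 0.114 | 0.322 | 0.106 | 0.318 | 0.110 |
| Proposed | 0.36 | 0.090 | 0.264 | 0.083 | 0.293 | 0.087 |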
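As a worked check of the Seq2Seq comparison quoted for Table 4, the sketch below recomputes the reductions from the tabulated values. The percentages in the text are reproduced when the difference is expressed relative to the MSTAN score. This normalization is inferred from the numbers and is not stated in the text, and the function name is hypothetical.

```python
# Values copied from Table 4 (Case 1, Case 2, Case 3).
SEQ2SEQ = {"QL": [0.376, 0.297, 0.295], "CRPS": [0.104, 0.094, 0.093]}
PROPOSED = {"QL": [0.36, 0.264, 0.293], "CRPS": [0.090, 0.083, 0.087]}

def relative_reduction(benchmark, proposed):
    """Reduction in percent, expressed relative to the proposed model's score."""
    return 100.0 * (benchmark - proposed) / proposed

for metric in ("QL", "CRPS"):
    reductions = [round(relative_reduction(b, p), 1)
                  for b, p in zip(SEQ2SEQ[metric], PROPOSED[metric])]
    print(metric, reductions)
```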
Figure 8 shows the averaged 48-hour QL and CRPS across
all test samples for the proposed model and the DL benchmarks.
The probabilistic prediction results of the MSTAN model are
significantly better than several deep learning algorithms under
most prediction horizons.
Fig. 8. Hourly QL and CRPS of MSTAN from 0th to 47th hours for Case 2. The
results are the averaged values across all testing samples.
The ACE scores of several benchmarks and the proposed model are shown in the Appendix. For the three studied wind farms, the ACE of MSTAN is below 0 at most confidence levels. The ACE of Ridge and KNN is more stable and closer to 0 than that of MSTAN in Case 1 and Case 3, but MSTAN performs better than most other counterparts in the three cases. The ACE of LightGBM, MQRNN and DeepAR is poor in all three cases.
The probability distributions of the wind power forecasts for the next 48 hours are shown in Figure 9. In each sub-figure, the probabilistic forecasts for three consecutive days are presented. For each wind farm, the predicted values follow the actual wind farm power outputs well. When the deterministic wind power forecast is close to 0 MW or to the wind farm capacity, the uncertainty of the forecast is low. When the deterministic wind power forecast is close to 50% of the wind farm capacity, the prediction uncertainty is high.
Fig. 9. The short-term probabilistic forecasts for the next 48 hours, produced every 24 hours. The generated quantiles are 5% to 95% (19 lines).
3) Performance variation evaluation by bootstrapping
Since the dataset is small, bootstrapping is used to estimate the significance of the deterministic and probabilistic results [43]. Three kinds of comparisons are carried out in this part: (1) the performance variation of different forecasting models, (2) the performance variation of single-source NWP and multi-source NWP schemes, and (3) the performance variation of module ablation. The box plots show the NRMSE and CRPS variation obtained by the bootstrap approach with 200 bootstrap samples.
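The bootstrap procedure can be sketched as follows. The sketch assumes that the resampling unit is the per-sample forecast error and that resampling is done with replacement. Neither detail is given in the text, and the function names are hypothetical.

```python
import math
import random

def nrmse_from_errors(errors, capacity):
    """NRMSE computed from forecast errors, normalized by capacity."""
    return math.sqrt(sum(e * e for e in errors) / len(errors)) / capacity

def bootstrap_scores(errors, metric, n_boot=200, seed=0):
    """Recompute metric on n_boot resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(errors)
    scores = []
    for _ in range(n_boot):
        resample = [errors[rng.randrange(n)] for _ in range(n)]
        scores.append(metric(resample))
    return scores

# Toy forecast errors in MW for a hypothetical 100 MW wind farm.
toy_rng = random.Random(1)
errors = [toy_rng.gauss(0.0, 15.0) for _ in range(500)]
scores = bootstrap_scores(errors, lambda e: nrmse_from_errors(e, 100.0))
print(min(scores), max(scores))
```

The spread of the 200 scores is what a box plot such as those in Figures 10 to 12 summarizes; non-overlapping boxes for two models suggest that the difference is not an artifact of the small test set.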
(a) The comparison between benchmarks and MSTAN
Results similar to Table 3 and Table 4 are acquired by
bootstrapping in Figure 10 (a) and (d), Figure 11 (a) and (d),
Figure 12 (a) and (d). These figures show that the accuracy
improvement of the proposed model is significant.
(b) The comparison between single-source and multi-source
NWP
As shown in Table 1, each wind farm has four NWP sources. The most accurate two-source and three-source NWP combinations are picked based on the NWP RMSE ranking. The NWP RMSE ranking of Case 1 (WF 6) is 3<1<2<4. The NWP RMSE ranking of Case 2 (WF 9) is 3<1<4<2. The NWP RMSE ranking of Case 3 (WF 2) is 3<1<2<4.
Single-source NWP (source 1, source 2, source 3 and source 4) and the most accurate two-source, three-source and four-source NWPs are compared in Figure 10 (b) and (e), Figure 11 (b) and (e), and Figure 12 (b) and (e). As depicted in these figures, the poorest prediction performance of MSTAN always appears when single-source NWP is used. The best prediction results of MSTAN are achieved with the four-source NWPs in all cases. The results of using the best two-source NWPs and the best three-source NWPs are close. At wind farm 2, the results of using one source (source 1 and source 3), the best two-source NWPs and the best three-source NWPs are close.
Fig. 10. NRMSE and CRPS of case 1.
Fig. 11. NRMSE and CRPS of case 2.
Fig. 12. NRMSE and CRPS of case 3.
(a) NRMSE of benchmarks and MSTAN. (b) NRMSE of MSTAN by using single-source NWP and multi-source NWP. (c) NRMSE of MSTAN with and without different modules.
(d) CRPS of benchmarks and MSTAN. (e) CRPS of MSTAN by using single-source NWP and multi-source NWP. (f) CRPS of MSTAN with and without different modules.
The box plots show the NRMSE and CRPS variation using the bootstrap approach with 200 bootstrap samples.
In addition, the NWP with the smaller RMSE may not
necessarily lead to better wind power prediction accuracy. This
phenomenon is common and may be caused by the nonlinear
relationship between wind speed and power. It may bring more
risks to the single-source NWP forecasting scheme.
(c) Ablation experiments
In order to demonstrate the effectiveness of the designed model, ablation experiments are carried out. Specifically, we remove one module at a time from MSTAN and re-tune the hyper-parameters. The MSTAN variants without individual modules are named as follows:
w_o selection: The MSTAN model without the multi-source variable attention module.
w_o temp_attn: The MSTAN model without the temporal attention module.
w_o skip: The MSTAN model without the residual module.
As shown in Figure 10 (c) and (f), Figure 11 (c) and (f), and Figure 12 (c) and (f), the best prediction results are obtained by the intact MSTAN. In all three cases, removing the temporal attention module or the residual module causes a significant drop in prediction performance. Removing the multi-source variable attention module causes a slight performance drop.
4) Temporal Attention weights pattern
When giving the forecast at a time step, the MSTAN model considers not only the inputs of the current time step but also the inputs of other time steps. The temporal attention weights determine which time steps should be attended to. Figure 13(a) shows the historical wind power and the 48-hour forecasts of one sample. Figure 13(b) plots the weight matrix of the temporal attention module for this sample. Figure 13(c) shows the 3D version of Figure 13(b). The Y-axis represents the lead time step of the output sequence, the X-axis represents the considered time step of the input sequence, and the Z-axis represents the value of the attention weights. At each output time step, the module determines how to assign the weights to each input time step. In Figure 13(a), the real wind power sequence of the next 48 hours can be divided into three parts. The first part is a downward ramp process (0th to 18th hour). The second part is a fast upward ramp process (18th to 30th hour). The third part is a smooth process (30th to 47th hour).
In the first process, the wind power predictions focus on the input sequence from the 12th to the 18th hour. The wind power forecasts of the second process focus on the input sequence from the 18th to the 48th hour. The wind power forecasts of the last process give higher weights to the input sequence from the 40th to the 47th hour. Therefore, the learned temporal pattern follows the trends of the real wind power sequence. This ability to dynamically attend to the critical parts of the sequence is not available in DL models such as LSTM and CNN.
Fig. 13. The temporal attention weights of one day (case 2). Subfigure (c) is
the 3D version of subfigure (b).
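To make the notion of a weight matrix over lead time and considered time step concrete, the sketch below computes generic scaled dot-product attention weights (the mechanism of [51]). It is not the exact temporal attention module of MSTAN, whose parameterization is defined earlier in the paper. The function names and the toy query and key vectors are hypothetical.

```python
import math

def softmax(row):
    """Numerically stable softmax of a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Row i holds the weights that output step i assigns to each input step."""
    scale = math.sqrt(len(keys[0]))
    return [softmax([sum(q_i * k_i for q_i, k_i in zip(q, k)) / scale for k in keys])
            for q in queries]

# Two output steps attending over three input steps (toy 2-d vectors).
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention_weights(queries, keys):
    print([round(w, 3) for w in row])
```

Each row sums to 1, so a row of the matrix in Figure 13(b) can be read as the share of attention that one lead time gives to each considered time step.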
V. CONCLUSIONS
In this paper, a Multi-Source and Temporal Attention Network (MSTAN) is proposed for short-term WPPF. The MSTAN model takes the multi-source NWP data and the historical measurement sequence as inputs and gives the wind power density forecasts for the next 48 hours as outputs. MSTAN is built from four major modules. (1) In order to dynamically select the driving variables and reduce the harmful effects of introducing multi-source NWP, a novel multi-source selection module is designed. (2) The temporal attention module is proposed to extract the long-term temporal dependency hidden in the multi-source NWP. (3) The residual module is wrapped into the MSTAN model to provide adaptive complexity and avoid overfitting. (4) The Beta kernel-based mixture density module is used to output the multi-step probabilistic prediction results.
Based on the case study over three selected wind farms, the
MSTAN is strictly compared with two state-of-the-art technical
routes. Results demonstrate that MSTAN gives higher
deterministic prediction accuracy and better probabilistic
evaluation score. The effectiveness of the multi-source
selection module, the temporal attention, and the residual
module are respectively demonstrated.
Further work is needed to improve the proposed MSTAN architecture. (1) The proposed model only considers the temporal dependency, but the spatial dependency is also important for wind power forecasting. Novel spatial attention or spatial feature extraction modules should be merged into MSTAN. (2) In order to meet the demands of more wind farms, the applicability of MSTAN at other time resolutions should be verified.
VI. REFERENCES
[1] J. Yan, Y. Liu, S. Han, Y. Wang, and S. Feng, “Reviews on uncertainty analysis of wind power forecasting,” Renewable and Sustain. Energy Rev., vol. 52, pp. 1322-1330, 2015.
[2] G. Sideratos and N. D. Hatziargyriou, “An Advanced Statistical Method for Wind Power Forecasting,” IEEE Trans. Power Syst., vol. 22, no. 1, pp. 258-265, Feb. 2007.
[3] J. R. Andrade and R. J. Bessa, “Improving Renewable Energy Forecasting with a Grid of Numerical Weather Predictions,” IEEE Trans. Sustain. Energy, vol. 8, no. 4, pp. 1571-1580, Oct. 2017.
[4] J. W. Taylor, P. E. McSharry and R. Buizza, “Wind Power Density Forecasting Using Ensemble Predictions and Time Series Models,” IEEE Trans. Energy Convers., vol. 24, no. 3, pp. 775-782, Sept. 2009.
[5] W. Xie, P. Zhang, R. Chen, and Z. Zhou, “A Nonparametric Bayesian Framework for Short-Term Wind Power Probabilistic Forecast,” IEEE Trans. Power Syst., vol. 34, no. 1, pp. 371-379, Jan. 2019.
[6] P. Du, “Ensemble Machine Learning-Based Wind Forecasting to Combine NWP Output with Data from Weather Station,” IEEE Trans. Sustain. Energy, vol. 10, no. 4, pp. 2133-2141, Oct. 2019.
[7] N. Chen, Z. Qian, I. T. Nabney and X. Meng, “Wind Power Forecasts Using Gaussian Processes and Numerical Weather Prediction,” IEEE Trans. Power Syst., vol. 29, no. 2, pp. 656-665, March 2014.
[8] Y. Zhang and J. Wang, “A Distributed Approach for Wind Power Probabilistic Forecasting Considering Spatio-Temporal Correlation Without Direct Access to Off-Site Information,” IEEE Trans. Power Syst., vol. 33, no. 5, pp. 5714-5726, Sept. 2018.
[9] Z. Wang, W. Wang, C. Liu, Z. Wang, and Y. Hou, “Probabilistic Forecast for Multiple Wind Farms Based on Regular Vine Copulas,” IEEE Trans. Power Syst., vol. 33, no. 1, pp. 578-589, Jan. 2018.
[10] M. Khodayar and J. Wang, “Spatio-Temporal Graph Deep Neural Network for Short-Term Wind Speed Forecasting,” IEEE Trans. Sustain. Energy, vol. 10, no. 2, pp. 670-681, April 2019.
[11] L. Mark, T. P. Erlinger, D. Patschke, and C. Varrichio, “Probabilistic Gradient Boosting Machines for GEFCom2014 Wind Forecasting,” Int. J. Forecast., vol. 32, no. 3, pp. 1061-66, 2016.
[12] Silva. Lucas, “A Feature Engineering Approach to Wind Power Forecasting,” Int. J. Forecast., vol. 30, no. 2, pp. 395-401, 2014.
[13] K. Bhaskar and S. N. Singh, “AWNN-Assisted Wind Power Forecasting Using Feed-Forward Neural Network,” IEEE Trans. Sustain. Energy, vol. 3, no. 2, pp. 306-315, April 2012.
[14] Da. Federica and S. Alessandrini, “Post-Processing Techniques and Principal Component Analysis for Regional Wind Power and Solar Irradiance Forecasting,” Solar Energy, vol. 134, pp. 327-38, 2016.
[15] Y. Wu, Q. Wu and J. Zhu, “Data-driven wind speed forecasting using deep feature extraction and LSTM,” IET Renew. Power Gene., vol. 13, no. 12, pp. 2062-2069, 2019.
[16] S. Li, P. Wang, and L. Goel, “Wind Power Forecasting Using Neural Network Ensembles with Feature Selection,” IEEE Trans. Sustain. Energy, vol. 6, no. 4, pp. 1447-1456, Oct. 2015.
[17] S. Kunpeng, Y. Qiao, W. Zhao, Q. Wang, M. Liu, and Z. Lu, “An Improved Random Forest Model of Short-term Wind-power Forecasting to Enhance Accuracy, Efficiency, and Robustness,” Wind Energy, vol. 21, no. 12, pp. 1383-1394, 2018.
[18] L. Li, Y. Liu, Y. Yang, S. Han, “Short-term wind speed forecasting based on CFD pre-calculated flow fields,” Proceed. The Chinese Soc. Electric. Eng., vol. 33, no. 7, pp. 27-32, 2013.
[19] L. Landberg, “A mathematical look at a physical power prediction model,” Wind Energy, vol. 1, no. 1, pp. 23-28, 2015.
[20] E. Erdem, and S. Jing, “ARMA based approaches for forecasting the tuple of wind speed and direction,” Appl. Energy, vol. 88, no. 4, pp. 1405-1414, 2011.
[21] P. Louka, G. Galanisac, N. Siebert, et al., “Improvements in wind speed forecasts for wind power prediction purposes using Kalman filtering,” J. Wind Eng. & Indus. Aero., vol. 96, no. 12, pp. 2348-2362, 2018.
[22] E. Mangalova and O. Shesterneva, “K-nearest neighbors for GEFCom2014 probabilistic wind power forecasting,” Int. J. Forecast., vol. 32, no. 3, pp. 1067-1073, 2016.
[23] H. S. Dhiman, D. Deb, and J. M. Guerrero, “Hybrid machine intelligent SVR variants for wind forecasting and ramp events,” Renew. Sustain. Energy Rev., vol. 108, pp. 369-379, 2019.
[24] M. Landry, T. P. Erlinger, D. Patschke, and C. Varrichio, “Probabilistic gradient boosting machines for GEFCom2014 wind forecasting,” Int. J. Forecast., vol. 32, no. 3, pp. 1061-1066, 2016.
[25] Y. Zhao, L. Ye, P. Pinson, Y. Tang, P. Lu, “Correlation-constrained and sparsity-controlled vector autoregressive model for spatio-temporal wind power forecasting,” IEEE Trans. Power Syst., vol. 33, no. 5, pp. 5029-5040, 2018.
[26] J. W. Messner, P. Pinson, “Online adaptive lasso estimation in vector autoregressive models for high dimensional wind power forecasting,” Int. J. Forecast., vol. 35, no. 4, pp. 1485-1498, 2019.
[27] L. Cavalcante, J. B. Ricardo, R. Marisa, and J. Browell, “LASSO vector autoregression structures for very short-term wind power forecasting,” Wind Energy, vol. 20, no. 4, pp. 657-675, 2017.
[28] H. Quan, D. Srinivasan, and A. Khosravi, “Short-Term Load and Wind Power Forecasting Using Neural Network-Based Prediction Intervals,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 303-315, Feb. 2014.
[29] C. Wan, Z. Xu, P. Pinson, Z. Y. Dong, and K. P. Wong, “Optimal Prediction Intervals of Wind Power Generation,” IEEE Trans. Power Syst., vol. 29, no. 3, pp. 1166-1174, May 2014.
[30] Z. Shi, H. Liang, and V. Dinavahi, “Direct Interval Forecast of Uncertain Wind Power Based on Recurrent Neural Networks,” IEEE Trans. Sustain. Energy, vol. 9, no. 3, pp. 1177-1187, Jul. 2018.
[31] Y. Lin, M. Yang, C. Wan, J. Wang, and Y. Song, “A Multi-Model Combination Approach for Probabilistic Wind Power Forecasting,” IEEE Trans. Sustain. Energy, vol. 10, no. 1, pp. 226-237, Jan. 2019.
[32] T. Li, Y. Wang, and N. Zhang, “Combining Probability Density Forecasts for Power Electrical Loads,” IEEE Trans. Smart Grid, vol. 11, no. 2, pp. 1679-1690, Mar. 2020.
[33] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, “Deep belief network based deterministic and probabilistic wind speed forecasting approach,” Appl. Energy, vol. 182, pp. 80-93, 2016.
[34] C. Zhang, C. L. P. Chen, M. Gan, and L. Chen, “Predictive Deep Boltzmann Machine for Multiperiod Wind Speed Forecasting,” IEEE Trans. Sustain. Energy, vol. 6, no. 4, pp. 1416-1425, Oct. 2015.
[35] J. Yan, H. Zhang, Y. Liu, S. Han, L. Li, and Z. Lu, “Forecasting the High Penetration of Wind Power on Multiple Scales Using Multi-to-Multi Mapping,” IEEE Trans. Power Syst., vol. 33, no. 3, pp. 3276-3284, May 2018.
[36] A. Banik, C. Behera, T. V. Sarathkumar and A. K. Goswami, “Uncertain wind power forecasting using LSTM-based prediction interval,” IET Renew. Power Gene., vol. 14, no. 14, pp. 2657-2667, Oct. 2020.
[37] C. Li, G. Tang, X. Xue, A. Saeed, and X. Hu, “Short-Term Wind Speed Interval Prediction Based on Ensemble GRU Model,” IEEE Trans. Sustain. Energy, vol. 11, no. 3, pp. 1370-1380, Jul. 2020.
[38] H. Wang, G. Li, G. Wang, J. Peng, H. Jiang, and Y. Liu, “Deep learning based ensemble approach for probabilistic wind power forecasting,” Appl. Energy, vol. 188, pp. 56-70, 2017.
[39] Y. Hong, C. Lian, and P. P. Rioflorido, “A hybrid deep learning-based neural network for 24-h ahead wind power forecasting,” Appl. Energy, vol. 250, pp. 530-539, 2019.
[40] A. Borovykh, S. Bohte, and C. W. Oosterlee, “Conditional Time Series Forecasting with Convolutional Neural Networks,” arXiv, 2017, [Online] Available: https://arxiv.org/abs/1703.04691.
[41] P. Kou, C. Wang, D. Liang, S. Cheng, and L. Gao, “Deep learning approach for wind speed forecasts at turbine locations in a wind farm,” IET Renew. Power Gene., vol. 14, no. 13, pp. 2416-2428, Oct. 2020.
[42] Y. Yu, X. Han, M. Yang, and J. Yang, “Probabilistic Prediction of Regional Wind Power Based on Spatiotemporal Quantile Regression,” IEEE Trans. Indust. Appl., vol. 56, no. 6, pp. 6117-6127, Dec. 2020.
[43] B. J. Ricardo, M. Corinna, and F. Vanessa, et al., “Towards Improved Understanding of the Applicability of Uncertainty Forecasts in the Electric Power Industry,” Energies, vol. 10, no. 9, 2017.
[44] J. W. Messner, P. Pinson, J. Browel, M. B. Bjerregård, I. Schicker, “Evaluation of Wind Power Forecasts - an up-to-Date View,” Wind Energy, vol. 23, no. 6, pp. 1461-1481, 2020.
[45] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to Sequence Learning with Neural Networks,” Proceed. The 27th Int. Conf. NIPS, pp. 3104-12, 2014.
[46] G. Gregor, “The State-Of-The-Art in Short-Term Prediction of Wind Power. A Literature Overview,” National Laboratory, Denmark, Aug. 2003.
[47] S. S. Soman, H. Zareipour, O. Malik and P. Mandal, “A review of wind power and wind speed forecasting methods with different time horizons,” North American Power Symposium 2010, Arlington, TX, USA, 2010, pp. 1-8.
[48] Y. Dauphin, A. Fan, M. Auli, and D. Grangier, “Language Modeling with Gated Convolutional Networks,” in Proceed. the 34th Int. Conf. Machine Learning, 2017.
[49] H. Kaiming, X. Zhang, S. Ren, and J. Sun, “Identity Mappings in Deep Residual Networks,” Euro. Conf. Computer Vision, pp. 630-45, 2016.
[50] B. J. Lei, J. R. Kiros, G. E. Hinton, “Layer Normalization,” 2016, [Online] Available: https://arxiv.org/pdf/1607.06450v1.pdf
[51] V. Ashish, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention Is All You Need,” Proceed. the 31st Int. Conf. NIPS, pp. 5998-6008, 2017.
[52] H. Zhang, Y. Liu, J. Yan, S. Han, L. Li, and Q. Long, “Improved Deep Mixture Density Network for Regional Wind Power Probabilistic Forecasting,” IEEE Trans. Power Syst., vol. 35, no. 4, pp. 2549-2560, Jul. 2020.
[53] H. Tao, P. Pinson, S. Fan, H. Zareipour, A. Troccoli, and R. J. Hyndman, “Probabilistic Energy Forecasting: Global Energy Forecasting Competition 2014 and Beyond,” Int. J. Forecast., vol. 32, no. 3, pp. 896-913, 2016.
[54] D. Salinas, V. Flunkert, and J. Gasthaus, “DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks,” Int. J. Forecast., vol. 36, no. 3, pp. 1181-1191, 2020.
[55] R. Wen, K. Torkkola, B. Narayanaswamy, and D. Madeka, “A Multi-Horizon Quantile Recurrent Forecaster,” 2017, [Online] Available: https://arxiv.org/pdf/1711.11053.pdf
[56] A. Alexandrov, K. Benidis, and M. B. Schneider, et al., “GluonTS: Probabilistic Time Series Models in Python,” 2019, [Online] Available: https://arxiv.org/pdf/1906.05264.pdf
[57] A. Paszke, S. Gross, S. Chintala, et al., “Automatic differentiation in PyTorch,” NIPS 2017 Workshop Autodiff, Oct. 2017.
[58] F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-Learn: Machine Learning in Python,” J. Machine. Learn. Res., vol. 12, no. 85, pp. 2825-2830, 2011.
[59] G. Ke, Q. Meng, T. Finley, et al., “LightGBM: A Highly Efficient Gradient Boosting Decision Tree,” Proceed. the 31st Int. Conf. NIPS, vol. 30, pp. 3149-3157, 2017.
[60] H. Hans, “Decomposition of the Continuous Ranked Probability Score for Ensemble Prediction Systems,” Weather and Forecasting, vol. 15, no. 5, pp. 559-570, 2000.
[61] C. Wan, Z. Xu, P. Pinson, Z. Y. Dong and K. P. Wong, “Probabilistic Forecasting of Wind Power Generation Using Extreme Learning Machine,” IEEE Trans. Power Syst., vol. 29, no. 3, pp. 1033-1044, May 2014.
VII. APPENDIX
A. The ACE results
Fig. 14. The Average Coverage Error for the three studied wind farms.
B. The Temporal Error Pattern of Multi-source NWP for other two wind farms
Fig.15. The temporal error pattern of multi-source NWP wind speed (Case 1 and Case 3)
(Four NWP sources and two kinds of time series prediction methods are studied. The one-year dataset is used.)
(a)
(b)
(c)
... Temporal convolutional network (TCN) performs superior to traditional CNN in time series. In Zhang et al., 26 TCN has been well introduced for WPP. Recently, hybrid neural networks like CNN-LSTM and TCN-LSTM increasingly exhibit better performance than traditional models. ...
Article
Full-text available
Wind power prediction for newly built wind farms is usually faced with the problem of no sufficient historical data. To efficiently extract the useful features from related wind farms, a novel transfer learning method based on temporal convolutional network (TCN)‐Bi‐long short‐term memory (LSTM) with dynamic loss weights is proposed. Firstly, a novel multi‐task TCN‐Bi‐LSTM model is designed to extract common features. The separate TCNs, and common Bi‐LSTM layers of the proposed model are designed to extract the temporal features from related wind farms. Secondly, in the pre‐training stage, to optimize the training process of the neural networks, a dynamic loss‐weighting strategy is proposed for multi‐task learning (MTL) to select the most related features, which increase the prediction accuracy by providing a suitable optimization object. Thirdly, the multi‐task TCN‐Bi‐LSTM model is re‐trained based on the samples from the target wind farm. Finally, a dataset of seven wind farms was employed to evaluate the efficiency of the proposed MTL structure and the dynamic loss‐weighting strategy. The result shows that the root mean squared error of the 12‐h short‐term prediction can be decreased by 4.19% compared with the traditional single‐task learning model, which verifies the validity of the proposed multi‐task transfer learning method.
Article
Wind energy plays a critical role in the transition towards renewable energy sources. However, the uncertainty and variability of wind can impede its full potential and the necessary growth of wind power capacity. To mitigate these challenges, wind power forecasting methods are employed for applications in power management, electricity trading, or maintenance scheduling. In this work, we present, evaluate, and compare four machine learning-based wind power forecasting models. Our models correct and improve 48-hour forecasts extracted from a numerical weather prediction (NWP) model. The models are evaluated on datasets from a wind park comprising 65 wind turbines. The best improvement in forecasting error and mean bias was achieved by a convolutional neural network, reducing the average NRMSE down to 22%, coupled with a significant reduction in mean bias, compared to a NRMSE of 35% from the strongly biased baseline model using uncorrected NWP forecasts. Our findings further indicate that changes to neural network architectures play a minor role in affecting the forecasting performance, and that future research should rather investigate changes in the model pipeline. Moreover, we introduce a continuous learning strategy, which is shown to achieve the highest forecasting performance improvements when new data is made available.
Article
This paper studies an adaptive approach for probabilistic wind power forecasting (WPF) including offline and online learning procedures. In the offline learning stage, a base forecast model is trained via inner and outer loop updates of meta-learning, which endows the base forecast model with excellent adaptability to different forecast tasks, i.e., probabilistic WPF with different lead times or locations. In the online learning stage, the base forecast model is applied to online forecasting combined with incremental learning techniques. On this basis, the online forecast takes full advantage of recent information and the adaptability of the base forecast model. Two applications are developed based on our proposed approach concerning forecasting with different lead times (temporal adaptation) and forecasting for newly established wind farms (spatial adaptation), respectively. Numerical tests were conducted on real-world wind power data sets. Simulation results validate the advantages in adaptivity of the proposed methods compared with existing alternatives.
Conference Paper
The accurate identification of offshore wind power ramp events has a great effect on wind power forecasting. To improve the prediction accuracy of offshore wind power, this paper proposes an XGBoost-GRU combined forecasting model that considers the number of ramp (climbing) features. Firstly, the adaptive revolving door algorithm is used to identify wind power ramp events, as well as for data compression and feature extraction. Then, the XGBoost decision tree and the gated recurrent unit (GRU) are used to make preliminary power predictions. In the case studies, the results are weighted and combined. The proposed model is shown to perform well on offshore wind power prediction.
Article
Estimating prediction intervals (PIs) is an efficient and reliable way of capturing the uncertainties associated with wind power forecasting. In this study, a state of the art recurrent neural network (RNN) known as long short-term memory (LSTM) is used to produce reliable PIs for one-hour ahead wind power uncertainty forecast using the non-parametric lower upper bound estimation framework. Two realistic hourly stamped wind power data sets are obtained and by using mutual information and false nearest neighbours techniques, the data are made suitable for model inputs. A novel comprehensive objective function consisting of the coverage probability, the average width of the PIs, symmetricity and variational synchronicity is developed to train the LSTM model using intelligent optimisation techniques. The standard of the PIs generated for the test set as well as for different seasons are evaluated based on the indices used to design the objective function for model training, with one of them being modified. The performance of the proposed LSTM model is found to outperform typical RNN models like Elman, non-linear auto-regressive with exogenous models and other benchmarking models while tested on the real-world data sets.
Article
In a wind farm, individual turbines disturb the wind field by generating wakes, so wind speeds at various turbine locations are different. From the perspective of wind farm control, there is an interest in dynamic optimization of the power reference for each individual wind turbine, and the wind speed forecast at each turbine location is hence required. This paper develops a joint model of convolutional neural network (CNN) and the gated recurrent units (GRU) to forecast the wind speed at turbine locations. This model employs a two‐layer architecture. At the lower‐layer, the spatial features are automatically extracted by CNN. The extracted spatial features describe the spatial correlations among multiple wind turbines. At the upper‐layer, GRU learns the temporal correlations across the extracted spatial features. This joint model is trained in an integrated manner. A salient characteristic of this model is that it extracts high‐level spatial‐temporal features from wind data. These automatically learnt features capture the spatial‐temporal wind dynamics and interactions in a wind farm, thus being informative and appropriate for the forecasting at specific turbine locations. The simulation on actual data demonstrates the effectiveness of the presented model.
Article
Wind power forecast evaluation is of key importance for forecast provider selection, forecast quality control, and model development. While forecasts are most often evaluated based on squared or absolute errors, these error measures do not always adequately reflect the loss functions and true expectations of the forecast user, neither do they provide enough information for the desired evaluation task. Over the last decade, research in forecast verification has intensified, and a number of verification frameworks and diagnostic tools have been proposed. However, the corresponding literature is generally very technical and most often dedicated to forecast model developers. This can make forecast users struggle to select the most appropriate verification tools for their application while not fully appraising subtleties related to their application and interpretation. This paper revisits the most common verification tools from a forecast user perspective and discusses their suitability for different application examples as well as evaluation setup design and significance of evaluation results.
Article
The unsteady motion of the atmosphere incurs nonlinear and spatiotemporally coupled uncertainties in the wind power prediction (WPP) of multiple wind farms. This brings both opportunities and challenges to wind power probabilistic forecasting (WPPF) of a wind farm cluster or region, particularly when wind power is highly penetrated within the power system. This paper proposes an Improved Deep Mixture Density Network (IDMDN) for short-term WPPF of multiple wind farms and the entire region. In this respect, a deep multi-to-multi (m2m) mapping Neural Network model, which adopts the beta kernel as the mixture component to avoid the density leakage problem, is established to produce probabilistic forecasts in an end-to-end manner. A novel modified activation function and several general training procedures are then introduced to overcome the unstable behavior and NaN (Not a Number) loss issues of the beta kernel function. Verification of IDMDN is based on an open-source dataset collected from seven wind farms, and comparison results show that the proposed model improves the WPPF performance at both wind farm and regional levels. Furthermore, a laconic and accurate probabilistic expression of predicted power at each time step is produced by the proposed model.
Article
Researchers have proposed various probabilistic load forecasting models in the form of quantiles, densities, or intervals to describe the uncertainties of future energy demand. Density forecasts can provide more uncertainty information compared with quantile and interval. This paper proposes a novel and easily-implemented approach to combine density probabilistic load forecasts to further improve the performance of the final probabilistic forecasts. The combination problem is formulated as an optimization problem to minimize the continuous ranked probability score of the combined model by searching the weights of different individual methods. Under Gaussian mixture distribution assumption of the density forecasts, the problem is cast to a linearly constrained quadratic programming problem and can be solved efficiently. Case studies on the electric load datasets of eight areas verify the effectiveness of our method.
Article
Wind speed forecasting is important for high-efficiency utilisation of wind energy and management of grid-connected power systems. Due to the noise, instability and irregularity of the atmospheric system, current models based on raw historical data have encountered many problems. In this study, a novel deep feature extraction approach is developed based on stacked denoising autoencoders and batch normalisation. The deep features extracted from raw historical data are then fed to long short-term memory (LSTM) neural networks for prediction. Meanwhile, density-based spatial clustering of applications with noise is employed to process the numerical weather prediction data. By picking out the abnormal samples, representative training samples are selected to improve the efficiency of the model. For illustration and verification purposes, the proposed model is used to predict the wind speed of the Wind Atlas for South Africa (WASA). Empirical results show that deep feature extraction improves the forecasting accuracy of LSTM by 49% compared with feature selection, indicating that proper feature extraction is crucial to wind speed forecasting. The proposed model also outperforms other benchmark methods by at least 17%. Hence, the proposed model is promising for wind speed forecasting.
Article
Unlike power prediction for a single wind farm, regional wind power prediction aims to predict the total power of multiple wind farms located in a specific region. Regional wind power prediction involves more data, which carry abundant information on spatiotemporal correlations and nonlinearity. Handling the massive data and extracting representative features therefore become the crucial issues in constructing an effective regional wind power prediction model. This paper proposes a spatiotemporal quantile regression (SQR) algorithm to perform short-term nonparametric probabilistic prediction of regional wind power, incorporating the advantages of the hybrid neural network (HNN) and quantile regression (QR). In the approach, the high-dimensional input data are reorganized into a feature graph that is ready for feature extraction by the HNN. The HNN can thus be used to extract the representative features and construct nonlinear regression models. Meanwhile, by following the QR rules, the model obtains quantiles and performs probabilistic prediction. By properly addressing the explanatory variable selection issue, the approach provides a specific solution for regional wind power probabilistic prediction with massive input data. Test results on a region with 10 wind farms demonstrate the effectiveness of the proposed approach.
Article
Wind speed and power forecast is an essential component to ensure grid stability and reliability. The traditional forecasting methods fail to address the non-linearity in the wind speed time-series, thus paving way for machine intelligent algorithms. This paper discusses a hybrid machine intelligent wind forecasting model utilizing different variants of Support Vector Regression (SVR) built on wavelet transform. Various performance indices are evaluated to identify the possible best one among four different machine learning regressors for wind forecasting application. Apart from standard ε-SVR and LS-SVR, two new regression models, namely, ε-Twin Support vector regression (ε-TSVR) and Twin Support vector regression (TSVR) are used to forecast short-term wind speed and are compared with Persistence model for four wind farm sites. The effect of the larger dataset on forecasting performance is evaluated for two wind farm sites from USA and India. Further, wind power ramp events are investigated at different hub heights and the forecasting performance of different variants of SVR is compared for five wind farm sites.
Article
Wind speed interval prediction plays an increasingly important role in wind power production. The intermittent and fluctuating characteristics of wind power make high-quality prediction intervals (PIs) challenging. In this paper, a novel hybrid model based on a gated recurrent unit (GRU) neural network and variational mode decomposition (VMD) is proposed for wind speed interval prediction. First, VMD is employed to decompose the complex wind speed time series into simplified modes. An interval prediction model (IPM) and a point prediction model (PPM), both based on GRU, are designed to conduct interval prediction on the primary mode and point prediction on the remaining modes, after which the prediction interval is composed and constructed. Then, an error prediction model based on GRU is proposed to enhance the model performance by error correction. Eight cases from two wind fields are used to test and verify the proposed method. The results indicate that the proposed method achieves much higher PI coverage probability and narrower PI width.
Article
Wind power generation is always associated with uncertainties as a result of fluctuations of wind speed. Accurate predictions of wind power generation are important for the efficient operation of power systems. This paper presents a hybrid deep learning neural network for 24 h-ahead wind power generation forecasting. This novel method is based on a Convolutional Neural Network (CNN) that is cascaded with a Radial Basis Function Neural Network (RBFNN) with a double Gaussian function (DGF) as its activation function. The CNN is utilized to extract wind power characteristics by convolution, kernel and pooling operations. The supervised RBFNN, incorporating a DGF, deals with uncertain characteristics. Realistic wind power generations, measured on a wind farm, were used in simulations. The proposed method is implemented using TensorFlow and Keras Library. Comparative studies of different approaches are shown. Simulation results reveal that the proposed method is more accurate than traditional methods for 24 h-ahead wind power forecasting.