Abstract—The temporal dependencies of wind power are essential to the modeling of short-term wind power forecasts. However, different time series inputs contribute differently to forecasting performance, which makes selecting the relevant driving information challenging. In this paper, a Multi-Source and Temporal Attention Network (MSTAN) is proposed for short-term wind power probabilistic prediction. The MSTAN model introduces multi-source NWP and makes three specific designs to improve prediction performance. Firstly, a novel multi-source variable attention module is proposed to select the driving variables of NWP. Secondly, a temporal attention module is used to capture the implicit temporal dependency hidden in the historical measurements and the multi-source NWP sequence. Thirdly, a residual module is wrapped into MSTAN to skip unnecessary nonlinear transformations and provide adaptive complexity to the entire model. After training, MSTAN yields multi-horizon density forecasts for the next 48 hours. MSTAN is compared with state-of-the-art machine learning schemes for wind power forecasting using operational data from three wind farms. We demonstrate that MSTAN outperforms its counterparts in both deterministic and probabilistic prediction, and that the structural design of MSTAN is effective.
Index Terms—wind power probabilistic prediction, multi-step prediction, multi-source NWP, variable attention, attention mechanism, residual connection, mixture density.
I. INTRODUCTION
With the integration of high-penetration wind energy, wind power uncertainty has become a major concern for reliable and economical power system operation and planning. Wind power probabilistic forecasting (WPPF) provides detailed uncertainty information, allowing system operators and electricity traders to make better decisions in reserve setting, unit commitment, electricity trading, and so on [1]. However, accurate short-term WPPF is still challenging due to the inherent randomness of the wind resource [2].
Up to now, the realization of high accuracy WPPF mainly
relies on the following two critical technical routes:
(1) Better Data Inputs and Feature Engineering. Novel features and efficient data preprocessing methods can reduce the difficulty of modeling and improve prediction accuracy. Therefore, the introduction, construction, and selection of target-related features have been widely adopted in wind power forecasting (WPF) research. (i) Some novel features are introduced into the WPF model to reduce prediction uncertainty. For instance, multi-site Numerical Weather Prediction (NWP) [3] and ensemble NWP [4] have been used to enrich the input
[Footnote: This work was supported by the National Natural Science Foundation of China (U1765104) and the North China Electric Power University International Joint Training Graduate Program. Hao ZHANG is with the School of Renewable Energy, North China Electric Power University, Beijing, China (e-mail: zhanghaoncepu@163.com).]
information of the WPPF model. Historical measurements and
NWP data [5-7] are simultaneously used as model input data to
improve very short-term and short-term forecasts. However, how to dynamically balance the relative importance of historically observed values and NWP values at different prediction horizons is seldom discussed, leading to underutilization of the input data. Furthermore, off-site information
[8,9] and geospatial information [10] are introduced to provide
more spatial features. (ii) Besides, constructing highly target-related driving features is also an effective way to improve prediction accuracy. Many feature construction schemes have been proposed to enrich the feature inputs of wind power forecasting. Frequently used manual features include multi-time-step average features, polynomial features [11], clustering features [12], wavelet decomposition features [13], dimension-reducing features [14], and unsupervised features [15]. (iii) When
the features are redundant and noisy, the features need to be
selected to reduce the influence of irrelevant features and noise.
Feature selection is often used to reduce the model complexity
and avoid the curse of dimensionality. The classical feature selection methods include the Filter, Wrapper, and Embedded methods. For example, Filter selection based on mutual information and Embedded selection based on a tree method are proposed in [16, 17]. An Embedded selection based on Automatic Relevance Determination is presented in [7]. Traditional feature selection methods only pick out the globally important features, making the selected features unsuitable for every time slot.
(2) Accurate and Flexible Probabilistic Prediction Models.
Generally, short-term wind farm power prediction models can be divided into physical models [18, 19], statistical models [20, 21],
and Machine Learning models. In recent years, Machine
Learning models, which could efficiently provide interval,
quantile, probability density, and scenario prediction results,
have gradually become the mainstream of short-term WPPF.
Machine Learning models fall into two categories: conventional Machine Learning models and Deep Learning models. (i) From the perspective of conventional Machine Learning, many models have been proposed for short-term WPPF, such as K-Nearest Neighbors (KNN) [22], Support Vector Machine (SVM) [23], Gaussian Process (GP) [17], tree-based models [24], Bayesian learning [5], autoregressive-based models [25-27], shallow Artificial Neural Networks (ANN) [28-30], and ensemble models [31, 32]. Since conventional ML models cannot automatically extract deep-level features, achieving high-accuracy wind power prediction often requires detailed and specialized feature engineering. (ii) Deep Learning
[Footnote: Jie YAN and Yongqian LIU are the corresponding authors, with the School of Renewable Energy, North China Electric Power University, Beijing, China (e-mail: yanjie@ncepu.edu.cn). Yongqi GAO is with the Nansen Environmental and Remote Sensing Center, University of Bergen, Bergen, Norway.]
Multi-Source and Temporal Attention Network for
Probabilistic Wind Power Prediction
Hao ZHANG, Jie YAN*, Member, IEEE, Yongqian LIU, Yongqi GAO, Shuang HAN, Li LI
models have strong nonlinear fitting capabilities and flexible network structures, and can be regarded as competitive data-driven solutions for WPF. Deep Learning models used for short-term WPF can be divided into the following categories: Dense, RNN, CNN, and GCN. Early DL methods used for WPF are mostly densely connected networks, such as the Deep Belief Network [33], the Deep Boltzmann Machine [34], and the DAE [35]. All three models have an unsupervised training and fine-tuning process. Due to the limitations of the network structure, Dense models have some defects in modeling the dependence of spatiotemporal data. Recurrent Neural Networks (RNN), including LSTM [36] and GRU [37], have gradually been applied to multi-step wind power prediction, and the temporal dependence of wind power is better learned by RNNs. Convolutional Neural Network (CNN) models, including 1D-CNN, 2D-CNN, and TCN models, also have local temporal dependency learning capabilities [38-40]. For instance, a 2D CNN is used to establish the spatial dependence of regular grid data [41]. The combination of CNN and LSTM is employed to capture the spatiotemporal relationship in wind farms or wind farm clusters [42]. GCN extends the convolution operation to the non-Euclidean domain and shows better adaptability and higher efficiency than CNN in the wind power prediction task [9].
However, several problems have not been carefully
addressed in earlier studies due to the limitations of the
prediction model structure and the variety of input data. (I)
Single-source ensemble NWP provided by a weather forecast
institution with different initial conditions and parameterization
schemes has been widely used before. However, the multi-
source NWP from diverse weather forecast providers has rarely
been considered in short-term WPF. Due to the limitations of
observations available for assimilation, computing resources
and engineering experience, one weather forecast provider
cannot guarantee that single-source NWP is accurate enough in
all regions and weather conditions. It is necessary to consider a
multi-source NWP scheme to reduce the risk of wind farm
power prediction [43]. (II) The relative importance of different
NWP features changes dynamically. However, traditional feature selection methods cannot pick out the dominant features step by step, so the input features of some time slots are not optimal. For instance, a globally optimal feature set is selected by traditional feature selection methods from the entire NWP dataset; however, there may be an optimal feature set more suitable for a specific time slot. (III) Most existing
CNN and RNN based WPPF models have difficulties in
modeling the long-term temporal dependency hidden in the
observed sequence and the NWP sequence. When the concerned temporal dependency spans a long time window, RNN models suffer from the gradient vanishing problem and parallelization difficulties. CNN models focus more on local patterns and need more layers and specific layer designs to obtain long-term temporal dependencies. (IV) Temporal Deep Learning models require time sequences as inputs. Thus, the short-term WPF training set is generally small under the limitation of NWP access times. If a short-term WPF system obtains NWP data once a day, the daily received NWP sequence might provide only one sample for model training. Deep learning models have strong fitting capabilities but are also prone to overfitting, especially when the data set is small. How to avoid overfitting on data sets of different sizes is rarely discussed in short-term WPPF.
In this paper, a Multi-Source and Temporal Attention
Network (MSTAN) is proposed for multi-step short-term
WPPF. MSTAN takes the multi-source NWP from diverse weather forecast providers and the historical observations as model inputs, and produces density forecasts for the next 48 hours as model outputs. Compared with previous short-term power prediction studies, this paper makes the following contributions:
Multi-source NWP is used in WPPF, and its long-term
temporal error pattern is discussed.
A novel multi-source variable attention module/layer is
designed to extract important variables from multi-source
NWP dynamically. Compared with the general feature
selection schemes [15,16,17], the multi-source variable
attention module makes the specific selection on every
single step.
The temporal dependency in the wind power sequence is
learned by a novel temporal attention layer. Compared
with RNN [36-37] and CNN [38-40] models, the proposed
temporal attention module is more effective in capturing
the long-term dependencies.
To avoid overfitting, a residual module constructed from skip connections, a gating mechanism, and layer normalization is used to control the extent of nonlinear transformation and reduce unnecessary nonlinear transformations. Compared with some DL-based WPPF models proposed in the literature [33-42], the residual module makes the MSTAN model more stable and adaptive.
This paper is organized as follows: Section II describes the
advantage of multi-source NWP and discusses the temporal
error pattern hidden in multi-source NWP. Section III defines
the multi-horizon wind power probabilistic prediction problem
and formally introduces the proposed MSTAN model. A case
study over three wind farm data is presented in Section IV.
Section V gives the conclusions and future works.
II. MULTI-SOURCE NWP AND ITS TEMPORAL ERROR PATTERN
A. Multi-source NWP
NWP models adopted by weather forecast providers can
differ in many aspects, such as spatial and temporal resolution,
observations available for assimilation and the specific
assimilation scheme, parameterization of physical process, and
other factors [44]. When the observational data for assimilation,
computing resources, and engineering experiences are limited,
single-source NWP products are likely to perform poorly in
some regions and weather conditions, which brings significant
risks to short-term wind power forecasts. As shown in Table 1, the annual Root Mean Square Error (RMSE) of the multi-source NWP wind speed is computed for 10 wind farms. No single NWP source achieves the lowest error in all ten wind farms at the same time; even the best NWP source achieves the smallest RMSE in only 6 of the 10 wind farms. If a single NWP source is used in a WPF system that serves many wind farms, it will bring prediction risks to some of the wind farms it serves.
On the contrary, multi-source NWP comes from different
weather forecast providers with varying settings of prediction.
Each forecast provider has their advantage and disadvantage in
NWP models, observations, parameterization schemes and
computing resources, etc. Therefore, multi-source NWP, which
integrates multiple advantages of different NWP, is more likely
to achieve low prediction risk and reliable accuracy. Significant benefits can be achieved by using the multi-source NWP scheme in real WPF projects. In some regions of China, the accuracy of the WPF directly affects the wind power integration priority and the revenue of wind farms: the wind farms at the top of the prediction accuracy ranking are rewarded, and those at the bottom are penalized. Take a wind farm located in North China as an example. The electricity price is about \$0.078/kWh, the regional wind curtailment rate in 2019 is about 7.1%, the wind farm capacity is 100 MW, and the equivalent running hours are about 250 per month. If the wind farm is penalized at the average curtailment rate, the lost revenue will be

$$100{,}000\ \mathrm{kW} \times 250\ \mathrm{h} \times 0.078\ \$/\mathrm{kWh} \times 7.1\% \approx 1.3845 \times 10^{5}\ \$$$

per month. The penalty to the wind farm will far exceed the cost of purchasing multi-source NWPs. Wind energy companies would be delighted to use multi-source NWP to promote WPF accuracy and grid integration priority.
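As a quick sanity check on the figures above, the lost-revenue arithmetic can be reproduced in a few lines of Python (the variable names are ours, purely illustrative):

```python
# Reproducing the lost-revenue estimate from the text:
# 100 MW capacity, ~250 equivalent utilization hours per month,
# $0.078/kWh electricity price, 7.1% regional curtailment rate.
capacity_kw = 100_000          # 100 MW expressed in kW
hours_per_month = 250          # equivalent utilization hours per month
price_usd_per_kwh = 0.078
curtailment_rate = 0.071

lost_revenue = capacity_kw * hours_per_month * price_usd_per_kwh * curtailment_rate
print(round(lost_revenue))     # 138450, i.e. about 1.3845e5 $ per month
```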
TABLE 1. THE ANNUAL WIND SPEED RMSE OF 4 NWP SOURCES IN 10 WIND FARMS

Wind Farm | Source1 | Source2 | Source3 (ensemble) | Source4
1         | 2.69    | 3.06    | 2.76               | 3.10
2         | 2.31    | 2.66    | 2.05               | 3.18
3         | 1.91    | 2.36    | 1.68               | 3.19
4         | 1.95    | 2.67    | 1.69               | 2.87
5         | 3.03    | 2.94    | 2.93               | 2.74
6         | 2.23    | 2.24    | 1.96               | 2.61
7         | 2.00    | 2.37    | 1.83               | 2.21
8         | 2.13    | 2.30    | —                  | 2.33
9         | 2.00    | 2.37    | 1.83               | 2.21
10        | 2.06    | 2.23    | 2.11               | 2.44

The annual wind speed RMSE of the 4 NWP sources is computed from one year of data. The record of the 3rd NWP source in wind farm 8 covers far less than one year; therefore, the statistical error of NWP source 3 for wind farm 8 is omitted.
B. Temporal error pattern of multi-source NWP
It is found from the wind speed prediction errors that the multi-source NWP wind speed has its specific Temporal Error Pattern (TEP). The TEP of the multi-source NWP wind speed is illustrated in Figure 1, which shows the annual averaged 0-47th hour wind speed RMSE of the four NWP sources and two multi-step time series methods (Persistence and Seq2Seq [45]). Three major characteristics can be seen in this figure.
Firstly, the multi-step wind speed prediction results are more accurate than the NWP wind speed from the 0th to the 6th hour, whereas the NWP wind speeds are better than the multi-step wind speed predictions from the 6th to the 48th hour. Similar conclusions are reported in the literature [46, 47]. This means that historical information (measurements) and future information (NWP) both have significant value for highly accurate short-term wind power forecasts: forecasts at near horizons should focus on the historical measurements, while forecasts at far horizons should focus on the weather forecasts.
Secondly, the hourly wind speed errors of each NWP source show a 24-hour cyclical trend. Within a day, the NWP wind speed error first decreases and then increases. Moreover, the NWP wind speed errors from the 24th to the 48th hour are slightly higher than those from the 0th to the 24th hour. This trend shows that the TEP of multi-source NWP spans a long time window.
Thirdly, around the 12th and the 36th hours, the wind speed errors of the four NWP sources are close to each other, while the errors at other time slots differ considerably. This indicates that the relative wind speed prediction accuracy of the four NWP sources changes dynamically; in other words, the prediction model should pay dynamic attention to the four NWP sources.
Fig.1. The Temporal Error Pattern of multi-source NWP wind speed (wind farm
9). Four NWP sources and two kinds of time series prediction methods are
studied in this figure. A one-year dataset is used.
III. MULTI-SOURCE AND TEMPORAL ATTENTION
NETWORK
Introducing multi-source NWP and considering the TEP
hidden in the multi-source NWP is a promising way to improve
the accuracy of the WPPF models. Therefore, a deep learning
based WPPF model, called Multi-Source and Temporal
Attention Network (MSTAN), is proposed in this paper. Four
critical modules/layers are used in MSTAN.
Multi-source variable attention module. It is designed
to extract the driving variables of multi-source NWP
dynamically. The collinearity problem and the harmful
effects of irrelevant variables and noise are reduced.
Residual module. It skips unnecessary nonlinear transformations and adaptively controls the complexity of the model to reduce the overfitting risk.
Temporal attention module. It dynamically selects
historical and future information and extracts the long-
term temporal dependency.
Mixture density module. It outputs the joint probability
density of multi-horizon wind power forecasts.
In this section, the used probabilistic prediction framework is
introduced first. Then four designed modules and the loss
function used in MSTAN are presented. Finally, the overall
structure and the relationship between all modules are clarified.
A. The Probabilistic Prediction Framework
Short-term WPPF aims to establish a multi-horizon
prediction function, which takes historical information (such as
wind speed and power measurements) and future information
(NWP) as inputs, and future wind power distribution as outputs.
More formally, let $\mathbf{x}_t := (x_{t,1}, \dots, x_{t,d_x})$ be the time-varying covariates that can be treated as known feature values, such as NWP and the relative time index (hour of the day), where $d_x$ is the dimension of $\mathbf{x}_t$. Let $y_t \in \mathbb{R}$ denote the measured wind power at time $t$, with $y_{1:T_0} := (y_1, y_2, \dots, y_{T_0})$. Similarly, let $s_t \in \mathbb{R}$ denote the measured wind speed at time $t$, with $s_{1:T_0} := (s_1, s_2, \dots, s_{T_0})$. $T_0$ is the length of the historical measurements, and $\tau$ is the maximum prediction horizon.

As shown in Figure 2, given the past wind power and wind speed measurements $y_{1:T_0}$, $s_{1:T_0}$, the time-varying covariates $\mathbf{x}_{1:T_0+\tau}$, and the model parameters $\Theta$, the short-term wind power probabilistic forecasting problem is described as:

$$p(y_{T_0+1:T_0+\tau} \mid y_{1:T_0}, s_{1:T_0}, \mathbf{x}_{1:T_0+\tau}, \Theta) \tag{1}$$

Estimating the joint probability density function of future multi-step wind power values can be considered a supervised learning problem, solved by minimizing the discrepancy, or maximizing the likelihood, between measurements and model forecasts.
Fig. 2. The structure of MSTAN
B. Multi-source variable attention module

A multi-source variable attention module/layer is designed for multi-source NWP by combining prior knowledge and the attention mechanism. In terms of prior knowledge, the theoretical relationship

$$P = \frac{1}{2} C_p \rho A v^3$$

can be easily derived, where $C_p$ is the coefficient of performance, $\rho$ is the air density, $A$ is the rotor swept area, and $v$ is the wind velocity. From this equation, (1) wind speed is the most important variable in wind power forecasting, and (2) the other variables (such as pressure, temperature, humidity, and wind direction) affect the wind power output indirectly.
To reflect the importance of the wind speed variables and the other related variables, the multi-source attention module is constructed from two sub-modules: 1) the multi-source wind speed variable attention sub-module, and 2) the other-variable attention sub-module. The output of the multi-source variable attention module is the concatenation of the two sub-module outputs.

Let the multi-source NWP at time $t$ be $\mathbf{X}_t = [x_t^{w,1}, \dots, x_t^{w,n_1}, x_t^{o,1}, \dots, x_t^{o,n_2}]$, where $n_1$ is the number of wind speed variables and $n_2$ is the number of other variables. Each $x_t^{w,i}$ and $x_t^{o,j}$ is a scalar. Each wind speed variable $x_t^{w,i}$ and other variable $x_t^{o,j}$ is transformed into a vector by a nonlinear Dense layer:

$$\mathbf{v}_t^{w,i} = \sigma\big(W_{w,i}\, x_t^{w,i} + \mathbf{b}_{w,i}\big), \quad 1 \le i \le n_1 \tag{2}$$

$$\mathbf{v}_t^{o,j} = \sigma\big(W_{o,j}\, x_t^{o,j} + \mathbf{b}_{o,j}\big), \quad 1 \le j \le n_2 \tag{3}$$

where $\sigma$ is the activation function, $W_{w,i}, W_{o,j} \in \mathbb{R}^{d_v \times 1}$ are weight parameters, $\mathbf{b}_{w,i}, \mathbf{b}_{o,j} \in \mathbb{R}^{d_v}$ are bias parameters, and $\mathbf{v}_t^{w,i}, \mathbf{v}_t^{o,j} \in \mathbb{R}^{d_v}$.
The multi-source wind speed variable attention sub-module performs a weighted summation over all transformed wind speed vectors $\mathbf{v}_t^{w,i}$. The other-variable attention sub-module performs a weighted summation over all other transformed vectors $\mathbf{v}_t^{o,j}$. The attention weight of each transformed vector is determined by $\mathbf{X}_t$ and the Softmax function:

$$\boldsymbol{\alpha}_t^{w} = \mathrm{Softmax}\big(W_{s1}\, \sigma(W_{d1} \mathbf{X}_t + \mathbf{b}_{d1}) + \mathbf{b}_{s1}\big) \tag{4}$$

$$\mathbf{v}_t^{w} = \sum_{i=1}^{n_1} \alpha_{t,i}^{w}\, \mathbf{v}_t^{w,i} \tag{5}$$

$$\boldsymbol{\alpha}_t^{o} = \mathrm{Softmax}\big(W_{s2}\, \sigma(W_{d2} \mathbf{X}_t + \mathbf{b}_{d2}) + \mathbf{b}_{s2}\big) \tag{6}$$

$$\mathbf{v}_t^{o} = \sum_{j=1}^{n_2} \alpha_{t,j}^{o}\, \mathbf{v}_t^{o,j} \tag{7}$$

$$\boldsymbol{\xi}_t = [\mathbf{v}_t^{w}, \mathbf{v}_t^{o}] \tag{8}$$

where $\boldsymbol{\alpha}_t^{w} \in \mathbb{R}^{n_1}$ and $\boldsymbol{\alpha}_t^{o} \in \mathbb{R}^{n_2}$ are the selection weights of the two sub-modules at time $t$, and $\mathbf{v}_t^{w}$ and $\mathbf{v}_t^{o}$ are the outputs of the two sub-modules. The output of the multi-source variable attention module is their concatenation $\boldsymbol{\xi}_t = [\mathbf{v}_t^{w}, \mathbf{v}_t^{o}] \in \mathbb{R}^{2 d_v}$. $W_{d1}, W_{d2}, W_{s1}, W_{s2}$ and $\mathbf{b}_{d1}, \mathbf{b}_{d2}, \mathbf{b}_{s1}, \mathbf{b}_{s2}$ are the weight and bias parameters of the nonlinear Dense layers before the Softmax. The structure of the multi-source variable attention module is shown in Figure 3.
The inputs determine the attention weights at time step .
Therefore, the strengthened important variables and the
weakened irrelevant variables at each time step are different.
Such a dynamic attention mechanism can take the temporal
error pattern of multi-source NWP into account.
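The computation in Eqs. (2), (4), and (5) can be sketched in plain Python for a single time step; the layer sizes, random weights, and helper names below are illustrative assumptions, not the trained MSTAN parameters:

```python
import math, random

random.seed(0)
n1, d_v = 4, 3          # 4 NWP wind speed sources, embedding size 3 (assumed)

def dense(x, W, b, act=math.tanh):
    """One Dense layer: x (list) -> act(W x + b); W is out-by-in."""
    return [act(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def rand_mat(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

x_w = [random.gauss(8, 2) for _ in range(n1)]   # scalar wind speeds at one time step

# Eq. (2): lift each scalar to a d_v-dim vector with its own nonlinear Dense layer
Ws = [rand_mat(d_v, 1) for _ in range(n1)]
bs = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(n1)]
v = [dense([x_w[i]], Ws[i], bs[i]) for i in range(n1)]          # n1 vectors of size d_v

# Eq. (4): attention weights from the raw inputs: Dense -> Dense -> Softmax
h = dense(x_w, rand_mat(d_v, n1), [0.0] * d_v)
logits = dense(h, rand_mat(n1, d_v), [0.0] * n1, act=lambda z: z)
m = max(logits)
e = [math.exp(l - m) for l in logits]
alpha = [ei / sum(e) for ei in e]                               # sums to 1

# Eq. (5): weighted sum over the n1 transformed wind speed vectors
v_w = [sum(alpha[i] * v[i][k] for i in range(n1)) for k in range(d_v)]
```

Because the weights are recomputed from the inputs at every time step, the dominant NWP source can change from one horizon to the next.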
Fig. 3. The structure of the multi-source variable attention module.
C. Residual module
In real situations, we cannot know in advance which input variables are strongly correlated with the supervised target. It is also challenging to determine whether a nonlinear transformation is required, especially when the training data set is small and noisy.
The residual module/layer adaptively controls the model
complexity and reduces unnecessary nonlinear transformation
to prevent over-fitting. As shown in Figure 4, the used residual
module consists of three operations. The Skip Connection
operation[48] can retain the original information. The Gating
Mechanism operation [49] controls the degree of nonlinear
transformation. The Layer Normalization operation [50] is
used to stabilize the output distribution and accelerate the
convergence of the model.
Since the residual module needs to be used together with
other nonlinear layers/modules, the output of the residual
module is affected by the linear path and the nonlinear path.
When the model needs low complexity, the residual module
skips the nonlinear path and is simplified to a linear mapping or
identity mapping. When the model needs high complexity, the
residual module retains most information from the nonlinear
path and outputs the summation from the linear and nonlinear
paths.
Formally, let the residual module receive a tensor $\mathbf{A}$ as input, where $\mathbf{a}_t \in \mathbb{R}^{d_a}$ is the vector of $\mathbf{A}$ at time step $t$. The input signal passes through two paths: the nonlinear gated path and the linear/identity mapping path. In the nonlinear gated path, $\mathbf{a}_t$ is transformed to $\mathbf{h}_{t,1}$ by the nonlinear transformation function $F(\cdot)$; in MSTAN, the LSTM and the self-attention module are used as $F(\cdot)$. Then $\mathbf{h}_{t,1}$ is gated by the Gated Linear Unit (GLU) [49] to output $\mathbf{h}_{t,2}$. In the linear/identity mapping path, when $\mathbf{h}_{t,2}$ has the same shape as $\mathbf{a}_t$, $\mathbf{h}_{t,3}$ equals $\mathbf{a}_t$; when the shapes of $\mathbf{a}_t$ and $\mathbf{h}_{t,2}$ differ, $\mathbf{a}_t$ is transformed by a linear Dense layer so that $\mathbf{h}_{t,3}$ has the same shape as $\mathbf{h}_{t,2}$. Finally, the Layer Normalization module takes the summation of $\mathbf{h}_{t,2}$ and $\mathbf{h}_{t,3}$ as input and outputs $\mathbf{r}_t$.

The nonlinear gated path:

$$\mathbf{h}_{t,1} = F(\mathbf{a}_t) \tag{9}$$

$$\mathbf{h}_{t,2} = \mathrm{GLU}(\mathbf{h}_{t,1}) = \sigma\big(W_{g} \mathbf{h}_{t,1} + \mathbf{b}_{g}\big) \odot \big(W_{v} \mathbf{h}_{t,1} + \mathbf{b}_{v}\big) \tag{10}$$

The linear or identity mapping path:

$$\mathbf{h}_{t,3} = \begin{cases} W_{l}\, \mathbf{a}_t + \mathbf{b}_{l}, & \text{if the shapes of } \mathbf{a}_t \text{ and } \mathbf{h}_{t,2} \text{ differ} \\ \mathbf{a}_t, & \text{otherwise} \end{cases} \tag{11}$$

The output of the residual module:

$$\mathbf{r}_t = \mathrm{LayerNorm}\big(\mathbf{h}_{t,2} + \mathbf{h}_{t,3}\big) \tag{12}$$

where $W_{g}, W_{v}, W_{l}$ are the weight parameters, $\mathbf{b}_{g}, \mathbf{b}_{v}, \mathbf{b}_{l}$ are the bias parameters, $\sigma$ is the sigmoid function, and $\odot$ is the elementwise Hadamard product.
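A minimal sketch of the residual module's forward pass (Eqs. (9)-(12)), with a tanh layer standing in for the LSTM/self-attention sub-layer $F$; all names, sizes, and random weights are illustrative assumptions:

```python
import math, random

random.seed(1)
d = 4                                    # feature size (illustrative)

def glu(h, Wg, bg, Wh, bh):
    """Eq. (10): GLU(h) = sigmoid(Wg h + bg) * (Wh h + bh), elementwise."""
    gate = [1 / (1 + math.exp(-(sum(w * x for w, x in zip(row, h)) + b)))
            for row, b in zip(Wg, bg)]
    lin = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(Wh, bh)]
    return [g * l for g, l in zip(gate, lin)]

def layer_norm(x, eps=1e-5):
    """Eq. (12): normalize to zero mean / unit variance (no learned scale here)."""
    mu = sum(x) / len(x)
    var = sum((xi - mu) ** 2 for xi in x) / len(x)
    return [(xi - mu) / math.sqrt(var + eps) for xi in x]

def rand_mat(r, c):
    return [[random.gauss(0, 1) for _ in range(c)] for _ in range(r)]

a = [random.gauss(0, 1) for _ in range(d)]          # residual-module input a_t

# Nonlinear gated path: F(a) is a stand-in for the LSTM / self-attention sub-layer
h1 = [math.tanh(sum(w * x for w, x in zip(row, a))) for row in rand_mat(d, d)]  # Eq. (9)
h2 = glu(h1, rand_mat(d, d), [0.0] * d, rand_mat(d, d), [0.0] * d)              # Eq. (10)

# Identity path (shapes match here, so no linear projection), then Add & Norm
h3 = a                                                                          # Eq. (11)
r = layer_norm([x + y for x, y in zip(h2, h3)])                                 # Eq. (12)
```

When the gate saturates near zero, the module passes `a` through almost unchanged, which is how unnecessary nonlinear transformations are skipped.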
Fig. 4. The structure of the residual module. Left: the residual module with the
linear mapping path. Right: the residual module with the identity mapping path.
D. Temporal Attention module
For capturing the temporal dependency across all the time
steps, a temporal attention module constructed by Encoder-
Decoder, Positional Encoding (PE), and self-attention, is
employed [51].
(1) The Encoder-Decoder module receives the historical measurements and the outputs of the multi-source variable selection module. To enhance the expression of temporal dependency, two LSTMs are used as the Encoder-Decoder module. Let the historical wind speed and wind power measurement sequences be $s_{1:T_0}$ and $y_{1:T_0}$. The encoder takes $s_{1:T_0}$ and $y_{1:T_0}$ as inputs, and the decoder takes $\boldsymbol{\xi}_{T_0+1:T_0+\tau}$ as inputs. Meanwhile, the residual module is wrapped around the Encoder-Decoder:

$$\mathbf{c}_t = \begin{cases} \mathrm{LSTM}_{enc}\big(\mathbf{c}_{t-1}, [s_t, y_t]\big), & 1 \le t \le T_0 \\ \mathrm{LSTM}_{dec}\big(\mathbf{c}_{t-1}, \boldsymbol{\xi}_t\big), & T_0 + 1 \le t \le T_0 + \tau \end{cases} \tag{13}$$

$$\boldsymbol{\phi}_t = \begin{cases} \mathrm{LN}\big(\mathrm{GLU}(\mathbf{c}_t) + W_{e}[s_t, y_t]\big), & 1 \le t \le T_0 \\ \mathrm{LN}\big(\mathrm{GLU}(\mathbf{c}_t) + \boldsymbol{\xi}_t\big), & T_0 + 1 \le t \le T_0 + \tau \end{cases} \tag{14}$$

where $\mathbf{c}_0$ is the initial state and $\mathrm{LN}$ is Layer Normalization. $T_0$ is the encoder length, and $\tau$ is the maximum prediction horizon. $\boldsymbol{\phi}_t$ is the output of the Encoder-Decoder with the residual module. For simplicity and convenience, the output dimension of $\boldsymbol{\phi}_t$ is set to $d_{model}$, $\boldsymbol{\phi}_t \in \mathbb{R}^{d_{model}}$.
(2) Positional encoding (PE) generates the position information of each time step. Since self-attention is a global attention mechanism, relative position information cannot be considered when calculating the similarity. Therefore, position information is added to the input tensor $\boldsymbol{\Phi}$. The positional information is defined as:

$$\mathrm{PE}(t, 2i) = \sin\big(t / 10000^{2i/d_{model}}\big) \tag{15}$$

$$\mathrm{PE}(t, 2i+1) = \cos\big(t / 10000^{2i/d_{model}}\big) \tag{16}$$

The inner product between $\mathrm{PE}(t, :)$ and $\mathrm{PE}(t+k, :)$ decreases as $k$ increases, so PE indirectly represents the relative distance between different time steps. The input tensor of the self-attention layer can be expressed as:

$$\boldsymbol{\psi}_t = \boldsymbol{\phi}_t + \mathrm{PE}(t, :) \tag{17}$$
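The positional encoding of Eqs. (15)-(16), and the decaying inner product it induces, can be checked with a small script (the width `d_model = 8` and the offsets are arbitrary choices for illustration):

```python
import math

d_model = 8   # model width (illustrative)

def pos_enc(t, d_model):
    """Eqs. (15)-(16): interleaved sin/cos positional encoding for time step t."""
    pe = []
    for i in range(d_model // 2):
        angle = t / (10000 ** (2 * i / d_model))
        pe += [math.sin(angle), math.cos(angle)]
    return pe

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# The inner product <PE(t), PE(t+k)> shrinks as the offset k grows,
# which is how PE encodes relative distance between time steps.
p0 = pos_enc(0, d_model)
sims = [dot(p0, pos_enc(k, d_model)) for k in (1, 4, 16)]
```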
(3) In simple terms, the essence of the attention mechanism is to select, from a sequence, the information that is similar to the query information. Self-attention maps a query representation $Q$ to a new representation by a weighted sum of the value representation $V$. The attention weights are determined by the scaled dot-product of the query representation $Q$ and the key representation $K$:

$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\Big(\frac{Q K^{\top}}{\sqrt{d_{att}}}\Big)V \tag{18}$$

To strengthen the expressive ability of the attention mechanism, the $Q$, $K$, and $V$ tensors are usually transformed linearly by $W_{Q}, W_{K}, W_{V} \in \mathbb{R}^{d_{model} \times d_{att}}$, where $d_{att}$ is the attention dimension:

$$\mathbf{B} = \mathrm{Attention}\big(Q W_{Q}, K W_{K}, V W_{V}\big) \tag{19}$$

In the practice of multi-horizon wind power prediction, the outputs of the self-attention layer are expressed as:

$$\mathbf{B}_{1:T_0+\tau} = \mathrm{Attention}\big(\boldsymbol{\Psi}_{1:T_0+\tau} W_{Q}, \boldsymbol{\Psi}_{1:T_0+\tau} W_{K}, \boldsymbol{\Psi}_{1:T_0+\tau} W_{V}\big) \tag{20}$$

with $Q = K = V = \boldsymbol{\Psi}_{1:T_0+\tau}$. The output of the temporal attention module wrapped by the residual module is

$$\boldsymbol{\theta}_t = \mathrm{LN}\big(\mathrm{GLU}(\mathbf{b}_t) + \boldsymbol{\psi}_t\big), \quad T_0 + 1 \le t \le T_0 + \tau \tag{21}$$

where $\mathbf{b}_t$ is the row of $\mathbf{B}$ at time step $t$ and $\boldsymbol{\theta}_t \in \mathbb{R}^{d_{model}}$.
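Scaled dot-product self-attention (Eq. (18)) over a toy sequence can be written directly; the sequence length, dimension, and random inputs are illustrative, and the linear projections of Eq. (19) are omitted for brevity:

```python
import math, random

random.seed(2)
T, d = 5, 4            # sequence length and attention dimension (illustrative)

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def attention(Q, K, V, d_att):
    """Eq. (18): Softmax(Q K^T / sqrt(d_att)) V over a whole sequence."""
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_att) for k in K]
        w = softmax(scores)                     # attention weights sum to 1
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

# Self-attention: Q = K = V = the (position-encoded) sequence Psi
Psi = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
B = attention(Psi, Psi, Psi, d)
```

Each output row is a convex combination of the value rows, so every position can draw information from the whole window in a single step, regardless of the distance between time steps.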
E. Mixture density module

Similar to our previous work [52], a mixture density network (MDN) module is used to model the probability density of short-term wind power. The employed MDN approximates the distribution of the normalized forecasts by weighting multiple Beta distributions. As shown in Figure 5, the mixture density module is constructed from two Dense layers and one Softmax layer. $\mathrm{Dense}_{\alpha}$ outputs the shape parameters $\boldsymbol{\alpha}_t$, $\mathrm{Dense}_{\beta}$ outputs the shape parameters $\boldsymbol{\beta}_t$, and the Softmax layer outputs the mixing coefficients $\boldsymbol{\pi}_t$. The activation function of $\mathrm{Dense}_{\alpha}$ and $\mathrm{Dense}_{\beta}$ is ReLU. The mixture density module outputs $\boldsymbol{\pi}_t, \boldsymbol{\alpha}_t, \boldsymbol{\beta}_t$ for each time step in a parameter-sharing manner:

$$\boldsymbol{\alpha}_t = \mathrm{Dense}_{\alpha}(\boldsymbol{\theta}_t) = \mathrm{ReLU}\big(W_{\alpha}\boldsymbol{\theta}_t + \mathbf{b}_{\alpha}\big) \tag{22}$$

$$\boldsymbol{\beta}_t = \mathrm{Dense}_{\beta}(\boldsymbol{\theta}_t) = \mathrm{ReLU}\big(W_{\beta}\boldsymbol{\theta}_t + \mathbf{b}_{\beta}\big) \tag{23}$$

$$\boldsymbol{\pi}_t = \mathrm{Softmax}\big(W_{\pi}\boldsymbol{\theta}_t + \mathbf{b}_{\pi}\big) \tag{24}$$

$$p(y_t) = \sum_{m=1}^{M} \pi_{t,m}\, \mathrm{Beta}\big(y_t \mid \alpha_{t,m}, \beta_{t,m}\big) \tag{25}$$

where $M$ is the number of components in the MDN, $\pi_{t,m}$ is the mixing coefficient of the $m$-th component at time step $t$, and $\alpha_{t,m}, \beta_{t,m}$ are the shape parameters of the $m$-th component at time step $t$, with $\boldsymbol{\pi}_t, \boldsymbol{\alpha}_t, \boldsymbol{\beta}_t \in \mathbb{R}^{M}$. The mixing coefficients sum to one, $\sum_{m=1}^{M} \pi_{t,m} = 1$. $W_{\alpha}, W_{\beta}, W_{\pi}$ are weight parameters and $\mathbf{b}_{\alpha}, \mathbf{b}_{\beta}, \mathbf{b}_{\pi}$ are bias parameters.

Since the domain of the Beta distribution is $[0, 1]$, the domain of the mixture density is also $[0, 1]$. The wind power forecast density can be acquired by multiplying the mixture distribution by the wind power capacity.
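A small sketch of the Beta mixture of Eq. (25) for a single time step; the two components and their shape parameters below are hypothetical, not fitted values:

```python
import math

def beta_pdf(y, a, b):
    """Beta density on (0, 1), via log-gamma for numerical stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

def mixture_pdf(y, pis, alphas, betas):
    """Eq. (25): weighted sum of M Beta components for one time step."""
    return sum(p * beta_pdf(y, a, b) for p, a, b in zip(pis, alphas, betas))

# A hypothetical 2-component mixture for one horizon; mixing weights sum to 1
pis, alphas, betas = [0.6, 0.4], [2.0, 8.0], [5.0, 2.0]
density = mixture_pdf(0.3, pis, alphas, betas)

# The mixture should integrate to ~1 over (0, 1); checked with a crude Riemann sum
grid = [i / 1000 for i in range(1, 1000)]
area = sum(mixture_pdf(y, pis, alphas, betas) for y in grid) / 1000
```

Because each component is supported on (0, 1), the mixture naturally respects the normalized power range without any post-hoc clipping.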
Fig. 5. The structure of the mixture density module.
F. Loss Function and Training Method

MSTAN is an end-to-end deep learning model. All modules/layers of MSTAN can be jointly trained by the back-propagation algorithm and gradient descent. The negative log-likelihood function is used as the loss function, and Adam is used as the optimizer. The negative log-likelihood function is:

$$\mathcal{L}\big(y, \boldsymbol{\pi}, \boldsymbol{\alpha}, \boldsymbol{\beta}\big) = -\sum_{t=T_0+1}^{T_0+\tau} \log\Big(\sum_{m=1}^{M} \pi_{t,m}\, \mathrm{Beta}\big(y_t \mid \alpha_{t,m}, \beta_{t,m}\big)\Big) \tag{26}$$

The trained model outputs the mixture distribution parameters, and the deterministic forecasts can be calculated with the mean value or median value equation.

The mean value of the mixture distribution:

$$\hat{y}_t = \sum_{m=1}^{M} \pi_{t,m}\, \frac{\alpha_{t,m}}{\alpha_{t,m} + \beta_{t,m}} \tag{27}$$

The (approximate) median value of the mixture distribution:

$$\tilde{y}_t = \sum_{m=1}^{M} \pi_{t,m}\, \frac{\alpha_{t,m} - 1/3}{\alpha_{t,m} + \beta_{t,m} - 2/3} \tag{28}$$
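The loss of Eq. (26) and the point forecasts of Eqs. (27)-(28) can be evaluated as follows for hypothetical mixture parameters (two horizons, two components; none of these values come from the trained model):

```python
import math

def beta_logpdf(y, a, b):
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))

def nll(y_seq, pis, alphas, betas):
    """Eq. (26): negative log-likelihood of the Beta mixture over all horizons."""
    loss = 0.0
    for t, y in enumerate(y_seq):
        mix = sum(p * math.exp(beta_logpdf(y, a, b))
                  for p, a, b in zip(pis[t], alphas[t], betas[t]))
        loss -= math.log(mix)
    return loss

def point_forecasts(pis, alphas, betas):
    """Eqs. (27)-(28): mixture mean and approximate mixture median."""
    mean = [sum(p * a / (a + b) for p, a, b in zip(pi, al, be))
            for pi, al, be in zip(pis, alphas, betas)]
    median = [sum(p * (a - 1 / 3) / (a + b - 2 / 3) for p, a, b in zip(pi, al, be))
              for pi, al, be in zip(pis, alphas, betas)]
    return mean, median

# Two horizons, two components each (hypothetical outputs of Eqs. (22)-(24))
pis = [[0.7, 0.3], [0.5, 0.5]]
alphas = [[2.0, 6.0], [3.0, 9.0]]
betas = [[5.0, 3.0], [4.0, 2.0]]
loss = nll([0.35, 0.6], pis, alphas, betas)
mean, median = point_forecasts(pis, alphas, betas)
```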
G. The Relation between the Modules

To state the relationship between the modules, the tensor operations in the forward process of MSTAN are shown in Figure 6. MSTAN takes 3D tensors as the model inputs and outputs. The shape of the past inputs $[s_{1:T_0}, y_{1:T_0}]$ is $[B, T_0, 2]$, and the shape of the future inputs $\mathbf{X}_{T_0+1:T_0+\tau}$ is $[B, \tau, n_1 + n_2]$, where $B$ is the batch size.

As shown in Figure 6, the multi-source variable attention module/layer takes $\mathbf{X}_{T_0+1:T_0+\tau}$ as inputs and outputs $\boldsymbol{\xi}_{T_0+1:T_0+\tau}$ with shape $[B, \tau, 2d_v]$. The LSTM Encoder wrapped by the residual module takes $[s_{1:T_0}, y_{1:T_0}]$ as inputs and outputs $\boldsymbol{\phi}_{1:T_0}$ with shape $[B, T_0, d_{model}]$. The LSTM Decoder wrapped by the residual module takes $\boldsymbol{\xi}_{T_0+1:T_0+\tau}$ as inputs and outputs $\boldsymbol{\phi}_{T_0+1:T_0+\tau}$ with shape $[B, \tau, d_{model}]$. Then $\boldsymbol{\phi}_{1:T_0+\tau}$ adds the positional encoding to itself. The self-attention module wrapped by the residual module takes $\boldsymbol{\psi}_{1:T_0+\tau}$ as inputs and outputs $\boldsymbol{\theta}_{T_0+1:T_0+\tau}$ with shape $[B, \tau, d_{model}]$. The mixture density module takes $\boldsymbol{\theta}_{T_0+1:T_0+\tau}$ as inputs and outputs $\boldsymbol{\pi}_{T_0+1:T_0+\tau}$, $\boldsymbol{\alpha}_{T_0+1:T_0+\tau}$, $\boldsymbol{\beta}_{T_0+1:T_0+\tau}$, each with shape $[B, \tau, M]$.
Fig. 6. The relation between the used modules.
IV. APPLICATIONS IN THREE WIND FARMS
A. Dataset
Three wind farms located in North China are studied. In each wind farm, four NWP sources are used; the 3rd NWP source is an ensemble NWP. Each NWP source contains one or two NWP sites. Case 1, Case 2, and Case 3 correspond to WF 6, WF 9, and WF 2 in Table 1. The NWP data at each site contain several features: wind speed (WS), wind direction (WD), relative humidity (RH), temperature (TMP), and air pressure (PRE). Before 0:00 every day, the next 48 hours of multi-source NWP data are received; the multi-source NWP data are accessed once a day. The measured data include the wind speed of the wind tower and the power output of the booster station. The data set covers one year (2019-01-01 to 2019-12-31) with a time resolution of one hour. The wind power prediction dataset comes from a real regional wind power forecasting project. Due to the confidentiality agreement with the wind farm operator and the NWP provider, the data cannot be openly accessed yet.
The entire datasets are divided into training sets and testing
sets by the Date Time. The data from the 1st to the 24th days of
each month is divided into training sets. The data from the 25th
day to the end of each month is divided into testing sets. 20%
of the samples are randomly selected in the training set as the
validation set. This division scheme ensures that the MSTAN is
tested by the data of each month throughout the year.
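The calendar split described above can be sketched with the standard library (a schematic, assuming hourly timestamps; the 20% random validation draw is omitted for brevity):

```python
from datetime import datetime

def split_by_day(timestamps):
    """Days 1-24 of each month -> training set; day 25 onward -> testing set."""
    train_idx, test_idx = [], []
    for i, ts in enumerate(timestamps):
        (train_idx if ts.day <= 24 else test_idx).append(i)
    return train_idx, test_idx

# Small illustration: two hours each on four days of January 2019
stamps = [datetime(2019, 1, d, h) for d in (10, 24, 25, 30) for h in (0, 12)]
train_idx, test_idx = split_by_day(stamps)
```

Because the split keys on the day of the month rather than a single cut date, every calendar month contributes samples to both sets, which is the property the authors rely on.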
B. Benchmarks
Two classic technical routes are used to compare with the
proposed model. These two technical routes are widely used by
machine learning competitions and commercial applications.
The first technical route is feature engineering + regressor, and
the second technical route is deep learning.
1) Technical route 1: Feature Engineering + Regressor
The purpose of feature engineering is to alleviate the curse of
dimensionality and reduce the difficulty of learning tasks.
Following the winning feature engineering schemes in
GEFCom2012 and GEFCom2014 [10, 11, 53], a feature engineering
scheme including feature construction, feature selection,
normalization, category encoding, and target transformation
is adopted.
Four algorithms are employed as regressors: Ridge
regression, Support Vector Regression (SVR), K-Nearest
Neighbor Regression (KNNR), and LightGBM. LightGBM can be
deemed an advanced implementation of the GBM and GBDT used in
GEFCom2014 [53]. The feature engineering is applied before
each regressor. Therefore, four wind power forecasting
pipelines are built by combining feature engineering with a
regressor.
In order to obtain probabilistic results, different strategies
are used. (1) LightGBM can directly provide quantile outputs
by setting the quantile loss function; therefore, 19 models
are built to obtain the quantile forecasts corresponding to
the 0.05:0.05:0.95 quantiles. (2) Ridge, SVR, and KNNR cannot
directly produce probabilistic predictions. We therefore
estimate the distribution of training-set prediction errors
within different power prediction intervals by Kernel Density
Estimation (KDE). At test time, quantile predictions are
obtained by combining the deterministic forecast with the
corresponding error distribution.
2) Technical route 2: Deep Learning
Three classical deep learning models for multi-horizon wind
power forecasting are selected as benchmarks: Seq2Seq [45],
DeepAR [54], and MQRNN [55]. These models are integrated into
Amazon's time series prediction library GluonTS [56].
Seq2Seq: two LSTMs are used as the Encoder and Decoder. The
Encoder LSTM takes the historical measurements as inputs; the
Decoder takes the multi-source NWP and the encoder context as
inputs. Seq2Seq gives the quantile forecasts by optimizing the
quantile loss function. The considered quantiles are set as
0.05:0.05:0.95 (19 quantiles).
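The quantile (pinball) loss optimized here, and reported later as the QL index, can be written as a generic NumPy sketch:

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Quantile loss: q*(y - yhat) if y >= yhat, else (q - 1)*(y - yhat)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(q * diff, (q - 1) * diff))

# Averaged over the 19 quantiles 0.05, 0.10, ..., 0.95:
quantiles = np.arange(0.05, 1.0, 0.05)
y = np.array([0.3, 0.5, 0.8])
preds = {q: y for q in quantiles}   # perfect quantile forecasts, for illustration
ql = np.mean([pinball_loss(y, preds[q], q) for q in quantiles])  # 0 when perfect
```

The asymmetric weighting penalizes under-prediction by q and over-prediction by 1 - q, which is what makes the minimizer of this loss the q-th conditional quantile.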
DeepAR: the probability density of the target variable is
parameterized by the outputs of an LSTM. The Gaussian
likelihood of the predictive distribution is maximized during
training. Multi-step predictions are obtained step by step
through sampling during forecasting. The inputs of DeepAR
also include historical measurements and multi-source NWP.
MQRNN: historical measurements are encoded by an LSTM, and
the encoded information and future inputs are then decoded
through a global MLP branch and a local MLP branch. The
multi-source NWP is used as the future input of both the
global and local MLPs. MQRNN also outputs quantile values by
optimizing the quantile loss function. The considered
quantiles are set as 0.05:0.05:0.95 (19 quantiles).
C. Training Settings and Software
Hyper-parameters of MSTAN are selected by grid search.
The best hyper-parameter settings are presented in Table 2. In
this paper, the Adam optimizer is used. For the Adam optimizer,
the learning rate (LR) and batch size are the most important
hyperparameters. The LR of Adam is usually in the range
[0.0001, 0.1], and the default recommended LR is 0.001. In
order to find the optimal LR, we set the grid of LR to [0.01,
0.003, 0.001, 0.0003, 0.0001]. Batch size generally cannot be
too small or too large; it is recommended to select it from
2^n, n ∈ {3, 4, 5, 6}, etc. Since one year yields fewer than
300 training samples, the batch size should be much smaller
than 300. Therefore, we set the grid of batch size to [8, 16,
32, 64]. d_model and d_attention are model hyperparameters
that determine the capacity of the model. Since the training
data set is small, a small model capacity is sufficient, and
the grids of d_model and d_attention are both set to
[8, 16, 32, 64].
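The grid search over these hyper-parameters amounts to evaluating every combination on the validation set and keeping the best. A standard-library sketch follows; `evaluate` is a hypothetical stand-in for training MSTAN and scoring it on the validation set:

```python
from itertools import product

grid = {
    "lr": [0.01, 0.003, 0.001, 0.0003, 0.0001],
    "batch_size": [8, 16, 32, 64],
    "d_model": [8, 16, 32, 64],
    "d_attention": [8, 16, 32, 64],
}

def grid_search(evaluate):
    """Return the hyper-parameter combination with the lowest validation loss."""
    keys = list(grid)
    best, best_loss = None, float("inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        loss = evaluate(params)
        if loss < best_loss:
            best, best_loss = params, loss
    return best

# Hypothetical scorer that happens to prefer the reported Case 2 settings
target = {"lr": 0.001, "batch_size": 32, "d_model": 16, "d_attention": 16}
best = grid_search(lambda p: sum(p[k] != target[k] for k in p))
```

With this grid, 5 x 4 x 4 x 4 = 320 trainings are required, which is only feasible because the model and dataset are both small.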
All the deep learning schemes used in this paper are
implemented by Pytorch [57]. Feature engineering + regressor
schemes are implemented by Sklearn [58] and LightGBM [59].
TABLE 2 THE HYPER-PARAMETERS OF MSTAN IN CASE TWO
Hyper Parameters      d_model = 16, LSTM_layer = 2, LSTM_hidden_dim = d_model = 16,
                      d_attention = 16, m = 3, T0 = 48, forecast horizon = 48
Optimizer             Adam, learning rate = 0.001, batch size = 32
Computing Resource    Apple M1
D. Evaluation criterion
The deterministic and probabilistic forecasting performance
is evaluated by several evaluation criteria. Normalized Root
Mean Square Error (NRMSE) and Normalized Mean Absolute
Error (NMAE) are used for deterministic forecasting evaluation.
For probabilistic forecasting evaluation, the standard Quantile
Loss (QL) index [53], the Continuous Ranked Probability Score
(CRPS) [60], and the Average Coverage Error (ACE) [61] are
used.
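Sketches of the deterministic criteria and ACE are given below (normalization by installed capacity and the ACE sign convention are assumptions; exact definitions vary slightly across papers):

```python
import numpy as np

def nrmse(y_true, y_pred, capacity):
    """Root mean square error, normalized by installed capacity."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / capacity

def nmae(y_true, y_pred, capacity):
    """Mean absolute error, normalized by installed capacity."""
    return np.mean(np.abs(y_true - y_pred)) / capacity

def ace(y_true, lower, upper, nominal_coverage):
    """Average Coverage Error: empirical minus nominal interval coverage."""
    empirical = np.mean((y_true >= lower) & (y_true <= upper))
    return empirical - nominal_coverage
```

An ACE near zero means the prediction intervals are well calibrated; a negative ACE (as reported for MSTAN below) indicates intervals that cover slightly fewer observations than their nominal level.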
E. Results and Discussions
1) Deterministic results
NRMSE and NMAE results of the three cases are presented in
Table 3. The NRMSE and NMAE of MSTAN are lower than those of
the other counterparts. By
comparing with the best model using the first technical route,
the NRMSE of the proposed model in the three cases is reduced
by 0.8%, 1.1%, and 0.6%, and the NMAE is reduced by 0.6%,
1.0%, and 0.5%, respectively. Among the three algorithms using
the second technical route, the Seq2Seq algorithm performs
best. Compared with Seq2Seq, the proposed algorithm reduces
the NRMSE by 1.0%, 1.7%, and 1.0%, and the NMAE by 0.6%,
1.2%, and 0.6%.
TABLE 3 NRMSE AND NMAE OF THREE WIND FARMS
Methods        Case 1           Case 2           Case 3
               NRMSE   NMAE     NRMSE   NMAE     NRMSE   NMAE
Persistence    0.331   0.246    0.373   0.279    0.323   0.249
Ridge          0.174   0.126    0.171   0.124    0.163   0.119
KNN            0.182   0.131    0.184   0.131    0.173   0.125
SVM            0.175   0.126    0.170   0.122    0.160   0.116
LightGBM       0.174   0.125    0.168   0.118    0.162   0.117
DeepAR         0.187   0.143    0.198   0.156    0.189   0.146
Seq2Seq        0.176   0.125    0.176   0.124    0.164   0.116
MQRNN          0.179   0.139    0.176   0.135    0.166   0.125
Proposed       0.166   0.119    0.159   0.112    0.154   0.110
Figure 7 shows the averaged 48-hour NRMSE and NMAE
across all test samples for the proposed model and the DL
benchmarks. The 48-hour NRMSE and NMAE of MSTAN are
below those of the other three deep learning methods. A small
number of RMSE and MAE points of MSTAN are not the lowest, but
the overall error trend of the MSTAN model is better.
Fig. 7. The NRMSE and NMAE of MSTAN from 0th to 47th hours for Case 2.
The results are the averaged values across all testing samples.
2) Probabilistic results
QL and CRPS results of the three cases are presented in Table
4. The QL and CRPS of MSTAN are lower than those of the other
counterparts. Compared with the best model using the first
technical route, the QL of the proposed model is reduced by
5%, 7.2%, and 3.0% for the three cases, and the CRPS is
reduced by 14.4%, 10.8%, and 10.3%. Among the three algorithms
using the second technical route, the Seq2Seq algorithm
performs best. Compared with Seq2Seq, the proposed algorithm
reduces the QL by 4.4%, 12.5%, and 0.7%, and the CRPS by
15.6%, 13.3%, and 6.9%.
TABLE 4 QL AND CRPS OF THREE WIND FARMS
Methods        Case 1          Case 2          Case 3
               QL      CRPS    QL      CRPS    QL      CRPS
Persistence    -       -       -       -       -       -
Ridge          0.382   0.103   0.300   0.094   0.307   0.096
KNN            0.397   0.106   0.317   0.101   0.322   0.101
SVM            0.380   0.104   0.294   0.094   0.301   0.096
LightGBM       0.378   0.107   0.285   0.092   0.302   0.100
DeepAR         0.432   0.114   0.372   0.115   0.368   0.112
Seq2Seq        0.376   0.104   0.297   0.094   0.295   0.093
MQRNN          0.421   0.114   0.322   0.106   0.318   0.110
Proposed       0.360   0.090   0.264   0.083   0.293   0.087
Figure 8 shows the averaged 48-hour QL and CRPS across
all test samples for the proposed model and the DL benchmarks.
The probabilistic prediction results of the MSTAN model are
significantly better than those of the other deep learning
algorithms at most prediction horizons.
Fig. 8. Hourly QL and CRPS of MSTAN from 0th to 47th hours for Case 2. The
results are the averaged values across all testing samples.
The ACE scores of several benchmarks and the proposed model
are shown in the Appendix. For the three studied wind farms,
the ACE of MSTAN is below 0 at most confidence levels. The ACE
of Ridge and KNN is more stable and closer to 0 than that of
MSTAN in Case 1 and Case 3, but MSTAN performs better than
most other counterparts across the three cases. The ACE of
LightGBM, MQRNN, and DeepAR is poor in all three cases.
The probability distributions of wind power forecasts for the
next 48 hours are shown in Figure 9. In each sub-figure, the
probabilistic forecasts for three consecutive days are presented.
In each wind farm, the predicted values follow the actual wind
farm power outputs very well. When the deterministic wind
power forecasts are close to 0 MW or to the wind farm
capacity, the uncertainty of the wind power forecasts is low.
When the deterministic forecasts are close to 50% of the wind
farm capacity, the prediction uncertainty is high.
Fig. 9. The short-term probabilistic forecasts for the next 48 hours are deduced
every 24 hours. The generated quantiles are 5% to 95% (19 lines).
3) Performance variation evaluation by bootstrapping
Since the used dataset is small, bootstrapping is used to
estimate the significance of the deterministic and probabilistic
results [43]. Three kinds of comparisons are implemented in
this part, (1) the performance variation of different forecasting
models, (2) the performance variation of single-source NWP
and multi-source NWP schemes, (3) the performance variation
of module ablation. The box plots show the NRMSE and CRPS
variation using the bootstrap approach with 200 bootstrap
samples.
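The bootstrap procedure resamples the test set with replacement and recomputes the score each time; the spread of the resampled scores feeds the box plots. A minimal sketch with 200 resamples, as in the paper:

```python
import numpy as np

def bootstrap_scores(y_true, y_pred, metric, n_boot=200, seed=0):
    """Distribution of a metric over test sets resampled with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample indices
        scores.append(metric(y_true[idx], y_pred[idx]))
    return np.array(scores)                     # feed this to a box plot

rmse = lambda y, p: np.sqrt(np.mean((y - p) ** 2))
y = np.linspace(0.0, 1.0, 100)
p = y + 0.05                                    # constant-error toy forecast
dist = bootstrap_scores(y, p, rmse)
```

If the score distributions of two models barely overlap across resamples, the accuracy difference between them can be treated as significant, which is how Figures 10-12 are read.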
(a) The comparison between benchmarks and MSTAN
Results similar to Table 3 and Table 4 are acquired by
bootstrapping in Figure 10 (a) and (d), Figure 11 (a) and (d),
Figure 12 (a) and (d). These figures show that the accuracy
improvement of the proposed model is significant.
(b) The comparison between single-source and multi-source
NWP
As shown in Table 1, each wind farm has 4 sources of NWP.
The most accurate two-source and three-source NWP combinations
are picked based on the NWP RMSE ranking. The NWP RMSE ranking
of Case 1 (WF 6) is 3<1<2<4; that of Case 2 (WF 9) is 3<1<4<2;
and that of Case 3 (WF 2) is 3<1<2<4.
Single-source NWPs (source 1, source 2, source 3, and source
4) and the most accurate two-source, three-source, and
four-source NWP combinations are compared in Figure 10 (b) and
(e), Figure 11 (b) and
Fig. 10. NRMSE and CRPS of case 1.
Fig. 11. NRMSE and CRPS of case 2.
Fig. 12. NRMSE and CRPS of case 3.
(a) NRMSE of benchmarks and MSTAN. (b) NRMSE of MSTAN by using single-source NWP and multi-source NWP. (c) NRMSE of MSTAN with and without
different modules.
(d) CRPS of benchmarks and MSTAN. (e) CRPS of MSTAN by using single-source NWP and multi-source NWP. (f) CRPS of MSTAN with and without different
modules.
The box plots show the NRMSE and CRPS variation using the bootstrap approach with 200 bootstrap samples.
(e), and Figure 12 (b) and (e). As depicted in these figures,
the poorest prediction performance of MSTAN always appears
when single-source NWP is used. The best prediction results of
MSTAN are achieved when using four-source NWPs for all cases.
The results of using the best two-source NWPs and the best
three-source NWPs are close. At wind farm 2, the results of
using one source (source 1 or source 3), the best two-source
NWPs, and the best three-source NWPs are close.
In addition, the NWP with the smaller RMSE does not
necessarily lead to better wind power prediction accuracy.
This phenomenon is common and may be caused by the nonlinear
relationship between wind speed and power. It brings
additional risk to single-source NWP forecasting schemes.
(c) Ablation experiments
In order to demonstrate the effectiveness of the designed
model, ablation experiments are implemented. Specifically, we
remove one module at a time from the MSTAN and readjust the
hyper-parameters. The MSTAN variants without each module are
named as follows:
w_o selection: The MSTAN model without the multi-source
variable attention module.
w_o temp_attn: The MSTAN model without the temporal
attention module.
w_o skip: The MSTAN model without the residual module.
As shown in Figure 10 (c) and (f), Figure 11 (c) and (f),
Figure 12 (c) and (f), the best prediction results are acquired by
the intact MSTAN. For all three cases, removing the temporal
attention module or the residual module causes a significant
prediction performance drop, while removing the multi-source
variable attention module causes only a slight performance
drop.
4) Temporal Attention weights pattern
When giving the forecasts at a time step, the MSTAN model
considers not only the inputs of the current time step but
also the inputs of other time steps. The temporal attention
weights determine which time steps should be attended to. As
shown in Figure 13(a), the historical wind power and the
48-hour forecasts of one sample are drawn. Figure 13(b) plots
the weight matrix of the temporal attention module for this
sample, and Figure 13(c) shows the 3D version of Figure 13(b).
The Y-axis represents the lead time step of the output
sequence, the X-axis represents the considered time step of
the input sequence, and the Z-axis represents the value of the
attention weights. At each output time step, the model must
determine how to assign the weights to each input time step.
In Figure 13(a), the next 48-hour real wind power sequence can
be divided into three parts. The first part is a downward ramp
process (0th to 18th hour). The second part is a fast upward
ramp process (18th to 30th hour). The third part is a smooth
process (30th to 47th hour).
In the first process, the wind power prediction results focus
on the input sequence from the 12th to the 18th hour. The
forecasts of the second process focus on the input sequence
from the 18th to the 48th hour. The forecasts of the last
process give higher weights to the input sequence from the
40th to the 47th hour. Therefore, the learned temporal pattern
follows the trends of the real wind power sequence. This
ability to dynamically attend to the critical parts of the
sequence is not available in DL models such as LSTM and CNN.
Fig. 13. The temporal attention weights of one day (case 2). Subfigure (c) is
the 3D version of subfigure (b).
V. CONCLUSIONS
In this paper, a Multi-Source and Temporal Attention
Network (MSTAN) is proposed for the short-term WPPF. The
MSTAN model takes the multi-source NWP data and historical
measurement sequences as inputs and yields the next 48 hours'
wind power density forecasts as outputs. The MSTAN is
constructed from four major modules. (1) In order to
dynamically select the driving variables and reduce the
harmful effects raised by introducing multi-source NWP, a
novel multi-source selection module is designed. (2) The
temporal attention module is proposed to extract the long-term
temporal dependency hidden in the multi-source NWP. (3) The
residual module is wrapped into the MSTAN model to provide
adaptive complexity and avoid overfitting. (4) The beta
kernel-based mixture density module is used to output the
multi-step probabilistic prediction results.
Based on the case study over three selected wind farms, the
MSTAN is strictly compared with two state-of-the-art technical
routes. Results demonstrate that MSTAN gives higher
deterministic prediction accuracy and better probabilistic
evaluation scores. The effectiveness of the multi-source
selection module, the temporal attention module, and the
residual module is demonstrated respectively.
Some work remains to improve the proposed MSTAN architecture
further. (1) The proposed model only considers the temporal
dependency, but spatial dependence is also important for wind
power forecasting; novel spatial attention or spatial feature
extraction modules should be merged into MSTAN. (2) In order
to meet the demands of more wind farms, the applicability of
MSTAN at other time resolutions should be verified.
VI. REFERENCE
[1] J. Yan, Y. Liu, S. Han, Y. Wang, and S. Feng, "Reviews on
uncertainty analysis of wind power forecasting," Renewable and Sustain.
Energy Rev., vol. 52, pp. 1322-1330, 2015.
[2] G. Sideratos and N. D. Hatziargyriou, “An Advanced Statistical Method
for Wind Power Forecasting,” IEEE Trans. Power Syst., vol. 22, no. 1,
pp. 258-265, Feb. 2007.
[3] J. R. Andrade and R. J. Bessa, “Improving Renewable Energy
Forecasting with a Grid of Numerical Weather Predictions,” IEEE Trans.
Sustain. Energy, vol. 8, no. 4, pp. 1571-1580, Oct. 2017.
[4] J. W. Taylor, P. E. McSharry and R. Buizza, “Wind Power Density
Forecasting Using Ensemble Predictions and Time Series Models,” IEEE
Trans. Energy Convers., vol. 24, no. 3, pp. 775-782, Sept. 2009.
[5] W. Xie, P. Zhang, R. Chen, and Z. Zhou, “A Nonparametric Bayesian
Framework for Short-Term Wind Power Probabilistic Forecast,” IEEE
Trans. Power Syst., vol. 34, no. 1, pp. 371-379, Jan. 2019.
[6] P. Du, “Ensemble Machine Learning-Based Wind Forecasting to
Combine NWP Output with Data from Weather Station,” IEEE Trans.
Sustain. Energy, vol. 10, no. 4, pp. 2133-2141, Oct. 2019.
[7] N. Chen, Z. Qian, I. T. Nabney and X. Meng, “Wind Power Forecasts
Using Gaussian Processes and Numerical Weather Prediction,” IEEE
Trans. Power Syst., vol. 29, no. 2, pp. 656-665, March 2014.
[8] Y. Zhang and J. Wang, “A Distributed Approach for Wind Power
Probabilistic Forecasting Considering Spatio-Temporal Correlation
Without Direct Access to Off-Site Information,” IEEE Trans. Power
Syst., vol. 33, no. 5, pp. 5714-5726, Sept. 2018.
[9] Z. Wang, W. Wang, C. Liu, Z. Wang, and Y. Hou, “Probabilistic
Forecast for Multiple Wind Farms Based on Regular Vine Copulas,”
IEEE Trans. Power Syst., vol. 33, no. 1, pp. 578-589, Jan. 2018.
[10] M. Khodayar and J. Wang, “Spatio-Temporal Graph Deep Neural
Network for Short-Term Wind Speed Forecasting,” IEEE Trans. Sustain.
Energy, vol. 10, no. 2, pp. 670-681, April. 2019.
[11] M. Landry, T. P. Erlinger, D. Patschke, and C. Varrichio, "Probabilistic
Gradient Boosting Machines for GEFCom2014 Wind Forecasting," Int.
J. Forecast., vol. 32, no. 3, pp. 1061-1066, 2016.
[12] L. Silva, "A Feature Engineering Approach to Wind Power
Forecasting," Int. J. Forecast., vol. 30, no. 2, pp. 395-401, 2014.
[13] K. Bhaskar and S. N. Singh, "AWNN-Assisted Wind Power Forecasting
Using Feed-Forward Neural Network," IEEE Trans. Sustain. Energy, vol.
3, no. 2, pp. 306-315, April 2012.
[14] F. Davò and S. Alessandrini, "Post-Processing Techniques and
Principal Component Analysis for Regional Wind Power and Solar
Irradiance Forecasting," Solar Energy, vol. 134, pp. 327-338, 2016.
[15] Y. Wu, Q. Wu and J. Zhu, “Data-driven wind speed forecasting using
deep feature extraction and LSTM,” IET Renew. Power Gene., vol. 13,
no. 12, pp. 2062-2069, 2019.
[16] S. Li, P. Wang, and L. Goel, “Wind Power Forecasting Using Neural
Network Ensembles with Feature Selection,” IEEE Trans. Sustain.
Energy, vol. 6, no. 4, pp. 1447-1456, Oct. 2015.
[17] K. Shi, Y. Qiao, W. Zhao, Q. Wang, M. Liu, and Z. Lu, "An
Improved Random Forest Model of Short-term Wind-power
Forecasting to Enhance Accuracy, Efficiency, and Robustness," Wind
Energy, vol. 21, no. 12, pp. 1383-1394, 2018.
[18] L. Li, Y. Liu, Y. Yang, and S. Han, "Short-term wind speed forecasting
based on CFD pre-calculated flow fields," Proceed. Chinese Soc.
Electric. Eng., vol. 33, no. 7, pp. 27-32, 2013.
[19] L. Landberg, "A mathematical look at a physical power prediction
model," Wind Energy, vol. 1, no. 1, pp. 23-28, 1998.
[20] E. Erdem, and S. Jing, “ARMA based approaches for forecasting the
tuple of wind speed and direction,” Appl Energy, vol.88, no.4, pp.1405-
1414, 2011.
[21] P. Louka, G. Galanis, N. Siebert, et al., "Improvements in wind speed
forecasts for wind power prediction purposes using Kalman filtering," J.
Wind Eng. & Indus. Aero., vol. 96, no. 12, pp. 2348-2362, 2008.
[22] E. Mangalova and O. Shesterneva, "K-nearest neighbors for
GEFCom2014 probabilistic wind power forecasting," Int. J. Forecast.,
vol. 32, no. 3, pp. 1067-1073, 2016.
[23] H. S. Dhiman, D. Deb, and J. M. Guerrero, “Hybrid machine intelligent
SVR variants for wind forecasting and ramp events,” Renew. Sustain.
Energy Rev., vol. 108, pp. 369-379, 2019.
[24] M. Landry, T.P. Erlinger, D. Patschke, and C. Varrichio, “Probabilistic
gradient boosting machines for GEFCom2014 wind forecasting,” Int. J.
Forecast., vol. 32, no. 3, pp. 1061-1066, 2016.
[25] Y. Zhao, L. Ye, P. Pinson, Y. Tang, and P. Lu, "Correlation-constrained
and sparsity-controlled vector autoregressive model for spatio-temporal
wind power forecasting," IEEE Trans. Power Syst., vol. 33, no. 5, pp.
5029-5040, 2018.
[26] J. W. Messner and P. Pinson, "Online adaptive lasso estimation in vector
autoregressive models for high dimensional wind power forecasting," Int.
J. Forecast., vol. 35, no. 4, pp. 1485-1498, 2019.
[27] L. Cavalcante, J. B. Ricardo, R. Marisa, and J. Browell, “LASSO vector
autoregression structures for very short‐term wind power forecasting,”
Wind Energy, vol.20, no. 4, pp. 657-675, 2017.
[28] H. Quan, D. Srinivasan, and A. Khosravi, “Short-Term Load and Wind
Power Forecasting Using Neural Network-Based Prediction Intervals,”
IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 303-315, Feb.
2014.
[29] C. Wan, Z. Xu, P. Pinson, Z. Y. Dong, and K. P. Wong, “Optimal
Prediction Intervals of Wind Power Generation,” IEEE Trans. Power
Syst., vol. 29, no. 3, pp. 1166-1174, May 2014.
[30] Z. Shi, H. Liang, and V. Dinavahi, “Direct Interval Forecast of Uncertain
Wind Power Based on Recurrent Neural Networks,” IEEE Trans. Sustain.
Energy, vol. 9, no. 3, pp. 1177-1187, Jul. 2018.
[31] Y. Lin, M. Yang, C. Wan, J. Wang, and Y. Song, “A Multi-Model
Combination Approach for Probabilistic Wind Power Forecasting,”
IEEE Trans. Sustain. Energy., vol. 10, no. 1, pp. 226-237, Jan. 2019.
[32] T. Li, Y. Wang, and N. Zhang, “Combining Probability Density
Forecasts for Power Electrical Loads,” IEEE Trans. Smart Grid, vol. 11,
no. 2, pp. 1679-1690, Mar. 2020.
[33] H. Z. Wang, G. B. Wang, G. Q. Li, J. C. Peng, and Y. T. Liu, “Deep
belief network based deterministic and probabilistic wind speed
forecasting approach,” Appl. Energy, vol. 182, pp. 80-93, 2016.
[34] C. Zhang, C. L. P. Chen, M. Gan, and L. Chen, “Predictive Deep
Boltzmann Machine for Multiperiod Wind Speed Forecasting,” IEEE
Trans. on Sustain. Energy, vol. 6, no. 4, pp. 1416-1425, Oct. 2015.
[35] J. Yan, H. Zhang, Y. Liu, S. Han, L. Li, and Z. Lu, “Forecasting the High
Penetration of Wind Power on Multiple Scales Using Multi-to-Multi
Mapping,” IEEE Trans. on Power Syst., vol. 33, no. 3, pp. 3276-3284,
May 2018.
[36] A. Banik, C. Behera, T. V. Sarathkumar and A. K. Goswami, “Uncertain
wind power forecasting using LSTM-based prediction interval,” IET
Renew. Power Gene., vol. 14, no. 14, pp. 2657-2667, Oct. 2020.
[37] C. Li, G. Tang, X. Xue, A. Saeed, and X. Hu, “Short-Term Wind Speed
Interval Prediction Based on Ensemble GRU Model,” IEEE Trans.
Sustain. Energy, vol. 11, no. 3, pp. 1370-1380, Jul. 2020.
[38] H. Wang, G. Li, G. Wang, J. Peng, H. Jiang, and Y. Liu, “Deep learning
based ensemble approach for probabilistic wind power forecasting,”
Appl. Energy, vol. 188, pp. 56-70, 2017.
[39] Y. Hong, C Lian, and P. P. Rioflorido, “A hybrid deep learning-based
neural network for 24-h ahead wind power forecasting,” Appl. Energy,
vol. 250, pp. 530-539, 2019.
[40] A. Borovykh, S. Bohte, and C. W. Oosterlee, "Conditional Time Series
Forecasting with Convolutional Neural Networks," 2017, [Online]
Available: https://arxiv.org/abs/1703.04691.
[41] P. Kou, C. Wang, D. Liang, S. Cheng, and L. Gao, “Deep learning
approach for wind speed forecasts at turbine locations in a wind farm,”
IET Renew. Power Gene., vol. 14, no. 13, pp. 2416-2428, Oct. 2020.
[42] Y. Yu, X. Han, M. Yang, and J. Yang, “Probabilistic Prediction of
Regional Wind Power Based on Spatiotemporal Quantile Regression,”
IEEE Trans. Indust. Appl., vol. 56, no. 6, pp. 6117-6127, Dec. 2020.
[43] R. J. Bessa, C. Möhrlen, V. Fundel, et al., "Towards Improved
Understanding of the Applicability of Uncertainty Forecasts in the
Electric Power Industry," Energies, vol. 10, no. 9, 2017.
[44] J. W. Messner, P. Pinson, J. Browel, M. B. Bjerregård, I. Schicker,
“Evaluation of Wind Power Forecasts – an up-to-Date View,” Wind
Energy, vol. 23, no. 6, pp.1461–1481, 2020.
[45] I. Sutskever, O. Vinyals, and Q. V. Le., “Sequence to Sequence Learning
with Neural Networks,” Proceed. The 27th Int. Conf. NIPS, pp. 3104–12,
2014.
[46] G. Giebel, "The State-of-the-Art in Short-Term Prediction of Wind
Power: A Literature Overview," Risø National Laboratory, Denmark, Aug.
2003.
[47] S. S. Soman, H. Zareipour, O. Malik and P. Mandal, “A review of wind
power and wind speed forecasting methods with different time horizons,”
North American Power Symposium 2010, Arlington, TX, USA, 2010,
pp. 1-8.
[48] Y. Dauphin, A. Fan, M. Auli, and D. Grangier, “Language Modeling
with Gated Convolutional Networks,” In Proceed. the 34th Int. Conf.
Machine Learning, 2017.
[49] K. He, X. Zhang, S. Ren, and J. Sun, "Identity Mappings in Deep
Residual Networks," Euro. Conf. Computer Vision, pp. 630-645, 2016.
[50] J. L. Ba, J. R. Kiros, and G. E. Hinton, "Layer Normalization," 2016, [Online]
Available: https://arxiv.org/pdf/1607.06450v1.pdf
[51] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
L. Kaiser, and I. Polosukhin, "Attention Is All You Need," Proceed. the
31st Int. Conf. NIPS, pp. 5998-6008, 2017.
[52] H. Zhang, Y. Liu, J. Yan, S. Han, L. Li, and Q. Long, “Improved Deep
Mixture Density Network for Regional Wind Power Probabilistic
Forecasting,” IEEE Trans. Power Syst., vol. 35, no. 4, pp. 2549-2560,
Jul. 2020.
[53] T. Hong, P. Pinson, S. Fan, H. Zareipour, A. Troccoli, and R. J.
Hyndman, "Probabilistic Energy Forecasting: Global Energy
Forecasting Competition 2014 and Beyond," Int. J. Forecast., vol. 32,
no. 3, pp. 896-913, 2016.
[54] D. Salinas, V. Flunkert, and J. Gasthaus, "DeepAR: Probabilistic
Forecasting with Autoregressive Recurrent Networks," Int. J. Forecast.,
vol. 36, no. 3, pp. 1181-1191, 2020.
[55] R. Wen, K. Torkkola, B. Narayanaswamy, and D. Madeka, “A Multi-
Horizon Quantile Recurrent Forecaster,” 2017, [Online] Available:
https://arxiv.org/pdf/1711.11053.pdf
[56] A. Alexandrov, K. Benidis, and M. B. Schneider, et al., “GluonTS:
Probabilistic Time Series Models in Python,” 2019, [Online] Available:
https://arxiv.org/pdf/1906.05264.pdf
[57] A. Paszke, S. Gross, S. Chintala, et al., “Automatic differentiation in
PyTorch,” NIPS 2017 Workshop Autodiff, Oct. 2017.
[58] F. Pedregosa, G. Varoquaux, A. Gramfort, et al, “Scikit-Learn: Machine
Learning in Python,” J. Machine. Learn. Res., vol. 12, no. 85, pp. 2825–
2830, 2011.
[59] G. Ke, Q. Meng, T. Finley, et al., “LightGBM: A Highly Efficient
Gradient Boosting Decision Tree,” Proceed. the 31st Int. Conf. NIPS, vol.
30, pp. 3149–3157, 2017.
[60] H. Hersbach, "Decomposition of the Continuous Ranked Probability Score
for Ensemble Prediction Systems," Weather and Forecasting, vol. 15, no.
5, pp. 559-570, 2000.
[61] C. Wan, Z. Xu, P. Pinson, Z. Y. Dong, and K. P. Wong, "Probabilistic
Forecasting of Wind Power Generation Using Extreme Learning
Machine," IEEE Trans. Power Syst., vol. 29, no. 3, pp. 1033-1044, May
2014.
VII. APPENDIX
A. The ACE results
Fig. 14. The Average Coverage Error (ACE) for the three studied wind farms.
B. The Temporal Error Pattern of Multi-source NWP for other two wind farms
Fig. 15. The temporal error pattern of multi-source NWP wind speed (Case 1 and Case 3).
(Four NWP sources and two kinds of time series prediction methods are studied. The one-year dataset is used.)