Deep adversarial transfer neural network for fault diagnosis of wind turbine gearbox

Labeling fault data is time-consuming and expensive, so the monitoring data obtained from wind farms are rarely labeled accurately. This paper proposes a deep adversarial transfer neural network for diagnosing wind turbine gearbox faults. The method uses an auxiliary data set and, with the help of a domain adversarial approach, overcomes the large difference between data distributions so that the features learned from the auxiliary data set can be transferred to data from wind turbines. An unsupervised fault diagnosis model is established, which, to a certain extent, reduces the dependence of the deep learning model on labeled data from wind turbines. The effectiveness of the proposed method was verified using vibration data from the bearing failure tests at Case Western Reserve University and vibration data measured on a wind turbine gearbox. The results show that the method is effective in transferring a fault diagnosis model across similar domains and provides a new direction for constructing data-driven fault diagnosis models.
International Journal of Green Energy
Yuanchi Ma, Yongqian Liu, Zhiling Yang, Ming Cheng & Hang Meng
To cite this article: Yuanchi Ma, Yongqian Liu, Zhiling Yang, Ming Cheng & Hang Meng
(2023): Deep adversarial transfer neural network for fault diagnosis of wind turbine gearbox,
International Journal of Green Energy, DOI: 10.1080/15435075.2023.2194375
To link to this article: https://doi.org/10.1080/15435075.2023.2194375
Published online: 31 Mar 2023.
State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources, North China Electric Power University, Beijing, China
ARTICLE HISTORY
Received 16 September 2022
Accepted 4 January 2023
KEYWORDS
Wind turbine gearbox; fault diagnosis; transfer learning; domain adversary; deep learning
1. Introduction
Operating in harsh natural environments for long periods, wind turbines have a higher fault rate than conventional generating systems (Pang et al. 2020; Pérez-Pérez et al. 2022). The gearbox is a crucial component of the wind turbine, connecting the drive system and the generation system (Nejad, Odgaard, and Moan 2018). Under complex loads, gearbox faults such as tooth breakage or tooth-surface wear are therefore common. Compared with other components of the wind turbine, the gearbox does not have the highest fault rate, but once it stops working normally, maintaining the turbine becomes very difficult. As a result, the gearbox ranks first among all wind turbine components in downtime and economic loss (Aafif et al. 2022; Dabrowski and Natarajan 2017).
The wind turbine gearbox is a complex, highly reliable piece of rotating machinery (Feng et al. 2013). However, conventional fault diagnosis technologies have difficulty adapting to this kind of complex system with a long service life and high reliability (Tang et al. 2022; Zhu et al. 2022). In general, a gearbox fault in a wind turbine is recorded only once or twice over several years, and collecting the monitoring data takes tens of thousands of hours. It is therefore difficult, or even impossible, to build a gearbox fault diagnosis system based on supervised learning, because field monitoring data contain little failure information.
Failure of a wind turbine gearbox is caused by performance degradation and is therefore a degenerative failure: performance degrades gradually until, at a certain point, the gearbox is judged to be failed or faulty. If wind turbines were manufactured and operated under identical conditions, they would reach the same failure levels, and their degradation trajectories and failure states would also be the same. In reality this is not the case; even turbines from the same batch in the same wind farm do not operate under the same conditions and environment. Because of differing terrain and the wakes of other turbines, the wind speed, wind direction, and turbulence intensity differ between turbine locations, sometimes considerably. The loads borne by the gearboxes are therefore quite different. The degradation process of the gearbox is always disturbed by various fluctuations, which ultimately makes the judgment of failure and fault fuzzy (Dhiman et al. 2021; Rahimilarki et al. 2022). Consequently, it is difficult to diagnose the failure and fault state with a clear criterion.
Fault diagnosis of wind turbines is a challenging problem. Researchers in China and abroad have carried out a great deal of work in related directions and achieved meaningful results. By diagnosis method, fault diagnosis methods for wind turbine gearboxes can be divided into physical-model-based and data-driven methods. By signal type, they use vibration signals, acoustic signals, electrical signals, temperature, and oil composition. Vibration analysis is the most commonly applied condition monitoring technology for rotating machinery and the most effective method for fault diagnosis of wind turbine drive trains (Isham et al. 2019). Time-domain, frequency-domain, and time-frequency-domain analysis are the main traditional vibration analysis methods. R. Uma Maheswari et al. (Maheswari and Umamaheswari 2017) summarized feature extraction and fault classification methods for non-linear and non-stationary signals in variable-speed drives, such as wind turbine drive trains, with the aim of improving fault diagnosis accuracy. However, given the practical limitations of existing methods, fault identification of wind turbine gearboxes still relies on expert experience for the final judgment. This judgment is subjective and difficult to express in a formalized way, so these methods lack versatility and generalizability in wind farms with many turbines of different models and operating conditions.
CONTACT Yongqian Liu yqliu@ncepu.edu.cn State Key Laboratory of Alternate Electrical Power System with Renewable Energy Sources, North China Electric Power University, Beijing 102206, China
INTERNATIONAL JOURNAL OF GREEN ENERGY
https://doi.org/10.1080/15435075.2023.2194375
© 2023 Taylor & Francis Group, LLC
Data-driven fault diagnosis and health evaluation methods for wind turbines have become a research hotspot in this field. Ruonan Liu (Liu et al. 2015) studied and summarized different artificial intelligence techniques, such as k-nearest neighbors, naive Bayes, support vector machines, and artificial neural networks, for fault diagnosis of rotating machinery. Although conventional machine learning is widely studied in fault diagnosis, its training process requires a large amount of fault data. However, fault data for large wind turbine components are scarce, especially data with fault labels, and this lack of fault samples has impeded the application of supervised learning in condition monitoring. In addition, as the wind turbine gearbox ages, its performance gradually degrades and the distribution of the signals representing its condition changes accordingly. If the diagnostic model cannot adapt to this change, its generalization ability will be seriously affected.
In view of this limitation, unsupervised learning is advantageous for fault identification of wind turbine gearboxes. Current research on unsupervised learning in pattern recognition mainly involves denoising autoencoders and clustering algorithms. Guoqian Jiang et al. (Jiang et al. 2017) proposed a feature representation learning approach based on stacked multilevel denoising autoencoders (SMLDAEs), which learns more general fault features simultaneously at different scales from the complex frequency spectra of raw vibration data and improved the accuracy of gearbox fault identification. Cameron Sobie (Sobie, Freitas, and Nicolai 2018) addressed the challenges of diagnosing roller bearings with race faults by generating training data from high-resolution simulations of roller bearing dynamics. K.J. Vamvoudakis-Stefanou (Vamvoudakis-Stefanou, Sakellariou, and Fassois 2018) considered vibration-based damage detection for a population of nominally identical structures and addressed it with unsupervised statistical time-series methods, achieving a significant improvement in performance. Zhuang Li (Li 2016) optimized a fuzzy kernel clustering neural network with the particle swarm algorithm; the unlabeled input data were classified in a high-dimensional mapping space, gearbox fault features with high universality were identified, and different operating states of the gearbox were successfully distinguished. However, the effectiveness of clustering-based unsupervised learning depends on the degree of similarity between data samples, so it is difficult to distinguish specific fault types, and unsupervised learning remains difficult to apply in actual fault diagnosis. In addition, current unsupervised methods still rely on known fault samples to train the network, so the problem of lacking labeled data is not solved.
Domain adaptation in transfer learning provides a new direction for transferring tasks between different domains (Pan and Yang 2010; Zhuang et al. 2015): knowledge learned from data in the source environment can be transferred to the target data. Inspired by this idea, it is worth introducing bearing data with fault labels to guide feature identification and fault classification for wind turbine gearbox fault diagnosis, thereby avoiding the increased wind power operating cost of collecting a large number of gearbox fault samples. At present, research on transfer learning methods in the fault diagnosis field is limited. Chao Chen et al. (Chen, Shen, and Yan 2017) proposed an enhanced LSSVM transfer learning strategy based on supplementary data and applied it to bearing fault diagnosis when the data volume is insufficient. For motor diagnosis under variable speeds and variable loads, Fei Shen (Shen, Chen, and Yan 2017) proposed a diagnostic method based on singular value decomposition of the autocorrelation matrix, combined with feature extraction and a transfer learning classifier. Zgraggen et al. explored the main challenges of domain adaptation for fault detection based on wind turbine SCADA data, focusing on fault detection algorithms for newly installed turbines or turbines with little historical data under diverse operating conditions (Zgraggen et al. 2021). Jamil et al. proposed an instance-based deep transfer learning method that updates the weights of the source and target machine training samples separately; their results show that the method avoids negative transfer and achieves higher accuracy than standard deep learning and deep transfer learning methods (Jamil et al. 2022). With the popularity of deep learning methods, more and more researchers apply transfer learning with deep neural networks in this area. Compared with traditional non-deep transfer learning, deep transfer learning directly improves the learning effect on different tasks (Long et al. 2015, 2017; Tzeng et al. 2014; Yosinski et al. 2014).
In this article, inspired by Wasserstein GAN (Arjovsky, Chintala, and Bottou 2017; Gulrajani et al. 2017) and domain adversarial neural network approaches (Ganin et al. 2016; Shen et al. 2018; Tzeng et al. 2017), we propose a deep adversarial transfer neural network for wind turbine gearbox fault diagnosis. The method uses an auxiliary data set, transfers the features learned from the auxiliary data to the actual monitoring data, and establishes a fault diagnosis model under unsupervised conditions, which to some extent reduces the dependence of the deep neural network model on labeled monitoring data from wind turbines. The paper consists of four parts. The first section introduces the background and development status of fault diagnosis technology for wind turbine drive trains. The second section introduces the modeling process of the deep adversarial transfer neural network. The third section presents the performance of the method on the wind turbine gearbox fault diagnosis task. Conclusions are drawn in the final section.
2. Deep adversarial neural network model
2.1. Formal description of the problem of fault transfer diagnosis
To address the above problems of wind turbine gearbox fault diagnosis, this paper proposes a transfer diagnosis method for wind turbine gearbox faults. The method uses labeled test data from the laboratory and unlabeled field data from a wind farm to perform feature transfer, and ultimately achieves wind turbine gearbox fault diagnosis in an unsupervised learning mode. This task reduces to the problem of domain adaptation in the transfer learning field.
To be consistent with transfer learning notation, we use the subscripts s and t for the source domain and target domain, respectively: the auxiliary data set is denoted as the source domain $D_s$, and the wind turbine monitoring data set as the target domain $D_t$. The source domain contains the auxiliary data set obtained under controlled laboratory conditions; the target domain is the data set of an actual wind turbine, which usually contains little or even no annotated data. $X_s$ and $X_t$ denote the sets of data samples in the source and target domains, respectively; $y_s$ and $y_t$ denote the actual categories of source and target samples; and $\mathcal{Y}_s$ and $\mathcal{Y}_t$ denote the category spaces of the source and target domains.
With this notation, the transfer diagnosis problem can be stated formally as follows. We are given an annotated source domain $X_s=\{(x_s^{(i)},y_s^{(i)})\}_{i=1}^{n}$ and an unannotated target domain $X_t=\{x_t^{(j)}\}_{j=1}^{m}$ that share the same feature space, $\mathcal{X}_s=\mathcal{X}_t$, and the same category space, $\mathcal{Y}_s=\mathcal{Y}_t$. However, the two domains have different marginal distributions, $P_s(x_s)\neq P_t(x_t)$, and different conditional probability distributions, $P_s(y_s\mid x_s)\neq P_t(y_t\mid x_t)$. Because the target categories $y_t$ cannot be observed in advance, supervised learning cannot be carried out in the target domain. In adversarial transfer learning, the annotated source domain is first used to train a source mapping $f_s:\mathbb{R}^n\to\mathbb{R}^d$ and a classifier $f_c:\mathbb{R}^d\to\mathbb{R}^C$ that classify the source domain. The unannotated target data set $X_t$ is then used to learn a target feature mapping $f_t:\mathbb{R}^n\to\mathbb{R}^d$ that minimizes the distance between the source and target feature distributions, i.e. between $f_s(x_s)$ and $f_t(x_t)$. Finally, the target features are fed into the trained classifier $f_c$ to predict the target labels $y_t\in\mathcal{Y}_t$. Unsupervised domain adaptation is achieved in this way.
2.2. Adversarial transfer model
The challenge of unsupervised domain adaptation comes from the different data distributions in the source and target domains. Reference [26] combined domain adaptation with Generative Adversarial Networks (GANs) and proposed Adversarial Discriminative Domain Adaptation (ADDA), a domain adversarial framework built on GAN techniques. On this basis, we propose a new approach to the distribution difference between the source and target domains: minimizing the Wasserstein distance between the source and target feature distributions, so that the features produced by the target domain feature extractor are distributed close to those produced by the source domain feature extractor, and the target labels can be predicted with the source domain classifier.
2.2.1. Wasserstein distance
Before introducing the domain adversarial model, it is necessary to briefly review the Wasserstein distance, which measures the distance between two distributions and is defined as:

$$W(P_1,P_2)=\inf_{\gamma\in\Pi(P_1,P_2)}\mathbb{E}_{(x,y)\sim\gamma}\big[\lVert x-y\rVert\big] \tag{1}$$

In the formula, $\Pi(P_1,P_2)$ is the set of all possible joint probability distributions whose marginals are $P_1$ and $P_2$. For each possible joint distribution $\gamma$, samples $x$ and $y$ can be drawn and their distance $\lVert x-y\rVert$ computed, which gives the expected distance between samples under $\gamma$. The infimum of this expected value over all possible joint distributions is the Wasserstein distance.
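As a small illustration of Eq. (1), assume (only for this sketch) two one-dimensional empirical distributions with the same number of equally weighted samples. In that special case the optimal joint distribution $\gamma$ pairs the sorted samples of $P_1$ with the sorted samples of $P_2$, so the infimum can be computed exactly by sorting. The function name `wasserstein_1d` is hypothetical.

```python
def wasserstein_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    In one dimension the optimal coupling in Eq. (1) matches the i-th smallest
    value of one sample with the i-th smallest value of the other, so the
    infimum over joint distributions reduces to a sort.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


print(wasserstein_1d([0.0, 1.0, 2.0], [3.0, 2.0, 1.0]))
```

Here the call returns 1.0, whereas pairing the samples in their given order would cost 5/3; the infimum in Eq. (1) selects the cheaper coupling. In higher dimensions no such shortcut exists, which motivates the dual form below.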
The probability distributions $P_1$ and $P_2$ can be visualized as two piles of soil; $\mathbb{E}_{(x,y)\sim\gamma}[\lVert x-y\rVert]$ is then the cost of moving pile $P_1$ onto pile $P_2$ along the transport plan $\gamma$, and the Wasserstein distance is the minimum cost under the optimal plan. The Wasserstein distance is therefore also called the Earth-Mover Distance. Using it as the loss function can avoid the vanishing gradient problem.
Wasserstein GAN [25] applies the Kantorovich-Rubinstein duality, under which the Wasserstein distance is approximated by the solution of the following optimization problem:

$$W(P_1,P_2)\approx\max_{f\in\mathcal{D}}\ \mathbb{E}_{x\sim P_1}[f(x)]-\mathbb{E}_{x\sim P_2}[f(x)] \tag{2}$$

In the formula, $\mathcal{D}$ is the set of functions satisfying the 1-Lipschitz constraint, i.e. all functions $f$ such that $\lVert f(x_1)-f(x_2)\rVert\le\lVert x_1-x_2\rVert$.
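To make Eq. (2) concrete, the sketch below evaluates the dual objective for a few hand-picked 1-Lipschitz functions on two small, hypothetical one-dimensional samples, where $P_2$ is $P_1$ shifted by one unit. Every 1-Lipschitz $f$ yields a lower bound on the Wasserstein distance, so maximizing over a restricted family of such functions (for example, a neural network critic) can only approach $W$ from below.

```python
def dual_objective(f, p1, p2):
    """Empirical E_{x~P1}[f(x)] - E_{x~P2}[f(x)] from Eq. (2)."""
    return sum(f(x) for x in p1) / len(p1) - sum(f(x) for x in p2) / len(p2)


def clip_unit(x):
    """Clamp to [-1, 1]; clamping never increases distances, so it is 1-Lipschitz."""
    return max(-1.0, min(1.0, x))


# Hand-picked 1-Lipschitz candidates: |f(a) - f(b)| <= |a - b| for all a, b.
CANDIDATES = {
    "identity": lambda x: x,
    "negation": lambda x: -x,
    "absolute value": abs,
    "clipped to [-1, 1]": clip_unit,
}

p1 = [0.0, 1.0, 2.0]  # hypothetical samples from P1
p2 = [1.0, 2.0, 3.0]  # hypothetical samples from P2 (P1 shifted by +1)

values = {name: dual_objective(f, p1, p2) for name, f in CANDIDATES.items()}
best_name = max(values, key=values.get)
best_value = values[best_name]
print(best_name, best_value)
```

The best candidate is $f(x)=-x$ with value 1.0, which equals the primal Wasserstein distance between these two samples; the other candidates give strictly smaller values, illustrating why the critic must be optimized.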
2.2.2. Adversarial transfer loss
According to the formal description of the fault transfer diagnosis problem, the adversarial transfer model has two objectives: to minimize the Wasserstein distance between the source and target feature distributions, and to minimize the classification error of the source domain classifier. The loss function of the adversarial transfer model is therefore expressed as Equation (3):

$$L=L_c(x_s,y_s)+\lambda L_d(x_s,x_t) \tag{3}$$

where $L$ is the overall loss of the domain adversarial model, $L_c(x_s,y_s)$ is the source domain classification loss, $L_d(x_s,x_t)$ is the domain adversarial loss measuring the feature distribution discrepancy, and $\lambda$ is the weight balancing the two terms.
First, consider $L_c(x_s,y_s)$. $L_c$ reflects the classification loss on the annotated data set, exactly as in a conventional model, so the cross-entropy loss function can be adopted, as shown in Equation (4):

$$L_c(x_s,y_s)=-\frac{1}{m_1}\sum_{i=1}^{m_1}\sum_{k=1}^{C}\mathbb{1}\big[y_s^{(i)}=k\big]\log f_c\big(f_s(x_s^{(i)})\big)_k \tag{4}$$

In the formula, $\mathbb{1}[y_s^{(i)}=k]$ is the indicator function, $f_s(x_s^{(i)})$ is the source domain feature, $f_c(f_s(x_s^{(i)}))_k$ is the $k$th component of the output category distribution $f_c(f_s(x_s^{(i)}))$, $m_1$ is the source domain sample size, and $C$ is the number of categories.
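A minimal pure-Python sketch of Eq. (4), assuming the classifier output has been normalized into a probability vector by a softmax (the softmax step is an assumption here; the text only requires an output category distribution). Because the indicator selects exactly one class per sample, the double sum reduces to the log-probability of the true class.

```python
import math


def softmax(logits):
    """Turn raw classifier scores into a probability vector (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(prob_batch, labels):
    """L_c of Eq. (4).

    prob_batch[i] is the predicted distribution f_c(f_s(x_s^(i))) over C classes and
    labels[i] is the true class index y_s^(i) in range(C). The indicator 1[y = k]
    keeps only the true-class term of the inner sum over k.
    """
    m1 = len(labels)
    return -sum(math.log(probs[y]) for probs, y in zip(prob_batch, labels)) / m1


# Two source samples, labels 1 (fault) and 0 (normal).
batch = [softmax([0.0, 0.0]), softmax([2.0, -1.0])]
loss = cross_entropy(batch, [1, 0])
print(loss)
```

With two classes (0 = normal, 1 = fault, as in Section 3), a classifier that outputs [0.5, 0.5] for every sample gives a loss of log 2 ≈ 0.693 regardless of the labels.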
Next, consider $L_d(x_s,x_t)$. According to Equation (2), computing the Wasserstein distance requires the 1-Lipschitz constraint to be satisfied. If the function $f_d$ with parameters $\theta_d$ satisfies the 1-Lipschitz constraint, the Wasserstein distance between the source domain feature $f_s(x_s)$ and the target domain feature $f_t(x_t)$ can be estimated as:

$$L_{wd}(x_s,x_t)=\frac{1}{m_1}\sum_{x_s\in X_s}f_d\big(f_s(x_s)\big)-\frac{1}{m_2}\sum_{x_t\in X_t}f_d\big(f_t(x_t)\big) \tag{5}$$

In the formula, $f_s(x_s)$ and $f_t(x_t)$ are the source and target domain features, respectively, with the source domain feature $f_s(x_s)$ held fixed, and $m_1$ and $m_2$ are the source and target sample sizes.
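The estimate in Eq. (5) is simply the difference between the mean critic score on source features and the mean critic score on target features, and the two batches may have different sizes $m_1$ and $m_2$. The sketch below assumes a hypothetical linear critic $f_d(h)=w\cdot h+b$ purely for illustration; in the proposed network $f_d$ is a fully connected discriminator, and the feature vectors are made up.

```python
def linear_critic(w, b):
    """Hypothetical critic f_d(h) = w . h + b (1-Lipschitz when ||w||_2 <= 1)."""
    def f_d(h):
        return sum(wi * hi for wi, hi in zip(w, h)) + b
    return f_d


def wasserstein_estimate(f_d, source_features, target_features):
    """L_wd of Eq. (5): mean critic score on source minus mean on target."""
    m1 = len(source_features)
    m2 = len(target_features)
    source_mean = sum(f_d(h) for h in source_features) / m1
    target_mean = sum(f_d(h) for h in target_features) / m2
    return source_mean - target_mean


h_s = [[1.0, 0.0], [3.0, 0.0]]              # m1 = 2 hypothetical source features
h_t = [[0.0, 0.0], [0.0, 2.0], [0.0, 4.0]]  # m2 = 3 hypothetical target features
estimate = wasserstein_estimate(linear_critic([1.0, 0.0], 0.5), h_s, h_t)
print(estimate)
```

The bias $b$ cancels in the difference, so only the direction and scale of $w$ matter; limiting that scale is exactly the role of the 1-Lipschitz condition.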
To enforce the 1-Lipschitz constraint, the original Wasserstein GAN [25] clips the weights. The proposed method instead adds a gradient-norm constraint to the objective function, which approximately satisfies the 1-Lipschitz constraint. The gradient penalty term can be expressed as:

$$L_{penalty}(\hat h)=\big(\lVert\nabla_{\hat h}f_d(\hat h)\rVert_2-1\big)^2 \tag{6}$$

In the formula, the feature representation $\hat h$ stands not only for the source and target domain features but also for features in the region between them. The specific form of $L_d(x_s,x_t)$ can therefore be written as:

$$L_d(x_s,x_t)=\max_{\theta_d}\ \big(L_{wd}-\gamma L_{penalty}\big) \tag{7}$$

In the formula, $\gamma$ is the balancing weight coefficient.
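The penalty in Eq. (6) is evaluated at random interpolations between source and target features, as in steps 11 to 13 of Algorithm 1. The sketch below computes it for a single pair of feature vectors. It uses central finite differences as a stand-in for the automatic differentiation a deep learning framework would provide; the helper names and the two linear critics are hypothetical.

```python
import math
import random


def interpolate(h_s, h_t, eps):
    """h_hat = eps * h_s + (1 - eps) * h_t."""
    return [eps * a + (1.0 - eps) * b for a, b in zip(h_s, h_t)]


def grad_norm(f, h, delta=1e-6):
    """L2 norm of grad_h f(h), estimated by central finite differences."""
    total = 0.0
    for i in range(len(h)):
        up = list(h)
        down = list(h)
        up[i] += delta
        down[i] -= delta
        g_i = (f(up) - f(down)) / (2.0 * delta)
        total += g_i * g_i
    return math.sqrt(total)


def gradient_penalty(f, h_s, h_t, rng):
    """L_penalty of Eq. (6) at a random interpolation between h_s and h_t."""
    eps = rng.random()  # eps drawn uniformly from [0, 1)
    h_hat = interpolate(h_s, h_t, eps)
    return (grad_norm(f, h_hat) - 1.0) ** 2


def steep_critic(h):
    return 3.0 * h[0] + 4.0 * h[1]  # gradient norm 5 everywhere


def lipschitz_critic(h):
    return 0.6 * h[0] + 0.8 * h[1]  # gradient norm 1 everywhere


rng = random.Random(0)
h_s, h_t = [1.0, 2.0], [-1.0, 0.5]  # hypothetical source / target features
penalty_steep = gradient_penalty(steep_critic, h_s, h_t, rng)
penalty_lipschitz = gradient_penalty(lipschitz_critic, h_s, h_t, rng)
print(penalty_steep, penalty_lipschitz)
```

The steep critic is penalized by about 16, while the 1-Lipschitz critic incurs essentially no penalty. For these linear critics the interpolation point does not matter; for a nonlinear critic it does, which is why $\hat h$ is sampled between the two domains.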
The domain adversarial model is thus transformed into the following optimization problem:

$$\min_{\theta_s,\theta_t,\theta_c}\ L_c(x_s,y_s)+\lambda\max_{\theta_d}\big(L_{wd}-\gamma L_{penalty}\big) \tag{8}$$

where $\theta_s$, $\theta_t$, $\theta_c$, and $\theta_d$ are the parameters of $f_s$, $f_t$, $f_c$, and $f_d$, respectively.
2.3. Deep adversarial transfer neural network
Inspired by Generative Adversarial Networks (GANs), each of the four parameterized functions $f_s$, $f_t$, $f_c$, and $f_d$ can be implemented by a neural network. Figure 1 shows the forward propagation and back propagation processes of the network, where $f_s$, $f_t$, $f_c$, and $f_d$ correspond to the source domain feature extractor, target domain feature extractor, classifier, and discriminator, respectively. Unlike in GANs, the role of the generator changes: it no longer generates new samples but acts as a feature extractor, continually learning features of the domain data so that the discriminator cannot distinguish between the two domains.

Figure 1. The forward and back propagation process of the deep adversarial transfer neural network.

The main function of the feature extractor is to map the given data into the feature distribution space, which it does through a series of simple transformations from the input to the features. Compared with the other network structures, the feature extractor has more layers and a more complex layer structure, which ensures that discriminative features can be generated from complex raw data; its specific layer structure depends on the form of the input data. The classifier performs the final classification task. It is a shallow fully connected network, usually with only one or two layers. The discriminator evaluates the distribution difference of the input features in order to determine whether an input feature comes from the source domain or the target domain. Its structure is also relatively simple, usually two fully connected layers. The detailed configuration of each network is given in the experimental cases below.
Note that the source and target domain feature extractors have the same network structure. The target feature extractor is trained under unsupervised conditions, so it is much harder to train than the source domain feature extractor; however, the source and target data distributions are similar to a certain extent. The trained source domain parameters are therefore taken as the initial values for fine-tuning [22] the parameters of the target domain feature extractor, which not only transfers the feature structure information learned from the source domain but also makes the target domain feature extractor converge faster to a reasonable result.
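The division of labor between the four networks, and the initialization $\theta_t\leftarrow\theta_s$, can be sketched with plain fully connected layers. The layer widths, the ReLU activations, and the input size below are hypothetical choices made only for illustration (the actual configurations are given in the experimental cases). The sketch shows a deeper extractor mapping $\mathbb{R}^n\to\mathbb{R}^d$, a one-layer classifier producing a class distribution, a two-layer discriminator with a linear scalar output, and a target extractor created by copying the source extractor.

```python
import copy
import math
import random


class Dense:
    """Fully connected layer y = W x + b, optionally followed by ReLU."""

    def __init__(self, n_in, n_out, rng, relu=True):
        scale = 1.0 / math.sqrt(n_in)
        self.w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        self.b = [0.0] * n_out
        self.relu = relu

    def __call__(self, x):
        y = [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(self.w, self.b)]
        return [max(0.0, v) for v in y] if self.relu else y


class Sequential:
    """Apply layers one after another."""

    def __init__(self, *layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def softmax(z):
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


N_IN, D, C = 64, 8, 2  # hypothetical input size n, feature size d, classes C
rng = random.Random(0)

# f_s: the deepest part, R^n -> R^d (three layers here, an arbitrary choice).
source_extractor = Sequential(Dense(N_IN, 32, rng), Dense(32, 16, rng), Dense(16, D, rng))
# f_c: shallow classifier, one linear layer followed by softmax.
classifier_layer = Dense(D, C, rng, relu=False)
# f_d: two fully connected layers with a linear scalar output (no sigmoid).
discriminator_net = Sequential(Dense(D, 16, rng), Dense(16, 1, rng, relu=False))


def classify(h):
    return softmax(classifier_layer(h))


def critic_score(h):
    return discriminator_net(h)[0]


# f_t has the same structure as f_s and starts from its trained parameters.
target_extractor = copy.deepcopy(source_extractor)

x = [rng.gauss(0.0, 1.0) for _ in range(N_IN)]  # hypothetical input vector
probs = classify(target_extractor(x))
score = critic_score(source_extractor(x))
```

Copying with `copy.deepcopy` gives the target extractor its own parameter lists, so later adversarial updates to $\theta_t$ leave the pre-trained source extractor, and hence the fixed source features in Eq. (5), unchanged.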
Since the left and right terms in (8) involve different optimization parameters, the problem can be divided into two sub-problems, (9) and (10), which are solved separately:

$$\min_{\theta_s,\theta_c}\ L_c(x_s,y_s) \tag{9}$$

$$\min_{\theta_t}\max_{\theta_d}\ L_{wd}-\gamma L_{penalty} \tag{10}$$

Sub-problem (9) corresponds to a supervised learning process in the source domain, and sub-problem (10) corresponds to a WGAN. The solution of the adversarial transfer model can therefore be divided into two stages: the pre-training stage and the domain adversarial training stage. Figure 2 shows the training and inference procedures of the deep adversarial transfer neural network.

Figure 2. The training and inference procedures of the deep adversarial transfer neural network.
Figure 3. The flow chart of the algorithm.

Based on the above modeling process, the training algorithm of the deep adversarial transfer neural network is as follows. Figure 3 shows the flow chart of the algorithm.
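Algorithm 1 Deep Adversarial Transfer Neural Network
Require: source data $X_s$ and labels $Y_s$; target data $X_t$; minibatch size $m$; number of discriminator training steps $n_d$; coefficients $\gamma$, $\lambda$; learning rates $\alpha_1$ for the source domain feature extractor, $\alpha_2$ for the classifier, $\alpha_3$ for the target domain feature extractor, and $\alpha_4$ for the discriminator.
1. Initialize the source domain feature extractor parameters $\theta_s$ and the classifier parameters $\theta_c$ randomly.
2. while $\theta_s,\theta_c$ have not converged do
3. Sample a minibatch $\{(x_s^{(i)},y_s^{(i)})\}_{i=1}^{m}$ from $X_s$ and $Y_s$.
4. $\theta_s\leftarrow\theta_s-\alpha_1\nabla_{\theta_s}L_c(x_s,y_s)$
5. $\theta_c\leftarrow\theta_c-\alpha_2\nabla_{\theta_c}L_c(x_s,y_s)$
6. end while
7. Initialize the target domain feature extractor parameters $\theta_t\leftarrow\theta_s$ and the discriminator parameters $\theta_d$ randomly.
8. while $\theta_t,\theta_d$ have not converged do
9. for $k=1,\dots,n_d$ do
10. Sample minibatches $\{x_s^{(i)}\}_{i=1}^{m}$ and $\{x_t^{(i)}\}_{i=1}^{m}$ from $X_s$ and $X_t$.
11. Sample a random number $\epsilon\sim U[0,1]$.
12. $h_s\leftarrow f_s(x_s)$, $h_t\leftarrow f_t(x_t)$
13. $\hat h\leftarrow\epsilon h_s+(1-\epsilon)h_t$
14. $\theta_d\leftarrow\theta_d+\alpha_4\nabla_{\theta_d}\big[L_{wd}(x_s,x_t)-\gamma L_{penalty}(\hat h)\big]$
15. end for
16. $\theta_t\leftarrow\theta_t-\alpha_3\nabla_{\theta_t}\Big(-\frac{1}{m}\sum_{i=1}^{m}f_d\big(f_t(x_t^{(i)})\big)\Big)$
17. end while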
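To illustrate the domain adversarial stage of Algorithm 1 (steps 9 to 16), the sketch below runs one outer iteration on a deliberately tiny toy problem: scalar inputs, a hypothetical linear target extractor $f_t(x)=a x+c$, a hypothetical linear critic $f_d(h)=w h+b$, and full batches instead of minibatches. For a linear critic the gradient with respect to $\hat h$ equals $w$ everywhere, so the penalty in Eq. (6) reduces to $(|w|-1)^2$ and the interpolation of steps 11 to 13 has no effect; with a neural network critic it would. The source features are held fixed, and the bias $b$ is omitted because it cancels in $L_{wd}$ and does not affect the gradient in step 16.

```python
import math


def mean(values):
    return sum(values) / len(values)


def critic_step(w, h_s, h_t, gamma, lr_d):
    """Step 14 (lr_d plays the role of alpha_4): ascend L_wd - gamma * L_penalty.

    For f_d(h) = w*h + b: dL_wd/dw = mean(h_s) - mean(h_t), and since
    grad_h f_d = w everywhere, L_penalty = (|w| - 1)^2.
    """
    grad_wd = mean(h_s) - mean(h_t)
    grad_penalty = 2.0 * (abs(w) - 1.0) * math.copysign(1.0, w)
    return w + lr_d * (grad_wd - gamma * grad_penalty)


def target_step(a, c, w, x_t, lr_t):
    """Step 16 (lr_t plays the role of alpha_3): descend -(1/m) * sum f_d(f_t(x_t))."""
    grad_a = -w * mean(x_t)  # d/da of -(1/m) * sum(w * (a*x + c) + b)
    grad_c = -w              # d/dc of the same expression
    return a - lr_t * grad_a, c - lr_t * grad_c


def adversarial_iteration(w, a, c, h_s, x_t, n_d=1, gamma=10.0, lr_d=0.1, lr_t=0.1):
    """One pass of the outer while-loop with full batches."""
    for _ in range(n_d):
        h_t = [a * x + c for x in x_t]  # step 12: target features
        w = critic_step(w, h_s, h_t, gamma, lr_d)
    a, c = target_step(a, c, w, x_t, lr_t)
    return w, a, c


h_s = [1.0, 3.0]  # hypothetical fixed source features f_s(x_s), mean 2.0
x_t = [0.0, 2.0]  # hypothetical target inputs; initial target feature mean 1.0
w, a, c = adversarial_iteration(w=0.5, a=1.0, c=0.0, h_s=h_s, x_t=x_t)
new_target_mean = mean([a * x + c for x in x_t])
print(w, a, c, new_target_mean)
```

In this example one iteration moves the mean target feature from 1.0 to 1.32, toward the fixed source mean of 2.0. This only demonstrates the direction of the updates; whether and how fast such a toy converges depends on $\gamma$ and the learning rates, and it is not meant to reproduce the training behavior of the proposed network.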
3. Case analysis of wind turbine gearbox fault diagnosis
Two kinds of data sets are used for the case analysis: an auxiliary data set with fault labels (referred to as source domain data in this paper) and wind turbine gearbox data without fault labels (referred to as target domain data). The goal is to diagnose gearbox faults in the target domain without supervision by applying the proposed fault diagnosis model based on the deep adversarial transfer neural network.
The source data come from the bearing fault data of the Case Western Reserve University Bearing Data Center in the United States [29]. The test bearings support the motor shaft. Single-point faults were introduced into the test bearings by electro-discharge machining, with fault diameters of 7, 14, 21, 28, and 40 mils (1 mil = 0.001 in.). SKF bearings were used for the 7, 14, and 21 mil faults, and equivalent NTN bearings were used for the 28 and 40 mil faults. The experiments were conducted on drive end bearings with inner and outer raceway faults, with the outer raceway faults located at 3 o’clock, 6 o’clock, and 12 o’clock. Accelerometers placed at the 12 o’clock position on the motor housing collected the normal and fault vibration signals with a 16-channel DAT recorder. Digital data were collected at 48,000 samples per second. Speed and horsepower data were collected with the torque transducer and recorded in real time. Since the fault category space of the bearing differs from that of the gearbox, all bearing fault types are merged into one class to meet the requirements of the model. The source domain is thus divided into two categories, normal and fault, labeled 0 and 1, respectively. The sample size and percentage of each fault type in the source domain are shown in Table 1.
The target domain data come from gearbox vibration monitoring of a 1.5 MW wind turbine in northern China. Considering the complexity and variability of the actual working conditions of wind turbines, 406 sets of gearbox vibration velocity signals in the radial and axial directions at the high-speed and low-speed shaft ends were selected at rotational speeds of 908, 914, 929, 949, 1013, 1069, and 1498 rpm. The gearbox data set contains four states: normal, gear wearing, broken tooth, and mechanical looseness. The sampling frequency of the selected data is 5120 Hz, and each sample contains 8192 points. To match the category space of the source domain, the target domain data set is also divided into two categories, normal and fault, labeled 0 and 1, respectively. The sample size and percentage of each fault type in the target domain are shown in Table 2.
Table 1. The sample size and percentage of various fault samples in the source domain.
Fault location | Normal | Inner race | Ball | Outer race (3:00) | Outer race (6:00) | Outer race (12:00)
Sample size | 1696 | 3390 | 3389 | 2298 | 2181 | 1456
Percentage | 11.8% | 23.5% | 23.5% | 15.9% | 15.1% | 10.1%
Fault label | 0 | 1 | 1 | 1 | 1 | 1
Table 2. The sample size and percentage of various fault samples in the target domain.
Fault location | Normal | Gear wearing | Broken tooth | Mechanical looseness
Sample size | 232 | 56 | 72 | 46
Percentage | 57.1% | 13.8% | 17.7% | 11.3%
Fault label | 0 | 1 | 1 | 1
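Merging the fault types into a binary label, as described for both domains, can be checked against Tables 1 and 2 with a few lines of arithmetic. The dictionary layout below is a hypothetical encoding of the two tables; the proportions it computes are derived from the tabulated counts, not separately reported results.

```python
# Sample counts and labels copied from Tables 1 and 2 (0 = normal, 1 = fault).
SOURCE = {
    "Normal": (1696, 0), "Inner race": (3390, 1), "Ball": (3389, 1),
    "Outer race (3:00)": (2298, 1), "Outer race (6:00)": (2181, 1),
    "Outer race (12:00)": (1456, 1),
}
TARGET = {
    "Normal": (232, 0), "Gear wearing": (56, 1),
    "Broken tooth": (72, 1), "Mechanical looseness": (46, 1),
}


def binary_counts(table):
    """Merge all fault types into label 1 and return {label: total count}."""
    counts = {0: 0, 1: 0}
    for size, label in table.values():
        counts[label] += size
    return counts


def fault_fraction(table):
    counts = binary_counts(table)
    return counts[1] / (counts[0] + counts[1])


print(binary_counts(SOURCE), round(fault_fraction(SOURCE), 3))
print(binary_counts(TARGET), round(fault_fraction(TARGET), 3))
```

After merging, about 88% of the source samples but only about 43% of the target samples carry the fault label, so the label proportions also differ between the two domains, in addition to the distribution differences described in Section 2.1.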
Although the laboratory bearing fault simulation data differ in many ways from the actual wind turbine gearbox data, they are still similar to a certain extent. The Case Western Reserve University bearing fault experiments mainly simulate bearing surface defects. In rotating machinery, surface defects are usually caused by fatigue spalling, which has the same mechanism as gear wearing and tooth fracture. In addition, when a bearing has an early surface defect, an impact is generated each time a bearing contact passes over the defect, exciting the corresponding characteristic frequency. Likewise, harmonic components appear in the vibration signal when faults such as gear wearing, tooth fracture, or mechanical looseness occur in the gearbox. It is therefore reasonable to expect that these faults also excite corresponding characteristic frequencies, even though the frequencies themselves are quite different. Based on these two reasons and on empirical analysis, we consider the source domain and target domain data to be similar to a certain extent.
3.1. Data pre-processing
Inspection of the source and target domain data shows that the two types of data differ greatly in: (1) sampling frequency; (2) data quality (SNR); (3) data size; (4) data dimension; and (5) data distribution. To facilitate modeling, the following data pre-processing steps are performed:
(1) Downsample the source-domain data. Because the two domains are sampled at different frequencies, the source-domain data are downsampled to 5120 Hz to match the sampling rate of the target-domain data.
(2) Randomly divide the data into training and testing sets. The source-domain and target-domain data are each split into a training set (75%) and a testing set (25%), giving four subsets: source-domain training data, source-domain testing data, target-domain training data and target-domain testing data. The source-domain and target-domain training data are used to train the model; the source-domain testing data are used to evaluate pre-training, and the target-domain testing data are used to evaluate the final diagnostic performance on the target domain.
(3) Segment each record with a fixed-length time window to obtain vibration fragments of the same dimension. Because the source-domain data set is large, it is divided into fragments of 2048 points each to facilitate analysis, giving 3077 vibration acceleration fragments in total. The original target-domain data are likewise divided into 2048-point fragments, giving 406 fragments in total.
(4) Apply the short-time Fourier transform (STFT) to the vibration acceleration fragments. Although the vibration acceleration signals of bearings and gearboxes are time-varying, their frequency components change little over time, so the STFT gives good results. Moreover, the resulting time-frequency spectrum is a two-dimensional input suited to advanced neural network structures such as the two-dimensional convolutional neural network.
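Steps (1) to (3) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the 12 kHz source sampling rate is an assumption (the CWRU data are distributed at 12 kHz and 48 kHz, and the text does not say which was used), the random seed is arbitrary, and windows are taken as non-overlapping, which fits "every 2048 points" but is not stated explicitly. The resampling helper only computes the integer up/down factors; in practice they would be passed to an anti-aliasing polyphase resampler such as `scipy.signal.resample_poly(x, up, down)`.

```python
import random
from fractions import Fraction


def resample_factors(fs_in, fs_out):
    """Smallest integers (up, down) with fs_out == fs_in * up / down."""
    ratio = Fraction(int(fs_out), int(fs_in))
    return ratio.numerator, ratio.denominator


def train_test_split(items, test_frac=0.25, seed=0):
    """Shuffle a copy of `items` and return (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def segment(signal, width=2048):
    """Cut a 1-D signal into non-overlapping windows of `width` points, dropping any tail."""
    n_windows = len(signal) // width
    return [signal[i * width:(i + 1) * width] for i in range(n_windows)]


up, down = resample_factors(12000, 5120)  # assumed 12 kHz source rate -> (32, 75)
train, test = train_test_split(range(406))  # 305 training / 101 testing items
fragments = segment([0.0] * 8192)  # four 2048-point fragments
```

Whether the split is applied to whole records before windowing or to the fragments afterwards matters: splitting whole records keeps fragments cut from the same recording out of both sets, while shuffling fragments does not. The text does not specify which order was used.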
Figures 4 and 5 show the pre-processed vibration acceleration fragments of the source domain and the target domain, respectively, in both the time domain and the frequency domain. Figures 6 and 7 show the corresponding time-frequency spectra of the source domain and the target domain.
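Step (4) and the spectra in Figures 6 and 7 rest on the STFT. The sketch below computes a one-sided magnitude STFT with a naive DFT so that it needs only the standard library; `scipy.signal.stft` would normally be used instead. The window length (100 samples), hop (36 samples), periodic Hann window and absence of padding are hypothetical choices: for a 2048-point fragment they happen to give a 51 × 55 matrix, the input size listed in Table 3, but the paper does not report its STFT parameters.

```python
import cmath
import math


def stft_magnitude(x, nperseg=100, hop=36):
    """Magnitude STFT of a real signal; rows are frequency bins, columns are frames.

    Periodic Hann window, no padding, one-sided spectrum (bins 0..nperseg//2).
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / nperseg) for i in range(nperseg)]
    n_bins = nperseg // 2 + 1
    n_frames = (len(x) - nperseg) // hop + 1
    # Precompute the DFT twiddle factors exp(-2*pi*j*k*i/N) for each one-sided bin k.
    twiddle = [[cmath.exp(-2j * math.pi * k * i / nperseg) for i in range(nperseg)]
               for k in range(n_bins)]
    spec = [[0.0] * n_frames for _ in range(n_bins)]
    for t in range(n_frames):
        start = t * hop
        frame = [x[start + i] * window[i] for i in range(nperseg)]
        for k in range(n_bins):
            spec[k][t] = abs(sum(s * w for s, w in zip(frame, twiddle[k])))
    return spec


fs = 5120
x = [math.sin(2 * math.pi * 512 * n / fs) for n in range(2048)]  # 512 Hz test tone
spec = stft_magnitude(x)
# 51 frequency bins (spacing fs / 100 = 51.2 Hz) by 55 frames; the tone sits in bin 10.
```

A longer window sharpens frequency resolution at the cost of time resolution; because the text notes that the frequency content changes little over time, frequency resolution is arguably the more important of the two here.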
3.2. The conguration of neural network
Since the input of the feature extractor is the time-frequency spectrum of a vibration acceleration fragment, which is a two-dimensional matrix, a convolutional neural network is adopted for the feature extractor. The classifier and discriminator are relatively simple, so fully connected neural networks meet their requirements. The configuration of each network is shown in Table 3.
Table 3. The configuration of network structure.
Feature extractor (same in source domain and target domain) | Classifier | Discriminator
Input: 51 × 55 vibration spectrum | Input: 1 × 18 feature vector | Input: 1 × 18 feature vector
3 × 3 conv, 8 | FC, 18 | FC, 18
Batch Normalization, 8 | ReLU (activation function) | ReLU (activation function)
Max pooling/4 | FC, 2 | FC, 18
ReLU (activation function) | Softmax (activation function) | ReLU (activation function)
3 × 3 conv, 16 | | FC, 2
Batch Normalization, 16 | |
Max pooling/4 | |
ReLU (activation function) | |
FC, 18 | |
Output: 18-dimensional feature vector | Output: 2-dimensional vector | Output: 2-dimensional vector

According to Table 3, the feature extractor consists of seven layers. The first layer (convolution layer) has 8 feature maps (channels); each neuron has a 3 × 3 receptive field, and the neurons share 3 × 3 weight parameters. The second layer, batch normalization, has 8 channels. The third layer, a pooling layer, reduces the size by a factor of 4. The fourth layer (convolution layer) has 16 feature maps (channels); each neuron has a 3 × 3 receptive field, and the neurons share 3 × 3 weight parameters. The fifth layer, batch normalization, has 16 channels. The sixth layer, a pooling layer, reduces the size by a factor of 4. The seventh layer, the fully connected output layer, consists of 18 neurons whose outputs together form the feature vector.
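The sizes in Table 3 can be checked with the usual output-size formula for convolution and pooling layers. The sketch traces a 51 × 55 spectrum through the two convolution/pooling stages and returns the length of the flattened vector entering the final FC, 18 layer, i.e. the `in_features` a PyTorch `nn.Linear(in_features, 18)` would need. The padding of the 3 × 3 convolutions and the reading of "Max pooling/4" as a 4 × 4 window with stride 4 are assumptions; a 2 × 2 window (which reduces the area by 4) is an equally possible reading, so both are shown.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one axis (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_size(height=51, width=55, padding=1, pool=4, channels=(8, 16)):
    """Trace conv3x3 -> BN -> max pool -> ReLU twice and return the flattened length.

    `padding` (of the 3x3 convolutions) and `pool` (window = stride) are assumptions;
    batch normalization and ReLU do not change the spatial size.
    """
    h, w = height, width
    for _ in channels:
        h, w = out_size(h, 3, 1, padding), out_size(w, 3, 1, padding)  # 3x3 convolution
        h, w = out_size(h, pool, pool), out_size(w, pool, pool)  # max pooling
    return channels[-1] * h * w


for pad, pool in [(1, 4), (0, 4), (1, 2)]:
    print(f"padding={pad}, pool={pool}: FC,18 input size = {flattened_size(padding=pad, pool=pool)}")
```

Under the 4 × 4 reading with "same" padding the spatial map shrinks to 3 × 3 before flattening, so the 18-unit layer sees only 144 inputs; a 2 × 2 reading gives 2496. The difference is large enough that an implementation should state which one it uses.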
The classifier consists of two layers. The first (fully connected) layer has 18 neurons with the ReLU activation function; the second layer has 2 neurons with the Softmax activation function and outputs the final classification result.
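A forward pass through the classifier (FC, 18 → ReLU → FC, 2 → Softmax) can be sketched in plain Python as follows. The weights in the usage lines are toy values chosen by hand, not trained parameters, and mapping output index 0 to normal and 1 to fault follows the labels in Tables 1 and 2.

```python
import math


def dense(v, weights, bias):
    """Fully connected layer; `weights` holds one row per output neuron."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(z):
    m = max(z)  # subtract the largest logit for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]


def classify(feature, w1, b1, w2, b2):
    """FC(18) -> ReLU -> FC(2) -> Softmax; returns (probabilities, label), 0 = normal, 1 = fault."""
    probs = softmax(dense(relu(dense(feature, w1, b1)), w2, b2))
    return probs, probs.index(max(probs))


# Toy weights (hypothetical): identity first layer, second layer favouring "fault".
feature = [1.0] * 18
w1 = [[1.0 if i == j else 0.0 for j in range(18)] for i in range(18)]
b1 = [0.0] * 18
w2 = [[0.0] * 18, [0.1] * 18]
b2 = [0.0, 0.0]
probs, label = classify(feature, w1, b1, w2, b2)  # logits [0, 1.8] -> label 1 (fault)
```

When training in PyTorch, the FC, 2 logits would usually be fed to `nn.CrossEntropyLoss`, which applies log-softmax internally, with an explicit softmax used only at inference time.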
The discriminator consists of three layers. The first (fully connected) layer has 18 neurons with the ReLU activation function; the second layer has 18 neurons with the ReLU activation function; the third layer has 2 neurons and outputs the domain discrimination result.
3.3. Case analysis
3.3.1. Case design
To fully validate the applicability and advantages of the proposed model, four cases (C1 to C4) with different testing conditions are designed, as described below:
Figure 4. The images of source domain samples in time domain and frequency domain.
Figure 5. The images of target domain samples in time domain and frequency domain.
Figure 6. The time-frequency spectrum of source domain samples.
C1: combine the source-domain feature extractor and classifier, perform supervised learning on the source domain, and evaluate the diagnostic performance on the source domain.
C2: replace the target-domain feature extractor with the feature extractor trained on the source domain, and evaluate the diagnostic performance on the target domain.
C3: train the network with the deep adversarial transfer neural network fault diagnosis method proposed in this paper, and evaluate the diagnostic performance on the target domain.
C4: combine the target-domain feature extractor and classifier, perform supervised learning on the target domain, and evaluate the diagnostic performance on the target domain.
The four scenarios C1 to C4 form a progression from supervised learning on the source domain to supervised learning on the target domain. C1 is in essence the pre-training process; because the source domain has abundant labeled data, high diagnostic performance is easily achieved. C2 is a comparison case whose purpose is to reveal the difference between the source-domain and target-domain data distributions: if the two distributions are close, the diagnostic performance of C2 will be good; otherwise it will be poor. C3 applies the deep adversarial transfer neural network proposed in this paper, which transfers diagnostic information from the source domain to the target domain to achieve unsupervised learning on the target domain. C4 is supervised learning on the target domain; it serves as a comparison for C3, and in theory C4 should achieve the best result on the target domain.
3.3.2. Evaluation criterion
To evaluate model performance conveniently, a single-number evaluation metric is needed to reflect the diagnostic performance of the model. Precision and recall together cannot serve as a single-number metric because they give two values for each classifier, and using multiple numbers makes it harder to compare diagnostic performance.
Accuracy is a single-number evaluation metric commonly used to evaluate classifiers. However, accuracy is not suitable as the performance criterion for a fault diagnosis model, because fault samples are usually far fewer than normal samples. If accuracy were adopted, it would not accurately reflect the model's performance on the minority class (i.e., the fault class). In contrast, F1, the harmonic mean of precision and recall, reflects the overall performance of the diagnostic system on an imbalanced data set. Therefore, F1 is adopted as the single-number evaluation metric in this paper, calculated as follows:
$$P = \frac{TP}{TP + FP} \tag{11}$$
$$R = \frac{TP}{TP + FN} \tag{12}$$
$$F_1 = \frac{2PR}{P + R} \tag{13}$$
In these formulas, TP is the number of positive samples predicted as positive; FN is the number of positive samples predicted as negative; FP is the number of negative samples predicted as positive; TN is the number of negative samples predicted as negative; P is the precision and R is the recall.
Figure 7. The time-frequency spectrum of target domain samples.
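Equations (11) to (13) can be sketched as follows, together with the support-weighted averaging that appears to produce the Average/Total rows of Table 4 (the convention of scikit-learn's `classification_report`; the paper does not state its averaging rule). The usage lines rebuild scenario C3 from counts inferred from Table 4 rather than reported directly: all 56 normal samples classified correctly and 35 of the 45 fault samples detected (recall 35/45 ≈ 0.78, precision 1.00). With these counts the functions reproduce the C3 rows to two decimals. The last lines illustrate why accuracy is avoided: on 95 normal and 5 fault samples, a model that always predicts normal reaches 95% accuracy but a fault-class F1 of 0.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 (Eqs. 11-13); each is 0.0 when its denominator is 0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def report(y_true, y_pred, labels=(0, 1)):
    """Per-class (P, R, F1, support) and the support-weighted average (P, R, F1)."""
    pairs = list(zip(y_true, y_pred))
    rows = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        rows[c] = prf(tp, fp, fn) + (tp + fn,)
    total = sum(row[3] for row in rows.values())
    avg = tuple(sum(row[i] * row[3] for row in rows.values()) / total for i in range(3))
    return rows, avg


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Scenario C3 rebuilt from inferred counts (0 = normal, 1 = fault).
c3_true = [0] * 56 + [1] * 45
c3_pred = [0] * 56 + [1] * 35 + [0] * 10
c3_rows, c3_avg = report(c3_true, c3_pred)

# Imbalanced toy case: always predicting "normal".
toy_true = [0] * 95 + [1] * 5
toy_pred = [0] * 100
toy_rows, _ = report(toy_true, toy_pred)
toy_acc = accuracy(toy_true, toy_pred)  # 0.95, while toy_rows[1][2] (fault F1) is 0.0
```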
3.3.3. Result analysis
The cases were implemented on Ubuntu 16.04 with Python 3.5 and PyTorch 0.3. The hardware comprised two Intel Xeon E5-2680 v4 server processors @ 2.4 GHz, 128 GB of memory and two NVIDIA GTX 1080 graphics processors.
The results of the four cases, including the precision, recall and F1 score of each state in each scenario, are given in Table 4. The result of scenario C1 shows that when supervised training is performed with labeled bearing data, normal and fault states are classified perfectly, as expected.
The result of scenario C2 shows that applying the source-domain feature extractor directly to the target domain severely degrades diagnostic performance. In this case the classifier labels every target sample as fault (the normal class has zero precision and recall), which indicates, on the one hand, that the distributions of the source-domain and target-domain data differ considerably and, on the other hand, that the transfer performance of conventional supervised learning is unsatisfactory. More advanced techniques are therefore needed to address the distribution difference between the source and target domains.
The result of scenario C3 shows that, under unsupervised learning, the deep adversarial transfer neural network model achieves an average F1 score of 90%, with a precision of 100% and a recall of 78% for the fault class. In other words, no normal test sample is falsely alarmed as a fault, while 22% of fault samples are missed, an acceptable missed-alarm rate. This result demonstrates the strong feature extraction and transfer learning ability of the method. Compared with the requirements of the other cases, the proposed method needs labeled data only from a similar domain and still achieves satisfactory performance, which greatly reduces the time and cost of developing a fault diagnosis system and demonstrates its feasibility to some extent.
In scenario C4, the F1 value also reaches 100%, which indicates that sufficient labeled fault samples can significantly improve the performance of the diagnosis system. However, labeling real gearbox fault data is time-consuming and costly, so the labeled fault data collected in real operation are insufficient to drive supervised learning of a deep neural network. For this practical reason, scenario C4 cannot be applied to real gearbox fault diagnosis problems. The comparison of C3 with C4 shows that the proposed method offers a new idea and direction for the development of data-driven fault diagnosis systems.

Table 4. Case result.
Case | Actual state | Precision rate | Recall rate | Harmonic mean F1 | Sample size
C1 | Normal | 1.00 | 1.00 | 1.00 | 114
C1 | Fault | 1.00 | 1.00 | 1.00 | 656
C1 | Average/Total | 1.00 | 1.00 | 1.00 | 770
C2 | Normal | 0.00 | 0.00 | 0.00 | 56
C2 | Fault | 0.45 | 1.00 | 0.62 | 45
C2 | Average/Total | 0.20 | 0.45 | 0.27 | 101
C3 | Normal | 0.85 | 1.00 | 0.92 | 56
C3 | Fault | 1.00 | 0.78 | 0.88 | 45
C3 | Average/Total | 0.92 | 0.90 | 0.90 | 101
C4 | Normal | 1.00 | 1.00 | 1.00 | 56
C4 | Fault | 1.00 | 1.00 | 1.00 | 45
C4 | Average/Total | 1.00 | 1.00 | 1.00 | 101

Figure 8. The visualized distribution of dimensionality-reduced features in the source and target domains: (a) conventional supervised learning method (C1 and C2); (b) deep adversarial transfer neural network (C3).
3.3.4. Feature visualization
To further illustrate the advantage of the proposed method in addressing data distribution differences, the t-SNE dimensionality reduction method (van der Maaten and Hinton 2008) is used to visualize the distribution of the feature extractor's output features. Red and blue represent the positive and negative samples (i.e., fault and normal) of the source domain, and purple and green represent the positive and negative samples of the target domain.
As Figure 8a shows, in scenario C1 the source-domain features are clearly separated. In scenario C2, however, the target-domain features cannot be completely separated: some purple and green points remain mixed in the upper left corner, which indicates a large difference in the marginal feature distributions. Figure 8a also shows that the distance between the positive samples of the source and target domains, and likewise between their negative samples, is very large and their overlap is low, which reflects a large difference between the conditional probability distributions of the two domains in feature space.
The feature distribution in Figure 8b shows that in scenario C3 the alignment between target-domain and source-domain data in feature space is clearly improved. The distance between positive and negative samples increases greatly, while the distance between samples of the same class becomes much smaller, which indicates that the deep adversarial transfer neural network matches not only the marginal distributions of the source and target domains but also their conditional probability distributions. Comparing the left and right panels shows that the deep adversarial transfer neural network considerably alleviates the problem of large data distribution differences in cross-domain transfer.
3.4. Results and discussion
The analysis of the four cases shows that the proposed deep adversarial transfer neural network diagnostic model has two advantages for wind turbine gearbox fault diagnosis: good performance under unsupervised learning, and a reduction of the large data distribution difference in cross-domain transfer. Note that in the case analysis the category space is divided only into normal and fault, although the proposed method is also suitable for multi-class fault diagnosis. In addition, the case analysis is mainly carried out in an unsupervised scenario, but the method is also applicable to semi-supervised scenarios.
Besides, by exploiting the structural similarity of wind turbines of the same type and of some operating parameters, this method can transfer the diagnostic experience of one wind turbine to the condition diagnosis of similar wind turbines, enabling knowledge to be shared and reused.
Apart from these advantages, because the method relies on adversarial training, it inevitably inherits the difficulty of training adversarial models. Although the proposed model uses the better-performing Wasserstein distance to measure the difference between domains, the hyper-parameters still need to be tuned during training, and training takes a long time. As the theory and training techniques of GANs mature, these problems are expected to be solved in the near future.
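The Wasserstein distance mentioned above can be illustrated in one dimension, where it has a closed form: for two samples of equal size, the optimal transport plan pairs the sorted values, so the first Wasserstein distance is the mean absolute difference between the sorted samples. This is only an intuition aid. In the proposed model the distance between source and target features is measured in the 18-dimensional feature space, presumably through the adversarially trained discriminator as in the Wasserstein-based methods cited (Arjovsky, Chintala, and Bottou 2017; Shen et al. 2018), not with this formula. For unequal sample sizes or weights, `scipy.stats.wasserstein_distance` computes the same 1-D quantity.

```python
def wasserstein_1d(a, b):
    """First Wasserstein distance between two equal-size 1-D empirical samples.

    With equal sample sizes the optimal transport plan pairs the sorted values,
    so W1 is the mean absolute difference between the sorted samples.
    """
    if not a or len(a) != len(b):
        raise ValueError("expects two non-empty samples of equal size")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


source = [0.1, 0.4, 0.35, 0.8]
target = [x + 0.5 for x in source]  # same shape, shifted by 0.5
shift_distance = wasserstein_1d(source, target)  # equals the shift, up to rounding
```

The distance keeps growing smoothly with the shift even after the two samples stop overlapping, which is the usual argument for why a Wasserstein-based domain critic still provides useful training signal where overlap-based divergences saturate.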
4. Conclusion
In this paper, we propose a deep adversarial transfer neural network and apply it to fault diagnosis of wind turbine gearboxes. The conclusions are as follows:
(1) Owing to the powerful feature representation ability of deep learning models, this method can uncover overlooked mechanisms, patterns and knowledge of wind turbine performance degradation from the monitoring data and the auxiliary data. Better fault representations can also be extracted automatically from the vibration data, which avoids the limitations of manual feature engineering and makes fault feature extraction more generic.
(2) Inspired by transfer learning, this approach uses auxiliary data from the laboratory, transfers the features learned from the auxiliary data to the actual monitoring data, and establishes a fault diagnosis model under unsupervised conditions, which to some extent reduces the dependence of the deep learning model on labeled monitoring data from wind turbines.
(3) To some extent, the method solves the problem of large data distribution differences in cross-domain transfer, realizes the transfer of a fault diagnosis model between similar domains, offers a way to transfer diagnostic experience between similar domains, and provides a new direction for establishing data-driven fault diagnosis models.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Funding
The work was supported by the National Key Research and Development Program of China [No. 2019YFE0104800].
References
Aafif, Y., A. Chelbi, L. Mifdal, S. Dellagi, and I. Majdouline. 2022. Optimal
preventive maintenance strategies for a wind turbine gearbox. Energy
Reports 8:803–14. doi:10.1016/j.egyr.2022.07.084.
Arjovsky, M., S. Chintala, and L. Bottou. 2017. Wasserstein GAN. arXiv.
doi:10.48550/arXiv.1701.07875.
Chen, C., F. Shen, and R. Q. Yan. 2017. Enhanced least squares support vector machine-based transfer learning strategy for bearing fault diagnosis. Chinese Journal of Scientific Instrument 38 (01):33–40. doi:10.19650/j.cnki.cjsi.2017.01.005.
Dabrowski, D., and A. Natarajan. 2017. Identification of loading condi-
tions resulting in roller slippage in gearbox bearings of large wind
turbines. Wind Energy 20 (8):1365–87. doi:10.1002/we.2098.
Dhiman, H. S., D. Deb, S. M. Muyeen, and I. Kamwa. 2021. Wind turbine
gearbox anomaly detection based on adaptive threshold and twin
support vector machines. IEEE Transactions on Energy Conversion
36 (4):3462–69. doi:10.1109/TEC.2021.3075897.
Feng, Y., Y. Qiu, C. J. Crabtree, H. Long, and P. J. Tavner. 2013.
Monitoring wind turbine gearboxes. Wind Energy 16 (5):728–40.
doi:10.1002/we.1521.
Ganin, Y., E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. 2016. Domain-adversarial training of neural networks. arXiv. http://arxiv.org/abs/1505.07818.
Gulrajani, I., F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. 2017. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems 30 (NIPS 2017), ed. I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett. Vol. 30. La Jolla: Neural Information Processing Systems (NIPS).
Isham, M. F., M. S. Leong, M. H. Lim, and Z. A. Bin Ahmad. 2019.
Intelligent wind turbine gearbox diagnosis using VMDEA and ELM.
Wind Energy 22 (6):813–33. doi:10.1002/we.2323.
Jamil, F., T. Verstraeten, A. Nowé, C. Peeters, and J. Helsen. 2022. A deep
boosted transfer learning method for wind turbine gearbox fault
detection. Renewable Energy 197:331–41. doi:10.1016/j.renene.2022.
07.117.
Jiang, G., H. He, P. Xie, and Y. Tang. 2017. Stacked multilevel-denoising
autoencoders: A new representation learning approach for wind tur-
bine gearbox fault diagnosis. IEEE Transactions on Instrumentation
and Measurement 66 (9):2391–402. doi:10.1109/TIM.2017.2698738.
Li, Z. 2016. Research on methods of intelligent fault diagnosis for wind turbine drive train based on unsupervised learning. Doctoral dissertation, North China Electric Power University. https://kns.cnki.net/kcms/detail/detail.aspx?dbcode=CDFD&dbname=CDFDLAST2017&filename=1016271653.nh&uniplatform=NZKPT&v=0MMvOoFCNWmGd2Z5aFTxyAPVdUOABbPYBc9SeiSCX_4ZLWD703ok5wDkpxYmUqYX.
Liu, R., B. Yang, E. Zio, and X. Chen. 2015. Artificial intelligence for fault
diagnosis of rotating machinery: a review. Mechanical Systems and Signal
Processing 108 (August):33–47. doi:10.1016/j.ymssp.2018.02.016.
Long, M., Y. Cao, J. Wang, and M. I. Jordan. 2015. Learning transferable features with deep adaptation networks. In International Conference on Machine Learning, ed. F. Bach and D. Blei, Vol. 37, 97–105. San Diego: JMLR.
Long, M., H. Zhu, J. Wang, and M. Jordan. 2017. Deep transfer learning with joint adaptation networks. In International Conference on Machine Learning, ed. D. Precup and Y. W. Teh, Vol. 70. San Diego: JMLR.
Maheswari, R. U., and R. Umamaheswari. 2017. Trends in non-stationary
signal processing techniques applied to vibration analysis of wind turbine
drive train - a contemporary survey. Mechanical Systems and Signal
Processing 85 (February 15):296–311. doi:10.1016/j.ymssp.2016.07.046.
Nejad, A. R., P. F. Odgaard, and T. Moan. 2018. Conceptual Study of
a gearbox fault detection method applied on a 5-MW spar-type floating
wind turbine. Wind Energy 21 (11):1064–75. doi:10.1002/we.2213.
Pan, S. J., and Q. Yang. 2010. A survey on transfer learning. IEEE
Transactions on Knowledge and Data Engineering 22 (10):1345–59.
doi:10.1109/TKDE.2009.191.
Pang, Y., L. Jia, X. Zhang, Z. Liu, and D. Li. 2020. Design and implemen-
tation of automatic fault diagnosis system for wind turbine. Computers
& Electrical Engineering 87 (October 1):106754. doi:10.1016/j.comp
eleceng.2020.106754.
Pérez-Pérez, E. -J., F. -R. López-Estrada, V. Puig, G. Valencia-Palomo,
and I. Santos-Ruiz. 2022. Fault diagnosis in wind turbines based on
ANFIS and takagi–sugeno interval observers. Expert Systems with
Applications 206 (November 15):117698. doi:10.1016/j.eswa.2022.
117698.
Rahimilarki, R., Z. Gao, N. Jin, and A. Zhang. 2022. Convolutional neural
network fault classification based on time-series analysis for bench-
mark wind turbine machine. Renewable Energy
185 (February 1):916–31. doi:10.1016/j.renene.2021.12.056.
Shen, F., C. Chen, and R. Q. Yan. 2017. Application of SVD and transfer learning strategy on motor fault diagnosis. Journal of Vibration Engineering 30 (01):118–26. (in Chinese).
Shen, J., Y. Qu, W. Zhang, and Y. Yu. 2018. Wasserstein distance guided representation learning for domain adaptation. In Thirty-Second AAAI Conference on Artificial Intelligence / Thirtieth Innovative Applications of Artificial Intelligence Conference / Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, 4058–65. Palo Alto: Association for the Advancement of Artificial Intelligence.
Sobie, C., C. Freitas, and M. Nicolai. 2018. Simulation-driven machine learning: bearing fault classification. Mechanical Systems and Signal Processing 99 (January 15):403–19. doi:10.1016/j.ymssp.2017.06.025.
Tang, X., Y. Xu, X. Sun, Y. Liu, Y. Jia, F. Gu, and A. D. Ball. 2022. Intelligent
fault diagnosis of helical gearboxes with compressive sensing based
non-contact measurements. ISA Transactions. (July 21). https://www.
sciencedirect.com/science/article/pii/S0019057822003779 .
Tzeng, E., J. Hoffman, K. Saenko, and T. Darrell. 2017. Adversarial discriminative domain adaptation. In 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2962–71. New York: IEEE. doi:10.1109/CVPR.2017.316.
Tzeng, E., J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. 2014. Deep
domain confusion: maximizing for domain invariance (version 1).
arXiv. doi:10.48550/arXiv.1412.3474.
Vamvoudakis-Stefanou, K. J., J. S. Sakellariou, and S. D. Fassois. 2018.
Vibration-based damage detection for a population of nominally iden-
tical structures: unsupervised multiple model (MM) statistical time
series type methods. Mechanical Systems and Signal Processing
111 (October 1):149–71. doi:10.1016/j.ymssp.2018.03.054.
van der Maaten, L., and G. Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9 (November):2579–605.
Yosinski, J., J. Clune, Y. Bengio, and H. Lipson. 2014. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems 27 (NIPS 2014), ed. Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger. Vol. 27. La Jolla: Neural Information Processing Systems (NIPS).
Zgraggen, J., M. Ulmer, E. Jarlskog, G. Pizza, and L. Goren Huber. 2021.
Transfer learning approaches for wind turbine fault detection using
deep learning. PHM Society European Conference 6 (1):12. doi:10.
36001/phme.2021.v6i1.2835.
Zhu, X., R. Wang, Z. Fan, D. Xia, Z. Liu, and Z. Li. 2022. Gearbox fault
identification based on lightweight multivariate multidirectional
induction network. Measurement 193 (April 1):110977. doi:10.1016/j.
measurement.2022.110977.
Zhuang, F. Z., P. Luo, Q. He, and Z. Z. Shi. 2015. Survey on transfer
learning research. Journal of Software 26 (01):26–39. doi:10.13328/j.
cnki.jos.004631.
... For example, Lu et al., (2023aLu et al., ( , 2023b introduced MK-MMD technique in shared features channels for common feature extraction. The second is relying on adversarial learning, which introduces a domain discriminant and extracts common features through adversarial training (Ma et al., 2023a(Ma et al., , 2023bShe et al., 2023;Wu et al., 2023). For example, Wang et al., (2023aWang et al., ( , 2023b) developed a multiple source DA module with anchoring adaptor to obtain generalized domain independent diagnostic expertise. ...
... For example, Wang et al., (2023aWang et al., ( , 2023b) developed a multiple source DA module with anchoring adaptor to obtain generalized domain independent diagnostic expertise. The third one is the reconstructionbased method, which maps the source domain features to the target domain and reconstructs them in the target domain (Chen et al., , 2023b(Chen et al., , 2023cGuo et al., 2022;Ma et al., 2023aMa et al., , 2023b. For example, Cao et al., (2020) presented a deep model for guiding multi-task learning, which consisted of a convictional classifier network and a reconstruction network. ...
Article
Full-text available
Transfer learning methods have received abundant attention and extensively utilized in cross-domain fault diagnosis, which suppose that the label sets in the source and target domains are coincident. However, the open set domain adaptation problem which include new fault modes in the target domain is not well solved. To address the problem, an unknown-class recognition adversarial network (UCRAN) is proposed for the cross-domain fault diagnosis. Specifically, a three-dimensional discriminator is designed to conduct domain-invariant learning on the source domain, target known domain and target unknown domain. Then, an entropy minimization is introduced to determine the decision boundaries. Finally, a posteriori inference method is developed to calculate the open set recognition weight, which are used to adaptively weigh the importance between known class and unknown class. The effectiveness and practicability of the proposed UCRAN is validated by a series of experiments. The experimental results show that compared to other existing methods, the proposed UCRAN realizes better diagnosis performance in different domain transfer task.
... In terms of wind turbine condition monitoring (WTCM), based on different signal types such as vibration signal (Han et al. 2022;Pichika et al. 2022;Zemali et al. 2023), oil signal (López de Calle et al. 2019), sound signal (Z. Ma et al. 2023), thermal imaging (Dollinger et al. 2018), laser signal (Dilek et al. 2019), ultrasonic signal (Oliveira et al. 2020), image (Wu et al. 2019), and pressure signal (Dimassi et al. 2021), many condition monitoring methods of wind turbine key components were proposed by using signal analysis and image recognition technologies. However, the above-mentioned condition monitoring methods require additional hardware equipment (i.e., acquisition, transmission, and storage) to be installed on wind turbines, which is costly and difficult to apply on a large scale. ...
... In terms of wind turbine fault diagnosis (WTFD), machine learning algorithms such as k-nearest neighbor (KNN) (Ng and Tiong Lim 2022), support vector machine (SVM) (Dhiman, Dipankar Deb, and Kamwa 2021), random forest (RF) , isolated forest (IF) (Meyer 2022), deep belief network (DBN) (H. Wang et al. 2020), convolutional neural network (CNN) (Meyer 2022;Yang et al. 2022),and transfer learning (TF) Ma et al. 2023) have got a widespread application. Yang et al. (Yang et al. 2022)constructed a bearing anomalies detection model with high noise immunity by combining a two-dimensional convolutional neural network (2DCNN) with the random forest algorithm. ...
Article
Full-text available
To reduce the significant economic losses caused by the fault deterioration of wind turbine generators, it is urgent to detect and diagnose the early faults of generators. The existing condition monitoring and fault diagnosis (CMFD) methods have disadvantages of less considering data temporal characteristic, acquiring early faults with difficulty, and having lower diagnostic accuracy. To address those limitations, a novel LSDAE-stacking CMFD method of generators was proposed. Specifically, a multivariate spatio-temporal condition monitoring model (LSDAE) was established by combining the LSTM and SDAE networks, which can detect generator early anomalies through real-time monitoring the reconstruction residual. Then, based on the stacking ensemble algorithm, a multi-classification fault diagnosis model (Stacking) was constructed to identify early fault types, which can integrate advantages of different base-classifiers to achieve a better diagnostic accuracy. Case studies on three actual generator failures were employed to validate the effectiveness and accuracy of the proposed LSDAE-stacking method. The results illustrated that, compared with conventional SDAE model, the proposed LSDAE model had higher reconstruction precision and superior early-fault-warning capacities. And compared with traditional algorithms such as SVM, RF, AdaBoost, GBDT and XGBoost, the constructed Stacking model can effectively identify the fault types of generators and had higher diagnostic accuracy.
Article
Full-text available
Wind park operators start to recognize the cost-effectiveness of intelligent maintenance solutions for wind turbines based on the readily available 10-minute SCADA data. In particular, recent advances have shown that deep learning algorithms can enhance the performance and robustness of fault detection algorithms which are fed with such SCADA data. In order to deploy deep learning fault detection algorithms, a large amount of historical data is needed. In case the data is not available for a certain turbine, training the algorithms becomes challenging. The common approaches in this case are referred to as transfer learning or domain adaptation methods, which attempt to allow the transfer of knowledge between different machines. In this paper we explore the main challenges of domain adaptation for fault detection based on wind turbine SCADA data. We focus on practical use cases, stemming from the commercial need to deploy fault detection algorithms for newly installed turbines, or turbines with little historical data under diverse operating conditions. We analyze different reasons for domain shifts between turbines, which require the development of new domain adaptation approaches beyond the ones familiar for other PHM applications, and present results for several of these challenging cases.
Article
This paper investigates two maintenance strategies for wind turbine gearboxes. The first, frequently adopted in practice, consists of monitoring the state of the gearbox through its temperature. As soon as the temperature reaches a predefined threshold, the production rate is drastically reduced by slowing down the wind turbine while the gearbox is cooled for a certain period, after which the desired output rate is restored. As these threshold exceedances become more frequent over time, the wind turbine operators decide to renew the gearbox: it is either replaced by an identical new one or overhauled, based solely on the judgement of the maintenance staff. For this first strategy, an analytical model is developed to optimize the renewal period of the gearbox, balancing the cost of production loss and cooling incurred each time the threshold temperature is reached against the cost of renewal. The second strategy, newly proposed in this paper, performs an imperfect preventive maintenance (PM) action each time the temperature threshold is reached, thereby reducing the failure rate of the gearbox to a value between the current one and that of a new gearbox. The imperfect PM action is performed N times before the gearbox must be renewed. A mathematical model is also developed to simultaneously find the optimal number of PM actions to perform before renewing the gearbox and the optimal delay, after the temperature threshold is exceeded, at which the maintenance crew should start the PM or renewal action. This delay may be longer or shorter depending on the logistics in place for moving the maintenance crew to the site and preparing the intervention. Numerical examples are presented, a sensitivity analysis is performed, and the two strategies are compared. Optimal solutions are obtained for each strategy. The comparison also shows that either strategy can be more economical, depending on the reliability of the gearbox and the different costs incurred, particularly the PM and renewal logistics costs.
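The renewal-period trade-off in the first strategy can be illustrated with a simplified cost-rate model. This is an assumed stand-in, not the paper's analytical model: it supposes that threshold exceedances follow a power-law non-homogeneous Poisson process with expected count (T/eta)^beta over [0, T], and folds production loss and cooling into one hypothetical cost per event, c_cool. The long-run cost rate is then C(T) = (c_cool·(T/eta)^beta + c_renew)/T. Setting dC/dT = 0 gives T* = eta·(c_renew / (c_cool·(beta − 1)))^(1/beta) for beta > 1.

```python
def cost_rate(T, c_cool, c_renew, eta, beta):
    """Long-run cost per unit time when the gearbox is renewed every T time units.

    Assumption: the expected number of cooling events in [0, T] is (T / eta) ** beta.
    """
    expected_events = (T / eta) ** beta
    return (c_cool * expected_events + c_renew) / T

def optimal_period(c_cool, c_renew, eta, beta):
    """Closed-form minimiser of cost_rate; a finite optimum needs beta > 1 (worsening gearbox)."""
    if beta <= 1:
        raise ValueError("no finite optimum unless beta > 1")
    return eta * (c_renew / (c_cool * (beta - 1))) ** (1 / beta)
```

For example, with c_cool = 1, c_renew = 10, eta = 100 and beta = 2, the optimum is T* = 100·√10 ≈ 316.2, and the cost rate there is lower than at 0.9·T* or 1.1·T*. At the optimum, the expected cooling cost equals c_renew/(beta − 1).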
Article
Deep learning methods have become popular among researchers in the field of fault detection. However, their performance depends on the availability of big datasets. To overcome this problem, researchers have started applying transfer learning to achieve good performance from small datasets by leveraging multiple prediction models over similar machines and working conditions. However, negative transfer limits their application, and negative transfer among prediction models increases when the environment and working conditions change continuously. To overcome the effect of negative transfer, we propose a novel deep transfer learning method, coined deep boosted transfer learning, for wind turbine gearbox fault detection; it prevents negative transfer and focuses only on relevant information from the source machine. The proposed method is an instance-based deep transfer learning method that updates the weights of the source and target machine training samples separately. The weights of different source training samples are gradually decreased to reduce their impact on the final model. The proposed method is verified on the Case Western Reserve University bearing dataset and real field wind farm datasets. The results show that the proposed method avoids negative transfer and achieves higher accuracy than standard deep learning and deep transfer learning methods.
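Separate source and target instance reweighting can be sketched with a TrAdaBoost-style update. This is an assumed illustration of the general idea and is not the paper's exact rule. Misclassified source samples are multiplied by a factor below one, because they may cause negative transfer. Misclassified target samples are up-weighted, as in AdaBoost.

```python
import math

def update_weights(w_src, err_src, w_tgt, err_tgt, n_rounds, eps_tgt):
    """One TrAdaBoost-style reweighting step (an assumption, not the paper's method).

    err_src / err_tgt: per-sample 0/1 misclassification indicators from the current learner.
    eps_tgt: weighted error of that learner on the target samples, assumed in (0, 0.5).
    """
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))
    beta_tgt = eps_tgt / (1.0 - eps_tgt)
    # Down-weight misclassified source samples: they look unlike the target machine.
    new_src = [w * beta_src ** e for w, e in zip(w_src, err_src)]
    # Up-weight misclassified target samples so the next learner focuses on them.
    new_tgt = [w * beta_tgt ** (-e) for w, e in zip(w_tgt, err_tgt)]
    total = sum(new_src) + sum(new_tgt)
    return [w / total for w in new_src], [w / total for w in new_tgt]
```

After repeated rounds, source samples that keep disagreeing with the target task lose weight steadily, which mirrors the "gradually decreased" source weights described above.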
Article
Data-driven condition monitoring reduces wind turbine downtime and increases reliability. Operation and maintenance (O&M) cost is a significant factor that calls for automated fault detection systems in wind turbines. In this manuscript, anomaly detection for the wind turbine gearbox is formulated using an adaptive threshold and a twin support vector machine (TWSVM). SCADA data from wind farms located in the UK are considered, with samples from twelve months before failure and from one month before failure. Gearbox oil and bearing temperatures are used as two univariate time series for the adaptive threshold analysis. The effectiveness of the proposed method is compared with standard classifiers such as support vector machines (SVM), k-nearest neighbors (KNN), multi-layer perceptron neural network (MLPNN), and decision tree (DT). Anomaly detection of the wind turbine gearbox using TWSVM and the adaptive threshold is accurate, thus increasing reliability. The missed-failure and false-positive rates are also investigated to assess the methodology's ability to discriminate false alarms, and a comparison with previous studies shows superior performance.
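An adaptive threshold on a univariate temperature series can be sketched with rolling statistics. The specific rule here (rolling mean + k·std over the preceding window) is an assumption, since the abstract does not define the threshold. The TWSVM classifier is not shown.

```python
import statistics

def adaptive_threshold_alarms(series, window=6, k=3.0):
    """Flag samples above the rolling mean + k * rolling std of the preceding window.

    Only past samples are used, so the limit follows slow changes in operating
    regime. The window length and k are illustrative, hypothetical defaults.
    """
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        limit = statistics.fmean(past) + k * statistics.pstdev(past)
        if series[i] > limit:
            flagged.append(i)
    return flagged
```

For gearbox oil temperatures `[60.0, 60.5, 61.0, 60.5, 60.0, 60.5, 61.0, 60.5, 75.0, 61.0]`, only the 75.0 °C spike (index 8) is flagged. The spike then inflates the limit for the next few samples, which is one reason a fixed window is a design trade-off.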
Article
Domain adaptation aims at generalizing a high-performance learner on a target domain via utilizing the knowledge distilled from a source domain which has a different but related data distribution. One solution to domain adaptation is to learn domain invariant feature representations while the learned representations should also be discriminative in prediction. To learn such representations, domain adaptation frameworks usually include a domain invariant representation learning approach to measure and reduce the domain discrepancy, as well as a discriminator for classification. Inspired by Wasserstein GAN, in this paper we propose a novel approach to learn domain invariant feature representations, namely Wasserstein Distance Guided Representation Learning (WDGRL). WDGRL utilizes a neural network, denoted by the domain critic, to estimate empirical Wasserstein distance between the source and target samples and optimizes the feature extractor network to minimize the estimated Wasserstein distance in an adversarial manner. The theoretical advantages of Wasserstein distance for domain adaptation lie in its gradient property and promising generalization bound. Empirical studies on common sentiment and image classification adaptation datasets demonstrate that our proposed WDGRL outperforms the state-of-the-art domain invariant representation learning approaches.
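The quantity WDGRL's domain critic approximates can be made concrete in one dimension. For high-dimensional features, WDGRL estimates the Wasserstein-1 distance with a neural critic. For two 1-D empirical samples of equal size, however, the optimal coupling simply matches sorted values, so W1 is the mean absolute difference between order statistics. The sketch below computes only this closed form and can serve as a sanity check on a single feature. It is not the WDGRL training procedure.

```python
def wasserstein1_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    With equal sample sizes the optimal transport plan pairs the i-th smallest
    value of xs with the i-th smallest value of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this shortcut assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
```

Shifting every source feature by a constant c gives W1 = c (for example, `[0, 1, 2]` vs `[1, 2, 3]` gives 1.0). This kind of domain gap is what the feature extractor is trained to shrink.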
Article
Helical gearboxes play a critical role in power transmission in industrial applications. They are vulnerable to various faults due to long-term, heavy-duty operation. To improve the safety and reliability of helical gearboxes, it is necessary to monitor their health and diagnose various fault types. Conventional measurements for gearbox fault diagnosis include lubricant analysis, vibration, airborne acoustics, thermal images, and electrical signals. However, a single-domain measurement may lead to unreliable fault diagnosis, and contact-mounted transducers cannot always be installed, especially in harsh and dangerous environments. In this article, a Compressive Sensing (CS)-based Dual-Channel Convolutional Neural Network (CNN) method was proposed to accurately and intelligently diagnose common gearbox faults based on two complementary non-contact measurements (thermal images and acoustic signals) acquired with a mobile phone. The raw acoustic signals were analysed by the Modulation Signal Bispectrum (MSB) to highlight the coupled modulation components related to gear faults and to suppress irrelevant components and random noise, generating a series of two-dimensional matrices as sparse MSB magnitude images. Then, owing to the high sparsity of the thermal images and acoustic MSB images, CS was used to reduce image redundancy while retaining key information, which significantly accelerates CNN training. The experimental results demonstrate that the proposed CS-based Dual-Channel CNN method significantly improves the diagnostic accuracy for industrial helical gearbox faults (99.39% on average) compared with single-channel methods.
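The compression step can be sketched as a random linear projection y = Φx of a flattened sparse image x (length n) to m < n measurements. The Gaussian N(0, 1/m) sensing matrix is a common compressive-sensing choice and is assumed here. The abstract does not specify the authors' matrix, and sparse reconstruction (for example, L1 minimization) is not shown.

```python
import random

def gaussian_measurement_matrix(m, n, seed=0):
    """Hypothetical m x n sensing matrix with i.i.d. N(0, 1/m) entries."""
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5  # standard deviation of each entry
    return [[rng.gauss(0.0, scale) for _ in range(n)] for _ in range(m)]

def compress(phi, x):
    """Return the m compressed measurements y = phi @ x of a flattened image x."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]
```

For a 64-pixel image with a single nonzero pixel, `compress` returns 16 numbers that are exactly the corresponding column of Φ. This illustrates why sparse inputs survive heavy compression: each measurement mixes only the few nonzero entries.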
Article
Wind power is becoming one of the most critical renewable energy sources. As wind power grows, better monitoring and diagnostic strategies are needed to maximize energy production and increase its security. In this paper, a fault diagnosis approach based on a data-driven technique, which represents the system behavior with a Takagi–Sugeno (TS) model, is developed. An adaptive neuro-fuzzy inference system (ANFIS) method is used to obtain a set of polytopic linear representations and a set of membership functions that interpolate the linear models of the convex TS model. Then, based on the TS model, a fault diagnosis strategy in which convex state observers generate residuals is used to detect and isolate sensor faults. Unlike other methods, this proposal needs to be trained only with fault-free data. The proposed methodology is tested under different fault scenarios on a well-known wind turbine benchmark built upon fatigue, aerodynamics, structures, and turbulence (FAST). The results demonstrate the method's effectiveness in detecting and isolating different sensor faults.
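The convex TS interpolation can be sketched for a scalar state. Normalized membership weights blend local linear models a_i·x + b_i·u into one prediction, and a residual against the measurement flags a sensor fault. This sketch makes several assumptions. The Gaussian memberships, centers, widths and local models are hypothetical placeholders for what ANFIS would identify. The residual is also an open-loop prediction error, a simplification of the paper's convex state observers.

```python
import math

def ts_output(x, u, centers, widths, local_models):
    """Takagi-Sugeno prediction: sum_i mu_i(x) * (a_i * x + b_i * u).

    Membership weights are normalised to be non-negative and sum to one, so the
    output is a convex combination of the local linear models (a_i, b_i).
    """
    raw = [math.exp(-((x - c) / s) ** 2) for c, s in zip(centers, widths)]
    total = sum(raw)
    mu = [r / total for r in raw]
    return sum(m * (a * x + b * u) for m, (a, b) in zip(mu, local_models))

def residual_alarm(measured, predicted, threshold):
    """Flag a sensor fault when |measured - predicted| exceeds a fault-free threshold."""
    return abs(measured - predicted) > threshold
```

Because the weights sum to one, identical local models reproduce that single linear model exactly. Near one membership center, the prediction is dominated by that center's local model.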
Article
Fault diagnosis of the wind turbine gearbox is of great significance for improving the safety of unit operation and reducing downtime. To resolve the trade-off between diagnostic accuracy and diagnostic model complexity in noisy environments, this paper proposes a lightweight multivariate and multi-directional induction network (LM-MDINet). The method designs dense separable blocks (DS-Blocks) to enhance deep feature extraction, and it reduces the number of parameters by decoupling the spatial and channel mappings. In addition, a multivariate and multi-directional induction (M-MDI) layer is added to guide the network toward expressing effective fault information, enhancing its ability to learn effective information. The experimental results show that, compared with other methods, the proposed method achieves outstanding overall performance in noisy environments.
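The parameter saving from decoupling spatial and channel mappings can be shown with a short calculation. The internal structure of the DS-Blocks is not described in the abstract, so this sketch only covers a generic depthwise separable convolution. It is a depthwise filter per input channel followed by a 1×1 pointwise channel mix. Here `kernel_size` is the number of spatial taps: k for 1-D or k·k for 2-D.

```python
def standard_conv_params(kernel_size, c_in, c_out, bias=False):
    """Weights in a standard convolution: every output channel sees every input channel."""
    return kernel_size * c_in * c_out + (c_out if bias else 0)

def separable_conv_params(kernel_size, c_in, c_out, bias=False):
    """Depthwise (one spatial filter per input channel) + pointwise 1x1 channel mixing."""
    depthwise = kernel_size * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise
```

With kernel_size = 3, c_in = 64 and c_out = 128, a standard layer has 3·64·128 = 24,576 weights. The separable version has 3·64 + 64·128 = 8,384 weights, a ratio of 1/c_out + 1/kernel_size ≈ 0.34.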
Article
Fault detection and classification are considered among the most essential techniques in modern industrial monitoring. Fault monitoring is necessary because early detection can prevent high-cost maintenance. Given the complexity of wind turbines and the considerable amount of data available from SCADA systems, machine learning methods, and deep learning approaches in particular, appear to be powerful means of solving the fault detection problem in wind turbines. In this article, a novel deep learning fault detection and classification method based on a time-series analysis technique and convolutional neural networks (CNN) is presented to handle several classes of faults in wind turbines. To validate this approach, challenging scenarios are investigated, consisting of performance reductions of less than 5% (which are hard to identify) in the two actuators or four sensors of the wind turbine, together with sensor noise, and appropriate CNN structures are suggested. Finally, these algorithms are evaluated in simulation using data from a 4.8 MW wind turbine benchmark, and their accuracy confirms the convincing performance of the proposed methods. The proposed algorithms are applicable to both onshore and offshore wind turbines.
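Preparing multivariate sensor and actuator time series as CNN input usually starts with sliding windows. The abstract does not name its time-series technique, so this windowing scheme is an assumption. So is the rule that gives each window the label of its last time step; a majority vote is a common alternative.

```python
def make_windows(series, labels, window, stride):
    """Cut a multivariate time series into fixed-length windows for a 1-D CNN.

    series: list of time steps, each a list of channel readings.
    labels: per-time-step class labels; each window takes the label of its last step.
    Returns (windows, window_labels), with each window laid out as channels x time.
    """
    windows, window_labels = [], []
    for start in range(0, len(series) - window + 1, stride):
        segment = series[start:start + window]
        # Transpose time x channels -> channels x time, the usual Conv1d input layout.
        windows.append([list(ch) for ch in zip(*segment)])
        window_labels.append(labels[start + window - 1])
    return windows, window_labels
```

For six time steps `[[t, 10*t] for t in range(6)]` with labels `[0, 0, 0, 1, 1, 1]`, a window of 3 and a stride of 2 give two windows, `[[0, 1, 2], [0, 10, 20]]` (label 0) and `[[2, 3, 4], [20, 30, 40]]` (label 1).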
Article
Operating wind turbines in a fault state directly affects the power output efficiency of wind farms. This paper proposes a new automatic fault diagnosis method for wind turbines. A fault diagnosis system framework is constructed, and vibration data collected from wind turbines are processed and used for fault diagnosis. First, wavelet coefficients are obtained by applying a discrete wavelet transform (DWT) to the vibration acceleration signals collected from the wind turbines. Then, the wavelet coefficients are successively subjected to phase space reconstruction (PSR) and singular value decomposition (SVD) to extract fault features. Finally, an extreme learning machine (ELM) is used to classify the faults. Experimental results show that the proposed method is more effective and accurate than other fault diagnosis methods for wind turbines, such as support vector machine (SVM) and multiscale convolutional neural network (MSCNN).
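The first two feature-extraction stages can be sketched in plain Python. The Haar wavelet and the single decomposition level are assumptions, because the abstract does not name the wavelet. Phase space reconstruction is shown as a standard delay embedding, and its embedding dimension and delay are illustrative parameters.

```python
def haar_dwt(signal):
    """One level of the orthonormal Haar DWT, returning (approximation, detail)."""
    s = 2 ** -0.5
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):  # an odd trailing sample is dropped
        approx.append(s * (a + b))
        detail.append(s * (a - b))
    return approx, detail

def phase_space(coeffs, dim, delay):
    """Delay-embedding matrix: row j = [c_j, c_(j+delay), ..., c_(j+(dim-1)*delay)]."""
    n_rows = len(coeffs) - (dim - 1) * delay
    if n_rows <= 0:
        raise ValueError("series too short for this embedding")
    return [[coeffs[j + k * delay] for k in range(dim)] for j in range(n_rows)]
```

The singular values of the resulting trajectory matrix would then serve as fault features, for example via `numpy.linalg.svd(numpy.array(m), compute_uv=False)`. The orthonormal Haar transform preserves signal energy. For `[1, 1, 2, 0]` the squared coefficients sum to 6, the same as the input.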