Please cite this article as: T. Zhang, J. Chen, F. Li et al., Intelligent fault diagnosis of machines with small & imbalanced data: A state-of-the-art review and possible extensions.
ISA Transactions (2021), https://doi.org/10.1016/j.isatra.2021.02.042.
ISA Transactions xxx (xxxx) xxx
Contents lists available at ScienceDirect
ISA Transactions
journal homepage: www.elsevier.com/locate/isatrans
Research article
Intelligent fault diagnosis of machines with small & imbalanced data:
A state-of-the-art review and possible extensions
Tianci Zhang a, Jinglong Chen a,*, Fudong Li a, Kaiyu Zhang a, Haixin Lv a, Shuilong He b,*, Enyong Xu c,d
a State Key Laboratory for Manufacturing and Systems Engineering, Xi’an Jiaotong University, Xi’an 710049, PR China
b School of Mechanical and Electrical Engineering, Guilin University of Electronic Technology, Guilin 541004, China
c School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, 430074, China
d Dongfeng Liuzhou Motor Co., Ltd., Liuzhou 545005, China
article info
Article history:
Received 22 October 2020
Received in revised form 24 February 2021
Accepted 24 February 2021
Available online xxxx
Keywords:
Intelligent fault diagnosis
Small & imbalanced data
Data augmentation
Feature learning
Classifier design
Meta-learning
Zero-shot learning
abstract
The research on intelligent fault diagnosis has yielded remarkable achievements based on artificial intelligence-related technologies. In engineering scenarios, machines usually work in a normal condition, which means limited fault data can be collected. Intelligent fault diagnosis with small & imbalanced data (S&I-IFD), which refers to building intelligent diagnosis models from limited machine faulty samples to achieve accurate fault identification, has been attracting the attention of researchers. The research on S&I-IFD has achieved fruitful results, but a review of the latest achievements is still lacking, and the future research directions are not clear enough. To address this, we review the research results on S&I-IFD and provide some future perspectives in this paper. The existing research results are divided into three categories: data augmentation-based, feature learning-based, and classifier design-based. The data augmentation-based strategy improves the performance of diagnosis models by augmenting training data. The feature learning-based strategy identifies faults accurately by extracting features from small & imbalanced data. The classifier design-based strategy achieves high diagnosis accuracy by constructing classifiers suitable for small & imbalanced data. Finally, this paper points out the research challenges faced by S&I-IFD and provides some directions that may bring breakthroughs, including meta-learning and zero-shot learning.
©2021 ISA. Published by Elsevier Ltd. All rights reserved.
1. Introduction
Fault diagnosis is an essential link in machine health management, as it builds a bridge between machine monitoring data and machine health conditions. Intelligent fault diagnosis applies artificial intelligence technologies to the fault diagnosis process to make it intelligent and automatic [1]. Recently, deep neural networks such as the deep auto-encoder (DAE) [2,3], the deep convolutional neural network (DCNN) [4,5], and other deep networks [6,7] have been widely used to build end-to-end intelligent diagnosis models, which reduces the dependence on manual labor and expert knowledge and greatly promotes the development of intelligent fault diagnosis [8].
* Corresponding authors.
E-mail addresses: jlstrive2008@mail.xjtu.edu.cn (J. Chen), xiaofeilonghe@guet.edu.cn (S. He).
Intelligent fault diagnosis with small & imbalanced data (S&I-IFD) refers to building intelligent diagnosis models from a few machine faulty samples to achieve accurate fault identification. Generally speaking, intelligent diagnosis models with deep networks are built on the analysis of sufficient machine monitoring data [8]. The more training data there is and the more fault types the training set covers, the higher the diagnosis accuracy of intelligent diagnosis models. However, in engineering scenarios, it is difficult to build an ideal dataset for training intelligent diagnosis models, for the following three reasons.
(1) In engineering scenarios, machines usually work in a normal condition and faults are rare. Therefore, although a condition monitoring system composed of multiple sensors can collect data from machines constantly, the majority of the collected data is healthy data, and the volume of fault data is small. Thus, it is hard to obtain sufficient fault data directly from engineering scenarios to support the training of intelligent diagnosis models.
(2) It is expensive to carry out fault simulation experiments to collect machine fault data in the laboratory. For example, to obtain fault data of gears in the laboratory, researchers need to purchase gear specimens and introduce faults artificially by wire-electrode cutting or other means. Moreover, it is necessary to build a fault simulation test bench to collect data. Such an experiment is not only expensive but also labor-intensive. Besides, some common faults, such as gear tooth surface bonding, are difficult to reproduce by artificial fault manufacturing. Thus, it is difficult to collect fault data by conducting fault simulation experiments in the laboratory.
(3) The fault data obtained by computer simulation is often not realistic enough. Some fault simulation software can simulate equipment faults and output fault data. For example, Gasturb is performance calculation software for aero-engines [9], and researchers use it to simulate aero-engine faults and obtain fault data. However, although Gasturb can perform precise mathematical operations, it cannot simulate the complex working environment of aero-engines, and different working environments and working conditions have a significant impact on fault data. Therefore, fault data obtained by simulation usually does not represent field data well enough.
In short, intelligent fault diagnosis in engineering scenarios
is a typical small & imbalanced data problem. In this case, if
the intelligent diagnosis model is trained with limited fault data
directly, it is prone to poor generalization performance and low
fault identification accuracy. Therefore, the lack of fault samples
makes it difficult to build an effective intelligent diagnosis model
and achieve accurate fault identification in engineering scenarios.
How to solve the S&I-IFD problem has long been a research interest of scholars. For example, some researchers use the Synthetic Minority Over-sampling Technique [10] to expand the number of faulty samples, or develop fault classifiers with Support Vector Machines [11], so that diagnosis models can reach relatively high identification accuracy when fault data samples are insufficient. Recently, the research on S&I-IFD has yielded fruitful achievements with new machine learning algorithms. For instance, researchers use generative adversarial networks (GAN) to emulate the data distributions of machine faulty samples, so that more faulty samples can be generated to expand the limited fault dataset [12]. Besides, transfer learning-related diagnosis models reuse previously learned diagnosis knowledge in a new diagnosis task, so that accurate fault identification can also be achieved with a few faulty samples [13].
At present, there are many research achievements on S&I-IFD. However, the research directions for future development are not clear enough, and a review of existing results is still lacking. Although some reviews of intelligent fault diagnosis have been published, they mainly address the application of a particular theory, such as deep learning, to specific objects, such as induction motors [14,15], rather than the problem of lacking fault data samples. Small & imbalanced data learning is undoubtedly a common problem in many real-world areas, such as medicine and finance [16]. For example, the detection of invalid transactions and financial fraud in the trading systems of banks is also a typical small & imbalanced data problem. Therefore, many reviews on imbalanced data classification have also been published [16–19]. However, these existing reviews pay little attention to new machine learning theories and algorithms like GAN and transfer learning, which have been widely applied to S&I-IFD in recent years. Moreover, these existing reviews mainly summarize research methods and do not treat mechanical equipment as a special research object. From the perspective of data analysis, the analysis of machine monitoring data often involves techniques such as frequency domain analysis, which differs from other kinds of data analysis such as image data analysis. Besides, as far as the authors know, similar review papers for S&I-IFD are neither under consideration nor already published in another venue. Therefore, it is necessary to present a review of S&I-IFD that summarizes the existing achievements and gives some future directions for further exploration.
This paper provides a review of S&I-IFD. Its contributions include two aspects. First, this paper focuses on the small & imbalanced data problem in intelligent machine fault diagnosis, which is a significant research point for which a dedicated review is still lacking. Taking mechanical equipment as the research object, this paper reviews the related work on S&I-IFD in the past 10 years and focuses on the latest research results represented by GAN and transfer learning. Different from other reviews on small & imbalanced data learning [16–19], this paper divides the achievements of S&I-IFD into three categories according to the general process of machine fault diagnosis (MFD): the data augmentation-based strategy, the feature learning-based strategy, and the classifier design-based strategy, as shown in Fig. 1. In particular, MFD contains three main stages: data preprocessing, feature extraction, and condition classification [1]. For S&I-IFD, solutions can be found at each of the three stages, as shown in Fig. 1. From the perspective of data preprocessing, scholars augment the limited fault data through data generation or data over-sampling, and the augmented data can be used directly to train intelligent diagnosis models. In terms of feature extraction, fault features can be learned directly from limited fault data, without data augmentation, by designing regularized neural networks or by feature adaptation. In terms of condition classification, the health conditions of machines can be classified directly by designing fault classifiers suitable for small & imbalanced data, without data augmentation or the design of feature extraction models. Compared with the other reviews on small & imbalanced data learning [16–19], the presented review is more specific to this field because its categorization follows the fault diagnosis process. As a result, this paper may be more enlightening for researchers in this field.
Second, based on the existing research results and the latest machine learning theories, this paper provides some research challenges and directions for further development. Specifically, in the aspect of data augmentation, current research mainly focuses on expanding the number of fault samples, while how to measure and enhance the quality of the samples needs more attention. How to prevent negative transfer in transfer learning-based diagnosis models is a key to their application in engineering scenarios. Besides, as a new machine learning theory, meta-learning [20] has initially shown its advantages in dealing with small sample problems. Thus, the applications of meta-learning theory to S&I-IFD may increase greatly. Finally, zero-shot learning [21] may bring a breakthrough for S&I-IFD in the extreme case where no fault samples are available at all.
For the rest of this review, Section 2 describes both the research methodology and the initial data analysis. Sections 3, 4, and 5 review the research achievements from the perspective of data augmentation, feature learning, and classifier design, respectively. Section 6 gives some possible extensions for S&I-IFD in the future. Section 7 presents a conclusion for this review.
2. Research methodology and initial analysis
2.1. Research methodology
This paper mainly searched and collected the publications on S&I-IFD published from 2010 to November 2020. Four library databases covering the natural science research field were selected for the literature search: ScienceDirect, IEEE Xplore, Springer, and ACM. Besides, Scopus and the Web of Science were also used to search for papers from some individual publishers [22].
Fig. 1. The process of machine fault diagnosis and the three strategies for S&I-IFD.
Fig. 2. The two-level keywords tree.
Inspired by [16], a two-level keywords tree was constructed to collect published papers on S&I-IFD as comprehensively as possible, as given in Fig. 2. Since this paper reviews intelligent fault diagnosis in the small & imbalanced data case, the search keyword of the first level was restricted to intelligent fault diagnosis. For S&I-IFD, some scholars regard it as a problem of imbalanced data classification [23,24], because the volume of health data is larger than that of fault data. On the other hand, some scholars regard it as a problem of small sample classification [12,25,26], in which the volume of health data is set to be the same as that of the fault data to avoid the imbalance of the data distribution. Therefore, the search keywords of the second level were divided into two parts, i.e., small sample learning and imbalanced data learning, as shown in Fig. 2. A total of 249 English journal papers were collected in the initial search. After further review, 145 papers were found to be related to the theme of this paper, and they are the main data source of this review. Besides, in the citations of these papers, we found 9 related conference papers and included them in the references for this review.
In the literature search, some related literature may have been missed because the keywords were inaccurate or incomplete. For example, some scholars refer to imbalanced data as "skewed data" [16]. However, we did not include "skewed data" as a search keyword, which is the main limitation of, and threat to the validity of, the literature search.
2.2. Initial analysis
Fig. 3 shows the number of S&I-IFD-related publications in 2010–2020. It can be seen that there were few English journal papers about S&I-IFD from 2010 to 2015, while the number of published papers has increased rapidly since 2016, which is mainly due to the emergence and application of new machine learning models like GAN [12]. The trends in Fig. 3 show that S&I-IFD is a valuable research problem and may continue to be a research hotspot in the next few years.
Fig. 3. The publishing trends of S&I-IFD.
After a careful review, the collected papers are classified into the data augmentation-based strategy, the feature learning-based strategy, and the classifier design-based strategy, as shown in Fig. 1. Because it follows the general process of machine fault diagnosis, this categorization is more specific to the field than those in the existing related reviews [16–19]. Specifically, in the aspect of data augmentation, data generation and data over-sampling models can effectively expand the fault dataset [25,27,28], and data reweighting methods based on transfer learning can also augment the limited fault data with the help of other related datasets [13,29]. The research achievements indicate that augmented data improve the diagnosis accuracy in S&I-IFD effectively. In the aspect of feature learning, fault features can be extracted directly from small & imbalanced data by designing regularized neural networks [23,30,31], and feature adaptation based on transfer learning is also useful for learning features from limited fault data to achieve accurate fault identification [32–34]. In the aspect of classifier design, accurate fault identification can be pursued by modifying SVM or designing a cost-sensitive fault classifier [35–40]. Besides, classifier design schemes based on parameter-transfer learning have also shown effectiveness in the case of limited fault samples [41–43].
Fig. 4. Structure of GAN.
3. Data augmentation-based strategy for S&I-IFD
3.1. Motivation
Data-driven intelligent fault diagnosis has been widely studied. Research results have demonstrated that data-driven intelligent fault diagnosis models can usually achieve good diagnosis performance [44]. However, in engineering scenarios, machine faulty samples are hard to collect, which is an important factor restricting the use of data-driven intelligent diagnosis models. As an efficient approach to enhancing the generalization performance of neural networks, data augmentation [17] presents a good solution for S&I-IFD. Starting from a few faulty samples, the limited fault dataset can be augmented through data generation [45,46], data over-sampling [47,48], or data reweighting [13,29], so that intelligent diagnosis models can be trained effectively. As a result, intelligent diagnosis models are expected to have strong diagnosis ability even when fault data samples are lacking.
3.2. Data generation using generative models
Recently, data generation models represented by the generative adversarial network (GAN) [49] and the Variational Auto-Encoder (VAE) [50] have been studied in depth and have shown promising results in many fields [51]. These generative models can also be used to generate mechanical signals, providing a powerful tool for data augmentation in S&I-IFD [52].
3.2.1. GAN-Based methods
3.2.1.1. Introduction to GAN. GAN has two multi-layer neural network modules named Generator and Discriminator, as depicted in Fig. 4. Generator samples random noise $z$ from a distribution $p_z$ and then generates data $x_g$, while Discriminator outputs a scalar probability to distinguish real data $x_r$ from generated data $x_g$. Let $G(\cdot)$ be the operation in Generator and $D(\cdot)$ the operation in Discriminator. The objective function of Generator, $L_G$, is
\[ L_G = \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]. \tag{1} \]
For Discriminator, the objective function $L_D$ is
\[ L_D = -\mathbb{E}_{x \sim p_r}\left[\log D(x)\right] - \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right], \tag{2} \]
where $p_r$ represents the real data distribution.
As a result, GAN has the overall objective function as follows:
\[ \min_G \max_D W(G, D) = \mathbb{E}_{x \sim p_r}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]. \tag{3} \]
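The training process of GAN alternates between two steps: Discriminator is updated to tell real data from generated data while Generator is held fixed, and Generator is then updated to fool Discriminator while Discriminator is held fixed.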
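A minimal sketch of the usual alternating training procedure (one Discriminator step followed by one Generator step) is given here under strong simplifying assumptions that do not come from the reviewed papers: the data are scalars, Generator is the affine map G(z) = a·z + b, Discriminator is the logistic unit D(x) = sigmoid(w·x + c), and the gradients are written out by hand. All function names, the learning rate, and the batch size are illustrative. The two loss helpers follow Eqs. (1) and (2).

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def discriminator_loss(d_real, d_fake, eps=1e-12):
    # Eq. (2): L_D = -E[log D(x)] - E[log(1 - D(G(z)))]
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    # Eq. (1): L_G = E[log(1 - D(G(z)))]
    return np.mean(np.log(1.0 - d_fake + eps))

def train_toy_gan(x_real, steps=200, batch=64, lr=0.05, seed=0):
    """Alternating gradient descent on L_D and L_G for a toy 1-D GAN."""
    rng = np.random.default_rng(seed)
    a, b = 1.0, 0.0   # Generator parameters (assumed affine model)
    w, c = 0.1, 0.0   # Discriminator parameters (assumed logistic model)
    history = []
    for _ in range(steps):
        # Discriminator step: Generator is held fixed.
        x_r = rng.choice(x_real, size=batch)
        x_g = a * rng.standard_normal(batch) + b
        d_r, d_g = sigmoid(w * x_r + c), sigmoid(w * x_g + c)
        loss_d = discriminator_loss(d_r, d_g)
        # Hand-derived gradients of L_D with respect to w and c.
        w -= lr * (np.mean((d_r - 1.0) * x_r) + np.mean(d_g * x_g))
        c -= lr * (np.mean(d_r - 1.0) + np.mean(d_g))

        # Generator step: Discriminator is held fixed.
        z = rng.standard_normal(batch)
        d_g = sigmoid(w * (a * z + b) + c)
        loss_g = generator_loss(d_g)
        # Hand-derived gradients of L_G with respect to a and b.
        a -= lr * np.mean(-d_g * w * z)
        b -= lr * np.mean(-d_g * w)
        history.append((loss_d, loss_g))
    return (a, b), (w, c), history
```

In practice both networks are deep models trained with automatic differentiation; the toy model only makes the order of the two updates explicit.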
Since the original GAN was introduced, scholars have made many improvements to it and created many variants. For
example, Deep Convolutional GAN (DCGAN) [53] uses deep con-
volutional neural networks to build Generator and Discriminator,
which makes it possible to generate high-quality images. Wasser-
stein GAN (WGAN) [54] applies Wasserstein distance to modify
the original loss function, which makes the training process more
stable than the original GAN. Wasserstein GAN with Gradient
Penalty (WGAN-GP) [55] applies gradient penalty to Discrimi-
nator to further stabilize the training process. Conditional GAN
(CGAN) [56] introduces the class information of the real data
into the training of GAN, which enables the model to generate
labeled data samples. Auxiliary Classifier GAN (ACGAN) [57] adds
a classifier to Discriminator to generate labeled data samples.
Semi-supervised GAN (SSGAN) [58] realizes semi-supervised data
classification by constructing pseudo labels for the unlabeled data
samples. Information maximizing GAN (infoGAN) [59] can learn
the disentangled feature representation by inputting latent code
into Generator so that the learned features are interpretable.
For simplicity, we use G and D to represent Generator and Discriminator. Q represents the classifier, $c$ is the class information of the input data, and $k$ is the number of classes. $\lambda$ and $\alpha$ are real numbers less than 1. $c'$ and $c''$ denote the input latent code and the reconstructed latent code. $L_I(\cdot)$ represents the calculation of mutual information. As shown in Fig. 5, we summarize several common variants of GAN, and their objective functions are given in Table 1.
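To make the WGAN and WGAN-GP objectives of Table 1 more concrete, the following sketch evaluates them with NumPy. It rests on assumptions that are not from the source: the critic is any Python callable mapping an (n, d) array to n scores, its input gradient is approximated by central differences instead of automatic differentiation, and the penalty is evaluated at the standard WGAN-GP interpolate αx + (1 − α)G(z). All function names are hypothetical.

```python
import numpy as np

def wgan_critic_loss(d_real, d_fake):
    # WGAN Discriminator (critic) objective: -E[D(x)] + E[D(G(z))]
    return -np.mean(d_real) + np.mean(d_fake)

def wgan_generator_loss(d_fake):
    # WGAN Generator objective: -E[D(G(z))]
    return -np.mean(d_fake)

def input_gradient(critic, x, h=1e-5):
    """Central-difference gradient of the critic score w.r.t. each input row."""
    grads = np.zeros_like(x, dtype=float)
    for j in range(x.shape[1]):
        step = np.zeros(x.shape[1])
        step[j] = h
        grads[:, j] = (critic(x + step) - critic(x - step)) / (2.0 * h)
    return grads

def gradient_penalty(critic, x_real, x_fake, rng):
    # Interpolate between real and generated samples (assumed standard form).
    alpha = rng.uniform(0.0, 1.0, size=(x_real.shape[0], 1))
    x_hat = alpha * x_real + (1.0 - alpha) * x_fake
    norms = np.linalg.norm(input_gradient(critic, x_hat), axis=1)
    return np.mean((norms - 1.0) ** 2)

def wgan_gp_critic_loss(critic, x_real, x_fake, rng, lam):
    # WGAN-GP critic objective: WGAN critic loss plus lam times the penalty.
    return (wgan_critic_loss(critic(x_real), critic(x_fake))
            + lam * gradient_penalty(critic, x_real, x_fake, rng))
```

The penalty vanishes when the critic's input gradient has unit norm everywhere, which is how WGAN-GP encourages the Lipschitz constraint without weight clipping.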
3.2.1.2. Applications of GAN to data generation. The applications of GAN to generate data for S&I-IFD are summarized in Table 2. The research achievements show that fault data augmented by GAN can effectively improve the fault identification performance for gears [65], bearings [66], rotors [52], and other components [67] in the case of limited fault data. According to the data dimension, these research results can be divided into two categories: one-dimensional (1-D) sample generation and two-dimensional (2-D) sample generation. The generation of 1-D data can be further classified into three types. The first is to generate raw signals directly [12,27,52,60–64,77]. GAN and its variants are applied to generate the monitoring signals of machines, and the generated signals can be used to train the intelligent diagnosis models directly. For example, Zhang et al. [12] used a deep gradient penalized GAN to generate bearing vibration data, which expands training datasets effectively. The work in [12] was one of the earliest studies using GAN for mechanical signal augmentation, and it also designed an index based on correlation coefficients to measure the quality of the generated samples. The second is to generate the frequency spectrum of the monitoring signals [65–73]. Compared with raw monitoring data, the frequency spectrum also contains abundant
fault information and is widely used in machine fault identification. For example, Wang et al. [65] adopted GAN to generate the frequency spectrums of gearbox signals. The generated frequency spectrums were used together with the real ones to train a Stacked Auto-encoder (SAE), which achieved high diagnostic accuracy and good anti-noise ability. The third is to generate the extracted data features [46,74]. The generated fault features can also be used to train the fault classifier directly. For instance, Zhou et al. [74] used an auto-encoder (AE) to extract fault features from monitoring data, and the extracted features were generated by a global optimization GAN. The generated and the real fault features were used for accurate fault identification by deep neural networks. Since the dimension of features is generally lower than that of raw data, generating data features is easier and faster than generating raw data. However, the fault information contained in the generated features may not be as rich as that in the raw data, which is one of the drawbacks of fault feature generation.
Fig. 5. Variants of GAN.
Table 1
The objective functions of the variants of GAN.
| Name | Discriminator objective | Generator objective |
|---|---|---|
| DCGAN | $L_D^{\mathrm{DCGAN}} = -\mathbb{E}_{x \sim p_r}[\log D(x)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$ | $L_G^{\mathrm{DCGAN}} = \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$ |
| WGAN | $L_D^{\mathrm{WGAN}} = -\mathbb{E}_{x \sim p_r}[D(x)] + \mathbb{E}_{z \sim p_z}[D(G(z))]$ | $L_G^{\mathrm{WGAN}} = -\mathbb{E}_{z \sim p_z}[D(G(z))]$ |
| WGAN_GP | $L_D^{\mathrm{WGAN\_GP}} = L_D^{\mathrm{WGAN}} + \lambda\, \mathbb{E}_{(x,z) \sim (p_r, p_z)}\big(\lVert \nabla D(\alpha x + (1 - \alpha) G(z)) \rVert - 1\big)^2$ | $L_G^{\mathrm{WGAN\_GP}} = L_G^{\mathrm{WGAN}}$ |
| CGAN | $L_D^{\mathrm{CGAN}} = -\mathbb{E}_{x \sim p_r}[\log D(x, c)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z), c))]$ | $L_G^{\mathrm{CGAN}} = \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z), c))]$ |
| ACGAN | $L_D^{\mathrm{ACGAN}} = L_D^{\mathrm{DCGAN}} - \mathbb{E}_{x \sim p_r}[P(\mathrm{class} = c \mid x)] - \mathbb{E}_{z \sim p_z}[P(\mathrm{class} = c \mid G(z))]$ | $L_G^{\mathrm{ACGAN}} = L_G^{\mathrm{DCGAN}} - \mathbb{E}_{z \sim p_z}[P(\mathrm{class} = c \mid G(z))]$ |
| SSGAN | $L_D^{\mathrm{SSGAN}} = L_D^{\mathrm{WGAN}} - \mathbb{E}_{x \sim p_r}[P(\mathrm{class} = c \mid x, c < k + 1)]$ | $L_G^{\mathrm{SSGAN}} = L_G^{\mathrm{WGAN}} + \lVert \mathbb{E}_{x \sim p_r} f(x) - \mathbb{E}_{z \sim p_z} f(G(z)) \rVert_2$ |
| infoGAN | $L_D^{\mathrm{infoGAN}} = L_D^{\mathrm{DCGAN}} - \lambda L_I(c', c'')$ | $L_G^{\mathrm{infoGAN}} = L_G^{\mathrm{DCGAN}} - \lambda L_I(c', c'')$ |
Table 2
Applications of GAN to generate data in S&I-IFD.
| Data dimension | Data types | Models | References |
|---|---|---|---|
| 1-D | Raw signal | GAN/WGAN/WGAN-GP | Zhang et al. [12], Liu et al. [27], Yin et al. [60], Gao et al. [61], Zhang et al. [62], Zhang et al. [63] |
| 1-D | Raw signal | ACGAN | Shao et al. [52] |
| 1-D | Raw signal | infoGAN | Wu et al. [64] |
| 1-D | Frequency spectrum | GAN/WGAN/WGAN-GP | Wang et al. [65], Zou et al. [66], Wang et al. [67], Ding et al. [68], Mao et al. [69] |
| 1-D | Frequency spectrum | CGAN | Wang et al. [70], Zheng et al. [71], Zheng et al. [72] |
| 1-D | Frequency spectrum | ACGAN | Li et al. [73] |
| 1-D | Extracted feature | GAN/WGAN/WGAN-GP | Pan et al. [46], Zhou et al. [74] |
| 2-D | Time–frequency spectrum | GAN/WGAN/WGAN-GP | Cabrera et al. [75] |
| 2-D | Time–frequency spectrum | CGAN | Liu et al. [26], Yu et al. [45] |
| 2-D | Time–frequency spectrum | SSGAN | Liang et al. [76] |
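The frequency-spectrum inputs discussed above, and the idea of scoring generated samples with a correlation coefficient, can be illustrated with a short NumPy sketch. This is an assumption-laden illustration rather than the index designed in [12]: the Pearson correlation between two spectra is used as a stand-in similarity score, and the function names and the test signal are hypothetical.

```python
import numpy as np

def amplitude_spectrum(signal, fs):
    """One-sided amplitude spectrum of a 1-D signal sampled at fs Hz."""
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    # The 2/n scaling recovers sinusoid amplitudes for bins other than
    # DC and Nyquist, which is enough for this illustration.
    spectrum = np.abs(np.fft.rfft(signal)) * 2.0 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, spectrum

def pearson_similarity(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    return float(np.corrcoef(a, b)[0, 1])

# Usage: a synthetic 50 Hz tone standing in for a vibration signal.
fs = 1000
t = np.arange(1000) / fs
real = np.sin(2 * np.pi * 50 * t)
noisy = real + 0.1 * np.random.default_rng(0).standard_normal(t.size)
freqs, real_spec = amplitude_spectrum(real, fs)
_, noisy_spec = amplitude_spectrum(noisy, fs)
score = pearson_similarity(real_spec, noisy_spec)
```

A score close to 1 indicates that the two spectra have nearly the same shape; a score of exactly 1 for every generated sample would instead suggest that the generator is merely copying its training data.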
On the other hand, GAN was originally used for 2-D image generation, so it is well suited to processing 2-D data. In the field of machine fault diagnosis, researchers usually use the wavelet transform (WT) and other methods [45,75,76] to obtain the time–frequency domain features of raw signals, which are 2-D data. GAN can generate time–frequency features of raw monitoring signals to serve the training of intelligent diagnosis models. Cabrera et al. [75] presented a deep diagnosis scheme based on GAN for imbalanced fault diagnosis, in which the 2-D time–frequency features are extracted using the wavelet packet transform and augmented by GAN. Liang et al. [76] used the continuous wavelet transform to extract time–frequency features of gearbox vibration data, and a GAN was then used to expand the number of 2-D time–frequency features for training the diagnosis model.
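As an illustration of how a 1-D vibration signal becomes a 2-D time–frequency array that an image-style GAN can consume, the sketch below computes a magnitude spectrogram. Note the substitution: it uses a short-time Fourier transform written with NumPy only, not the wavelet or wavelet packet transforms used in the cited works, and the window length, hop size, and function name are illustrative assumptions.

```python
import numpy as np

def stft_magnitude(signal, fs, win=128, hop=64):
    """Magnitude spectrogram with a Hann window; returns (freqs, times, mag)."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(win)
    n_frames = 1 + (len(signal) - win) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop: i * hop + win] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq bins, frames)
    freqs = np.fft.rfftfreq(win, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win / 2.0) / fs
    return freqs, times, mag

# Usage: a 128 Hz tone sampled at 1024 Hz yields a 65 x 15 array.
fs = 1024
t = np.arange(1024) / fs
freqs, times, mag = stft_magnitude(np.sin(2 * np.pi * 128 * t), fs)
```

A wavelet-based representation trades the fixed resolution of this transform for finer time resolution at high frequencies, which is one reason it is popular for impulsive fault signatures.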
As a popular data generation method, GAN has the ability
to generate faulty samples similar to the real faulty samples
collected from engineering scenarios, thus expanding the training
dataset of the intelligent diagnosis model. However, there are
still two problems when GAN is applied for fault data generation.
First, GAN is difficult to train. In order to generate sufficient fault data, GAN consumes a large amount of computing resources and needs a long training time. Second, although GAN can expand the volume of fault data, its data generation ability is limited when the training data is insufficient. Specifically, the original GAN needs massive data for training. The more training data there is, the closer the data distribution learned by GAN is to the real data distribution. However, when only a little training data is available, GAN easily falls into mode collapse [55]. In this case, the generated samples are close to copies of the real samples, which means that the fault information contained in the generated data is very limited. As a result, a diagnosis model trained on such low-quality generated samples cannot reach the fault identification accuracy required in engineering. Therefore, although many achievements have been made using GAN, there is still considerable room for research on how to reduce the computing time and improve the data generation ability when the training data is insufficient.
3.2.2. VAE-Based methods
3.2.2.1. Introduction to VAE. Variational Auto-Encoder (VAE) [50]
is another commonly used deep generative model, as shown in
Fig. 6. In terms of data generation, VAE can sample from hidden
variables and then generate more data.
Fig. 6. Structure of VAE.
The input of the encoder is data $x$, and its output is the hidden variable $z$, which is composed of $\mu$ and $\sigma$. The weights and biases of the encoder are $\theta$. In training, the posterior distribution $q_\theta(z|x)$ is learned by the encoder. The hidden variable $z$ is input into the decoder to reconstruct data, and the weights and biases of the decoder are $\vartheta$. The distribution $p_\vartheta(x|z)$ is learned by the decoder.
The objective function can be expressed as
\[ L_i(\theta, \vartheta) = -\mathbb{E}_{z \sim q_\theta(z|x_i)}\left[\log p_\vartheta(x_i|z)\right] + KL\left(q_\theta(z|x_i) \,\|\, p(z)\right), \tag{4} \]
where $p(z)$ is the prior distribution of the hidden variable and $KL(\cdot)$ denotes the Kullback–Leibler divergence. In VAE, $p(z)$ is the normal distribution $N(z; 0, 1)$ and $q_\theta(z|x_i)$ is the normal distribution $N(z; \mu_i, \sigma_i^2)$. Thus, the $KL(\cdot)$ between $q_\theta(z|x_i)$ and $p(z)$ can be described as
\[ KL\left(q_\theta(z|x_i) \,\|\, p(z)\right) = -\frac{1}{2} \sum_{j=1}^{J} \left( 1 + \log\left(\left(\sigma_i^j\right)^2\right) - \left(\mu_i^j\right)^2 - \left(\sigma_i^j\right)^2 \right), \tag{5} \]
where $J$ is the dimension of the hidden variable $z$.
In Eq. (5), $\mu_i$ and $\sigma_i$ can be computed by the encoder directly. The hidden variable $z$ is calculated by
\[ z_i = \mu_i + \sigma_i \varepsilon, \tag{6} \]
where $\varepsilon \sim N(0, 1)$ is a noise variable, as given in Fig. 6.
In VAE, the output data has a high similarity to the input
because the data reconstruction loss is optimized in the training
process. Meanwhile, due to the addition of the noise variable ε,
the generated data will not be completely consistent with the
input data, thus achieving data augmentation.
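The two quantities that distinguish VAE from a plain auto-encoder, the sampling step of Eq. (6) and the KL term of Eq. (5), can be sketched in a few lines of NumPy. The function names are hypothetical, and the encoder outputs are assumed to be given as arrays of means and standard deviations with the latent dimension on the last axis.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    """Eq. (6): z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps

def kl_to_standard_normal(mu, sigma):
    """Eq. (5): KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    var = np.square(sigma)
    return -0.5 * np.sum(1.0 + np.log(var) - np.square(mu) - var, axis=-1)

# Usage: a batch of 3 samples with a 4-dimensional hidden variable.
rng = np.random.default_rng(0)
mu = np.zeros((3, 4))
sigma = np.ones((3, 4))
z = reparameterize(mu, sigma, rng)     # shape (3, 4)
kl = kl_to_standard_normal(mu, sigma)  # zero when q already equals the prior
```

Drawing several values of ε for the same input and passing each resulting z through the decoder is what yields multiple, slightly different samples from a single training example.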
3.2.2.2. Applications of VAE to data generation. In intelligent fault
diagnosis, VAE has been utilized to generate fault data of gear-
boxes [70] and bearings [25,78]. For example, in [70], a diagnosis
scheme based on VAE and GAN was proposed for imbalanced
fault diagnosis, in which VAE was applied to generate the fre-
quency spectrums of gearbox in different working conditions.
Different from the traditional GAN, this scheme used VAE as the
data generator and further improved the data generation ability
of VAE through adversarial training. Dixit et al. [25] adopted a
Conditional Variational Auto-encoder (CVAE) to generate faulty
data of bearings, in which a centroid loss term was added to
the original loss function of VAE. Zhao et al. [78] proposed an
intelligent diagnosis model suitable for small and unbalanced
monitoring data, in which a VAE was used to generate the vibration signals of machines. The signals generated by VAE were highly similar to the real signals in the time–frequency domain, which allowed the proposed diagnosis method to obtain higher accuracy than related works.
Similar to GAN, VAE can also be used for fault data generation, and the research achievements above have demonstrated the effectiveness of VAE in S&I-IFD. Compared with GAN, the training process of VAE is more stable and does not suffer from mode collapse [79]. However, due to the difference in the loss function, the data generated by VAE is usually not as realistic as the data generated by GAN [80]. As a result, GAN is more widely applied to data augmentation than VAE [80]. Some scholars have tried to combine VAE and GAN to generate mechanical data [70]. How to make the data samples generated by VAE more realistic remains a problem to be solved.
3.3. Data over-sampling using sampling techniques
Although deep generative models like GAN and VAE can gener-
ate fault data to support the training of intelligent diagnosis mod-
els, these deep generative models are often difficult to train and
require a large number of computing resources [51]. Taking into
account this problem, data over-sampling using sampling tech-
niques is another important way to augment limited data [19].
Some sampling techniques like Synthetic Minority Over-sampling
Technique (SMOTE) [10], have yielded many achievements in
S&I-IFD.
3.3.1. SMOTE-based methods
3.3.1.1. Introduction to SMOTE. In general, researchers over-
sample the minority classes or under-sample the majority classes
to balance the dataset [19]. However, under-sampling discards some valuable information that might be useful for data classification. On the other hand, random over-sampling simply replicates training data, which may lead to overfitting of classifiers [18]. Based on random over-sampling, an improved method named SMOTE was proposed [81]. By analyzing the samples in the minority classes, SMOTE synthesizes new samples instead of copying existing ones. As given in Fig. 7, the process of SMOTE is described as follows:
(1) The Euclidean distance between the sample $x$ and all the samples in the same class is calculated to obtain the $k$-nearest neighbors.
(2) For each sample $x$, $n$ samples $\{x_i\}_{i=1}^{n}$ are randomly chosen within the range of the $k$-nearest neighbors.
(3) For each sample $x_i$, the new synthesized sample $x_{new}$ can be obtained as follows:
$$x_{new} = x + \mathrm{rand}(0, 1) \times (x_i - x). \tag{7}$$
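Steps (1) to (3) can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function name `smote`, its parameters, and the toy data are hypothetical, and the sketch assumes the minority class holds more than `k` samples.

```python
import numpy as np

def smote(X_min, k=3, n_new_per_sample=2, seed=0):
    """Synthesize minority-class samples by interpolating towards nearest neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    synthetic = []
    for idx, x in enumerate(X_min):
        # Step (1): Euclidean distances to the other samples of the same class.
        dist = np.linalg.norm(X_min - x, axis=1)
        dist[idx] = np.inf                      # exclude the sample itself
        neighbours = np.argsort(dist)[:k]
        # Step (2): randomly choose n of the k nearest neighbours.
        chosen = rng.choice(neighbours, size=n_new_per_sample, replace=True)
        # Step (3): Eq. (7), a random point on the segment between x and x_i.
        for j in chosen:
            synthetic.append(x + rng.random() * (X_min[j] - x))
    return np.array(synthetic)

X_fault = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy minority class
X_new = smote(X_fault, k=2, n_new_per_sample=3)
```

Every synthetic sample lies on a line segment between two real minority samples, so the new data never leaves the region already spanned by that class.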
3.3.1.2. Applications of SMOTE to data over-sampling. Some schol-
ars [28,47,48,82] have introduced SMOTE and its modified vari-
ants to over-sample the machine faulty samples. For example,
Martin-Diaz et al. [10] used SMOTE to synthesize the fault sam-
ples of induction motors (IMs), in which the stator current sig-
nals in the minority classes were synthesized to balance the
dataset. The results indicated that the balanced data constructed
by SMOTE could help to improve the diagnosis performance effec-
tively. An effective imbalanced data learning scheme named Easy-
SMT was presented in [82]. Easy-SMT used SMOTE to augment
the minority fault classes of wind turbines and Easy-Ensemble
algorithm to transfer the imbalanced fault classification problem
Fig. 7. New samples synthesized using SMOTE.
to a balanced one, making it possible to achieve good diagno-
sis performance. In [47], Wu et al. proposed an expectation–
maximization minority over-sampling method based on SMOTE,
in which a local-weighted strategy was applied to the expec-
tation–maximization algorithm to learn and identify the hard-to-
learn informative fault samples.
Compared with deep generative models, SMOTE requires
fewer computing resources, so it is able to synthesize a large
quantity of fault data samples to meet the demand of intelligent
diagnosis models. However, SMOTE has the problem of data dis-
tributional marginalization when it is applied to synthesize data
in the minority class. Specifically, if a fault sample is at the edge
of the fault data distribution, the samples synthesized using this
fault sample will also be at the edge of the distribution, which will
blur the classification boundary [83]. Therefore, although SMOTE improves the balance of the training dataset, it may increase the difficulty of fault classification when it suffers from distributional marginalization.
3.4. Data reweighting using transfer learning
In addition to data generation and data over-sampling, data
augmentation can also be achieved by reweighting data samples
using transfer learning-based approaches with the help of other
related datasets [13,29,84].
3.4.1. Introduction to transfer learning
In the case of lacking fault data, it is difficult to train a new
intelligent diagnosis model [85]. However, this problem could be
solved if the existing diagnosis knowledge learned by the trained
diagnosis model could be reused. For example, we can use the
bearing fault data collected in the laboratory to train a diagnosis
model. The bearing fault diagnosis knowledge learned by this
diagnosis model may be helpful for bearing fault identification
in engineering scenarios. Transfer learning, which means that the
knowledge learned from one task is reused in another task, is a
promising tool for achieving this goal [86].
Generally speaking, transfer learning has three categories:
instance-based transferring, feature-based transferring, and
parameter-based transferring, depending on the components be-
ing transferred [86]. Among them, instance-based transferring
aims to select some data samples from the source domain to
improve the target task’s performance in the case of limited
target samples. Data reweighting is one of the most commonly
used strategies of instance-based transferring. The weights of the
selected target domain data samples will be increased while the
weights of the selected source domain ones will be decreased.
TrAdaBoost [87] is the most representative data reweighting
algorithm in transfer learning.
3.4.2. TrAdaBoost-based methods
The source domain samples and the target domain ones will
be reweighted by TrAdaBoost, so the contributions of the source
and the target domain samples to the diagnosis model training
can be balanced. In TrAdaBoost, if a target domain sample is
misclassified by the diagnosis model, the weight of this sample
will be increased because this sample is hard to be classified
correctly. On the other hand, if a source domain sample is mis-
classified by the diagnosis model, the weight of this sample will
be decreased because this sample is considered to be of little
help to the training of the diagnosis model. Consequently, the classification boundary moves in the direction that identifies the target data more accurately, as given in Fig. 8. As a result, the diagnosis model obtained with the TrAdaBoost algorithm is expected to achieve good classification accuracy on the target diagnosis task.
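The reweighting rule described in this paragraph can be sketched for a single boosting round. The shrink factors below follow the commonly cited formulation of TrAdaBoost; they are not given in this review and are therefore an assumption that should be checked against [87]. The function name and the toy inputs are hypothetical.

```python
import numpy as np

def tradaboost_update(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One weight update; miss_* are 0/1 arrays marking misclassified samples."""
    miss_src = np.asarray(miss_src, dtype=float)
    miss_tgt = np.asarray(miss_tgt, dtype=float)
    # Fixed factor (< 1) that shrinks misclassified source samples (assumed form).
    beta_src = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_src)) / n_rounds))
    # Weighted error of the current learner on the target domain only.
    eps = np.sum(w_tgt * miss_tgt) / np.sum(w_tgt)
    eps = np.clip(eps, 1e-10, 0.499)           # keep beta_tgt inside (0, 1)
    beta_tgt = eps / (1.0 - eps)
    new_src = w_src * beta_src ** miss_src     # misclassified source: weight decreases
    new_tgt = w_tgt * beta_tgt ** (-miss_tgt)  # misclassified target: weight increases
    return new_src, new_tgt

w_src, w_tgt = np.ones(4), np.ones(4)
new_src, new_tgt = tradaboost_update(
    w_src, w_tgt, miss_src=[1, 0, 0, 0], miss_tgt=[0, 1, 0, 0], n_rounds=10
)
```

Correctly classified samples keep their weights, so over many rounds the source samples that disagree with the target task fade out while hard target samples gain influence.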
In intelligent fault diagnosis, TrAdaBoost algorithm has been
used to handle the small sample condition. For example, Xiao
et al. [13] presented a transfer learning scheme for machine fault
diagnosis under the small sample condition, in which a TrAd-
aBoost algorithm was applied to assign weights to each training
sample. The weighted samples helped to train a convolutional
neural network-based learner. The proposed scheme obtained
the highest diagnosis accuracy compared with related works in
the case of inadequate target data. Shen et al. [29] applied the TrAdaBoost algorithm to update the weights of the selected auxiliary samples, and the experimental results showed that the presented work was effective in bearing fault identification using small target data samples.
As a data reweighting algorithm, TrAdaBoost only operates on the data and does not participate in feature extraction or condition identification. Therefore, it is easily combined with various advanced data classification models like deep belief networks and convolutional neural networks. However, the performance of data reweighting depends on the similarity of the source and the target domain data distributions. If there is a large deviation between them, the TrAdaBoost-based data reweighting strategy may lead to negative transfer in the target diagnosis task [8], which means the reweighted fault samples may lead to poor diagnosis performance.
3.5. Epilog
This section reviews the research results using data aug-
mentation-based strategy in S&I-IFD. The data augmentation-
based strategy in S&I-IFD has three categories: data generation
using generative model, data over-sampling using sampling tech-
nique, and data reweighting using transfer learning. The first
two methods can expand the volume of fault data effectively.
However, they have the following two problems to be solved.
First, deep generative models like GAN and VAE are often difficult
to train and require many computing resources, which means
they are not friendly to practical application. Moreover, when
only a few samples are available for training, the generated faulty
samples’ quality is too low to meet the requirement of intelligent
fault diagnosis models because these deep generative models
usually need massive data to learn an authentic data distribution.
Second, the sampling techniques represented by SMOTE suffer from distributional marginalization, which may even increase the difficulty of accurate fault classification. Based on transfer learning, data reweighting can also augment limited fault data samples by increasing the selected data samples' weights with the help of other related datasets. However, data reweighting relies on the similarity of the source and the target domain data distributions, and the performance of the diagnosis model is prone to decline when this similarity is low. Therefore, it is necessary to find new data augmentation methods with high efficiency to further improve the diagnosis performance on S&I-IFD.
Fig. 8. Illustration of TrAdaBoost: (a) the diagnosis model training with the source and the target domain samples directly, and (b) the diagnosis model training
based on TrAdaBoost algorithm.
Fig. 9. Structure of AE.
4. Feature learning-based strategy for S&I-IFD
4.1. Motivation
In intelligent fault diagnosis, fault feature learning from machine monitoring data is the core link. The quality of the learned fault features affects the performance of machine fault diagnosis to a great extent. In addition to data augmentation, the S&I-IFD problem can also be solved if diagnosis models can learn effective fault features from small & imbalanced data. Scholars have done much work on how to learn fault features from small & imbalanced data. According to the existing results, the research ideas are mainly divided into the following two kinds. First, by designing regularized neural networks like sparse ones [23,30,31], diagnosis models can extract fault features from small & imbalanced data directly. Second, with the help of other related datasets, feature adaptation based on transfer learning can also learn fault features from small & imbalanced data to achieve accurate fault identification [32–34].
4.2. Feature extraction using regularized neural networks
The use of neural networks for fault feature extraction
from monitoring data has been studied deeply. Recent research
achievements show that regularized neural networks can process
small & imbalanced data effectively [88–91]. Moreover, in these
achievements, deep auto-encoders (DAE) and deep convolutional
neural networks (DCNN) are favored as basic models.
4.2.1. DAE and DCNN-based methods
4.2.1.1. Introduction to DAE. As shown in Fig. 9, Auto-encoder
(AE) is a typical unsupervised model [8], which can reconstruct
input data through the operation of encoder and decoder. The
input is $x_i$; $w_e$ and $b_e$ are the weight and bias of the encoding layer. The data features of the hidden layer $h_i$ are expressed as
$$h_i = f_e(w_e \cdot x_i + b_e) \tag{8}$$
where $f_e$ is the activation function in the encoder network. The weight and bias of the decoding layer are $w_d$ and $b_d$, and the reconstructed data $\hat{x}_i$ can be defined as
$$\hat{x}_i = f_d(w_d \cdot h_i + b_d) \tag{9}$$
where $f_d$ is the activation function in the decoder network. By minimizing the loss $L(x_i, \hat{x}_i)$, the input data can be reconstructed by AE.
$$L(x_i, \hat{x}_i) = \frac{1}{n} \sum_{i=1}^{n} \left\| x_i - \hat{x}_i \right\|^2 \tag{10}$$
where $n$ is the number of data points.
In the decoder network, the low-dimensional data $h_i$ is used to reconstruct the high-dimensional data $\hat{x}_i$. Thus, $h_i$ can be regarded as the features of input data $x_i$. By stacking multiple encoding layers and multiple decoding layers, DAE is constructed. Deep features of the input data can be collected using DAE through pre-training layer by layer, and the collected deep features are available for data classification using classifiers like Softmax [12].
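A forward pass through a single-layer AE and its reconstruction loss, as in Eqs. (8) to (10), can be sketched as below. The sigmoid activation, the layer sizes, and the random weights are assumptions chosen only for illustration; no training loop is shown.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def ae_forward(X, w_e, b_e, w_d, b_d):
    """Encode a batch X (one sample per row) and decode it again."""
    H = sigmoid(X @ w_e.T + b_e)      # Eq. (8): hidden features
    X_hat = sigmoid(H @ w_d.T + b_d)  # Eq. (9): reconstruction
    return H, X_hat

def reconstruction_loss(X, X_hat):
    """Eq. (10): mean squared Euclidean distance between inputs and outputs."""
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))

rng = np.random.default_rng(0)
n, d_in, d_hid = 8, 6, 3                  # hypothetical sizes, with d_hid < d_in
X = rng.random((n, d_in))
w_e, b_e = rng.normal(scale=0.1, size=(d_hid, d_in)), np.zeros(d_hid)
w_d, b_d = rng.normal(scale=0.1, size=(d_in, d_hid)), np.zeros(d_in)

H, X_hat = ae_forward(X, w_e, b_e, w_d, b_d)
loss = reconstruction_loss(X, X_hat)
```

The rows of `H` are the low-dimensional features that a classifier would consume once the weights have been trained to reduce `loss`.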
4.2.1.2. Introduction to DCNN. Compared to AE, convolutional
neural network (CNN) has fewer training parameters and stronger
feature extraction ability [92]. CNN contains convolutional and
pooling layers. The convolutional layer learns the feature vector
of the input data by convolution operation. As given in Fig. 10(a),
in the $m$th convolutional layer, the convolution kernel $k^m_{W \times D \times H}$ is used to learn the feature vector $x^m$, where $W$ is the kernel number, $D$ is the kernel depth, and $H$ represents the kernel height. The $w$th feature vector $x_w^m$ is obtained by
$$x_w^m = \sigma\left( \sum_{d} k_{w,d}^m \times x_d^{m-1} + b_w^m \right) \tag{11}$$
where $\sigma$ denotes the activation function, $d = 1, 2, \ldots, D$, $w = 1, 2, \ldots, W$, $x_d^{m-1}$ is the $d$th feature vector in the $(m-1)$th layer, and $b_w^m$ is the bias of the $w$th kernel.
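For one-dimensional signals, Eq. (11) can be sketched as follows. The sketch assumes that the product between a kernel and a feature vector is a sliding (cross-correlation) product without padding and that the activation is ReLU; both choices, like the function name, are illustrative.

```python
import numpy as np

def conv_layer(x_prev, kernels, bias):
    """x_prev: (D, T) input channels; kernels: (W, D, H); bias: (W,).

    Returns W feature vectors of length T - H + 1.
    """
    W, D, H = kernels.shape
    T = x_prev.shape[1]
    out = np.zeros((W, T - H + 1))
    for w in range(W):
        for d in range(D):
            # np.convolve flips its second argument, so the kernel is reversed
            # first to obtain a plain sliding product.
            out[w] += np.convolve(x_prev[d], kernels[w, d][::-1], mode="valid")
        out[w] += bias[w]
    return np.maximum(out, 0.0)   # ReLU as the activation function

x_prev = np.array([[1.0, 2.0, 3.0, 4.0]])   # D = 1 channel, T = 4 points
kernels = np.array([[[1.0, 1.0]]])          # W = 1 kernel, D = 1, H = 2
features = conv_layer(x_prev, kernels, bias=np.array([0.0]))
```

With this toy kernel each output is the sum of two neighbouring input points, which makes the weight sharing across positions easy to see.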
Fig. 10. Illustration of CNN. (a) The convolution operation, and (b) the pooling operation.
On the other hand, the pooling layer performs down-sampling. It reduces the size of the feature vector and the number of parameters, which helps to accelerate convergence. Max pooling is the most commonly used pooling method. As shown in Fig. 10(b), in the $m$th pooling layer, max pooling is calculated by
$$x_w^m = \mathrm{down}\left( x_w^{m-1}, s \right) \tag{12}$$
where $\mathrm{down}(\cdot)$ is the down-sampling function, and $s$ is the pooling size.
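A minimal sketch of the max-pooling operation in Eq. (12) is given below. It assumes non-overlapping windows of size $s$ and drops any trailing points that do not fill a complete window; other stride and padding conventions exist.

```python
import numpy as np

def max_pool(x, s):
    """x: (W, T) feature vectors; returns (W, T // s) maxima over windows of size s."""
    W, T = x.shape
    T_trim = (T // s) * s                 # discard an incomplete last window
    return x[:, :T_trim].reshape(W, T // s, s).max(axis=2)

x = np.array([[1.0, 3.0, 2.0, 5.0, 4.0, 0.0]])
pooled = max_pool(x, s=2)
```

Here the feature vector shrinks from six points to three, which is the size reduction the paragraph above refers to.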
Similar to DAE, DCNN can also be built by stacking convolu-
tional and pooling layers. Benefiting from deep network structure,
DCNN has stronger feature extraction capability than the shallow
CNN so the high-dimensional complex data can be processed
handily with DCNN.
4.2.1.3. Applications of regularized DAE and DCNN to feature extrac-
tion. DAE and DCNN with deep network structures often need
a large volume of data for training, so they are not suitable for
processing small & imbalanced data directly. Fortunately, regu-
larization can help the training of DAE and DCNN with fewer
training data while ensuring generalization ability. In intelligent
fault diagnosis, regularized neural networks can extract fault
features from a few fault samples and realize accurate fault
classification. There are three commonly used regularized neural
networks, i.e. sparse ones [23,30,31], normalized ones [24,93,94],
and ensemble ones [95–97]. Among them, sparse neural networks reduce the parameters of the network through weight decay to decrease the risk of overfitting, thus ensuring the generalization ability with limited training data. For example, Saufi et al. [31] presented a stacked sparse auto-encoder (SSAE) for gearbox fault diagnosis with limited fault data. Taking the Kullback–Leibler divergence as the sparse penalty term, the parameters to be trained in SSAE were reduced, so the diagnosis model achieved better generalization performance and higher diagnosis accuracy than other deep neural networks using fewer training samples. Second, normalized neural networks will reduce
the adverse effect of data imbalance on the training process
by normalizing the weights, which ensures strong data classi-
fication ability in the case of imbalanced data distribution. For
example, normalized convolutional networks (DNCNN) were used
in [94] for imbalanced bearing faults identification. By applying
a weights normalization strategy to construct the normalized
convolutional and fully connected layer, the proposed DNCNN
reduced the negative impact of data imbalance on fault classifica-
tion. As a result, the proposed DNCNN was more effective in deal-
ing with imbalanced fault classification than traditional CNNs.
Finally, ensemble neural networks fuse data to prevent networks
from overfitting in the case of small sample. In particular, there
are two kinds of fused data, i.e. the extracted features [23,95]
and the classification results [96,97]. For instance, Ren et al. [95]
used a capsule network-based auto-encoder (CaAE) for intelligent
fault identification of bearings, in which different local features
were fused to construct the feature capsules. The feature capsules
were input into a classifier for faults identification, and the exper-
imental results showed that fused feature capsules were easier
to obtain better diagnosis accuracies with small training samples
than independent local features. An ensemble convolutional neural network (EnCNN) was proposed in [96] for imbalanced fault identification of machinery. In EnCNN, the imbalanced raw data were split into different training subsets, each used to train a CNN-based classifier, and the classification results from multiple basic classifiers were integrated by a voting strategy. The integrated results were more conducive to accurate fault identification than a single result in the case of imbalanced training data.
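The Kullback–Leibler sparse penalty mentioned above for SSAE can be illustrated with the standard sparse auto-encoder formulation. The exact penalty used in [31] is not reproduced in this review, so the form below, the target sparsity `rho`, and the toy activations are assumptions.

```python
import numpy as np

def kl_sparsity_penalty(H, rho=0.05, eps=1e-8):
    """Sum over hidden units of KL(rho || rho_hat_j).

    H holds hidden activations in (0, 1), one sample per row; rho_hat_j is the
    mean activation of hidden unit j over the batch.
    """
    rho_hat = np.clip(H.mean(axis=0), eps, 1.0 - eps)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))))

sparse_H = np.full((10, 4), 0.05)   # units that are rarely active
dense_H = np.full((10, 4), 0.5)     # units that are active half of the time

penalty_sparse = kl_sparsity_penalty(sparse_H)
penalty_dense = kl_sparsity_penalty(dense_H)
```

Adding such a term to the reconstruction loss pushes most hidden units towards inactivity, which is the sense in which the network becomes sparse.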
In summary, DAE and DCNN have powerful data processing
capability and can extract fault features from massive monitoring
data automatically. However, such deep models update parame-
ters by minimizing empirical risk, which means they are prone
to overfitting when the training samples are insufficient [8].
Although recent studies have shown that regularized networks
can improve their generalization ability, it must be noted that
how to design high-quality regularization schemes for deep neu-
ral networks is a difficult problem requiring a large amount of
research experience because there are many choices of regu-
larization methods. Moreover, compared with the standard DAE
and DCNN, the regularized network structure is generally more
complex and difficult to train due to the introduction of other
factors such as sparse penalty term.
4.2.2. Other algorithms-based methods
In addition to regularized DAE and DCNN, other neural net-
works have also achieved some results in feature learning from
small and imbalanced data [89–91,98–100]. For example, Geng
et al. [98] presented a diagnosis method based on a residual
network with 17 convolutional layers for faults identification of
bogie under imbalanced data condition. The deep residual learn-
ing framework with stacked non-linear rectification layers made
it possible to learn discriminative fault features from imbalanced
Fast kurtogram images of mechanical signals. Liu et al. [99] used
the noise-assisted empirical mode decomposition for fault feature
extraction from raw signals, and the extracted features were
input into an enhanced fuzzy network for faults classification.
Qian et al. [100] proposed an imbalanced learning scheme based
on sparse filtering for fault feature extraction, which introduced
a balancing matrix to balance the feature learning abilities of
different classes. The results demonstrated that the presented
feature learning model was effective for bearing fault diagnosis. In
short, by modifying neural networks, fault features can be learned
from small and imbalanced data, which is an important means to
deal with S&I-IFD.
Fig. 11. Feature adaptation based on transfer learning.
4.3. Feature adaptation using transfer learning
In addition to extracting fault features directly, feature adapta-
tion with the help of other related datasets is another important
way to learn fault features from small and imbalanced data. In
the transfer learning scenario, the volume of target domain data
samples is usually much smaller than that in the source domain.
Moreover, because of the difference between the source domain
and the target domain data distributions, their features are gen-
erally different. Feature adaptation based on transfer learning
attempts to minimize the discrepancy between the feature distri-
butions in the two domains, so the feature of the target domain
data can also be learned well by models, as shown in Fig. 11.
Besides transfer component analysis (TCA) [101], many achievements in S&I-IFD have been obtained using joint distribution adaptation (JDA) [102], deep neural networks (DNN) [34], and other approaches [103].
4.3.1. TCA and JDA-based methods
4.3.1.1. Introduction to TCA and JDA. TCA is a traditional feature
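adaptation method [101]. When the source domain data $X^S$ and the target domain data $X^T$ follow different distributions, a feature mapping $\Phi$ is utilized to map them to a high-dimensional Hilbert space, where the distance between the target domain data and the source domain data is minimized.
The maximum mean discrepancy (MMD) is used by TCA to calculate the distance between $\Phi(X^S)$ and $\Phi(X^T)$, which is described as follows
$$\mathrm{dist}\left( \Phi(X^S), \Phi(X^T) \right) = \left\| \frac{1}{n_S} \sum_{i=1}^{n_S} \Phi(x_i^S) - \frac{1}{n_T} \sum_{i=1}^{n_T} \Phi(x_i^T) \right\| \tag{13}$$
By using a kernel matrix $K$ and a matrix $L$, the MMD between $\Phi(X^S)$ and $\Phi(X^T)$ is rewritten in another form.
$$K = \begin{bmatrix} K_{S,S} & K_{S,T} \\ K_{T,S} & K_{T,T} \end{bmatrix}, \quad L_{ij} = \begin{cases} \dfrac{1}{(n_S)^2}, & x_i, x_j \in X^S \\ \dfrac{1}{(n_T)^2}, & x_i, x_j \in X^T \\ -\dfrac{1}{n_S n_T}, & \text{otherwise} \end{cases} \tag{14}$$
$$\mathrm{dist}\left( \Phi(X^S), \Phi(X^T) \right) = \mathrm{trace}(KL) - \lambda\, \mathrm{trace}(K) \tag{15}$$
where $\lambda$ is a tradeoff parameter, which keeps the balance between distribution adaptation and parameter complexity.
Finally, the optimization goal of TCA can be described as
$$\min_{W} \ \mathrm{trace}(W^T K L K W) + \lambda\, \mathrm{trace}(W^T W) \quad \text{s.t. } W^T K H K W = I_m \tag{16}$$
where $H = I_{n_S+n_T} - \frac{1}{n_S+n_T} \mathbf{1}\mathbf{1}^T$ is a centering matrix, and $\mathbf{1} \in \mathbb{R}^{n_S+n_T}$ is an $(n_S+n_T)$-dimensional column vector with elements of 1.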
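The kernel form of the MMD can be checked numerically. The sketch below assumes a linear kernel, for which $\mathrm{trace}(KL)$ equals the squared distance between the source and target sample means; the data are synthetic and the helper name `mmd_matrix` is hypothetical.

```python
import numpy as np

def mmd_matrix(n_s, n_t):
    """Matrix L of Eq. (14) for n_s source rows followed by n_t target rows."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

rng = np.random.default_rng(0)
X_s = rng.normal(0.0, 1.0, size=(20, 3))   # synthetic source features
X_t = rng.normal(0.5, 1.0, size=(12, 3))   # synthetic, shifted target features
X = np.vstack([X_s, X_t])

K = X @ X.T                                 # linear kernel matrix
L = mmd_matrix(len(X_s), len(X_t))
mmd_trace = np.trace(K @ L)
mmd_direct = np.sum((X_s.mean(axis=0) - X_t.mean(axis=0)) ** 2)

n = len(X)
H = np.eye(n) - np.ones((n, n)) / n         # centering matrix used in Eq. (16)
```

Multiplying by `H` removes the mean of whatever it is applied to, which is why it appears in the constraint of Eq. (16).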
JDA is an improved variant based on TCA [102]. TCA only adapts the marginal probability distribution, while JDA adapts not only the marginal probability distribution but also the conditional probability distribution between the source and the target domain data. As a result, the optimization goal of JDA is described as
$$\min_{W} \ \sum_{c=0}^{C} \mathrm{trace}(W^T X L_c X^T W) + \lambda \left\| W \right\|_F^2 \quad \text{s.t. } W^T X H X^T W = I \tag{17}$$
where $c$ is the class information, and $L_c$ is
$$(L_c)_{ij} = \begin{cases} \dfrac{1}{(n_{S,c})^2}, & x_i, x_j \in X^{S,c} \\ \dfrac{1}{(n_{T,c})^2}, & x_i, x_j \in X^{T,c} \\ -\dfrac{1}{n_{S,c}\, n_{T,c}}, & x_i \in X^{S,c}, x_j \in X^{T,c} \ \text{or} \ x_i \in X^{T,c}, x_j \in X^{S,c} \\ 0, & \text{otherwise} \end{cases} \tag{18}$$
where the sample number in class $c$ from the source domain is $n_{S,c}$ and that from the target domain is $n_{T,c}$.
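The class-wise matrix $L_c$ of Eq. (18) can be built as sketched below. Target labels are usually unavailable in practice, so the sketch assumes that pseudo-labels predicted by some classifier are supplied for the target samples. Returning an all-zero matrix when a class is missing from either domain is a choice made here, not something stated in the source.

```python
import numpy as np

def class_mmd_matrix(y_s, y_t, c):
    """L_c for source labels y_s followed by (pseudo-)labelled target samples y_t."""
    y_s, y_t = np.asarray(y_s), np.asarray(y_t)
    n_s = len(y_s)
    src = np.where(y_s == c)[0]            # indices of class c in the source block
    tgt = n_s + np.where(y_t == c)[0]      # indices of class c in the target block
    e = np.zeros(n_s + len(y_t))
    if len(src) == 0 or len(tgt) == 0:
        return np.outer(e, e)              # class absent in one domain
    e[src] = 1.0 / len(src)
    e[tgt] = -1.0 / len(tgt)
    return np.outer(e, e)

# Three source samples and three target samples with hypothetical labels.
L1 = class_mmd_matrix(y_s=[1, 1, 2], y_t=[1, 2, 2], c=1)
```

Entries that involve a sample from another class stay at zero, so each $L_c$ compares only the class-$c$ means of the two domains.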
4.3.1.2. Applications of TCA and JDA to feature adaptation. Some
scholars introduced TCA and JDA to their transfer learning scheme
for feature adaptation. For instance, Chen et al. [104] used a
transfer learning faults identification method for rolling bearings
using a few faulty samples, in which TCA was applied for feature
adaptation to learn the transferable fault features from raw data.
Xie et al. [105] and Duan et al. [106] extracted transferable
fault feature from gearbox vibration signals using TCA, and the
experimental results showed that their models were effective for
gearbox faults identification in the small sample case. Besides,
Han et al. [107] and Qian et al. [108] applied JDA for transferable feature learning to address the lack of target domain samples, and the effectiveness of feature adaptation was verified using a bearing dataset and a gearbox dataset, respectively.
Traditional TCA and JDA based feature adaptation approaches
are simple in the calculation and can reduce the discrepancy
of the feature distributions in the two domains. However, both
TCA and JDA narrow the difference between two distributions by
mapping low-dimensional raw data to high-dimensional Hilbert
space. When they meet complex high-dimensional mechanical
data, they cannot fit them well. Thus, the diagnosis accuracy of
TCA and JDA related models on the complex diagnosis task is
usually poor.
4.3.2. Deep neural networks-based methods
Different from TCA and JDA, deep neural networks can learn
data features from the original data samples directly by mini-
mizing the distribution discrepancy of the target and the source
domain features. Kullback–Leibler (KL) divergence is a basic distance metric of distribution discrepancy, and some scholars have built deep transfer diagnosis models based on it to achieve feature adaptation. For example, a transfer network was constructed by Qian et al. [109] for machine fault identification, in which a distribution discrepancy measuring metric named auto-balanced KL divergence (AHKL) was developed for fault feature adaptation. After feature extraction, the first and the higher-order moment discrepancies of the features from the two domains were measured by AHKL, and the discrepancies between them were reduced by
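$$\min \ \sum_{i=1}^{N} \left[ \mu_i \cdot L_1^i + (1 - \mu_i) \cdot \sum_{j=1}^{n} L_j^i \right] \quad \text{s.t. } 0 \le \mu_i \le 1 \tag{19}$$
where the data points number of each sample is $N$. The order moments number is $n$. The discrepancy vector of the $n$th order moment is $L_n$. $\mu_i$ is a parameter vector to weigh between $L_1$ and $\sum_{j=2}^{n} L_j$.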
In addition to KL divergence, another distance metric for mea-
suring distribution discrepancies is the maximum mean discrep-
ancy (MMD). Many research achievements based on feature adap-
tation using deep neural networks have applied MMD to develop
their diagnosis scheme to deal with the small sample prob-
lem [110]. For example, Li et al. [32] developed a deep balanced
feature adaptation model with multiple convolutional layers for
gearboxes fault diagnosis using limited labeled data samples. The
fault features were extracted from raw data, and then MMD was
applied to measure the discrepancy of the conditional and the
marginal probability distributions of the extracted features. The
presented network was optimized by
$$\min_{\theta} \ \sum_{j=1}^{N} \lambda D_\theta^j\left( X_M^S, X_M^T \right) + (1 - \lambda) \sum_{i=1}^{n} D_\theta^i\left( X_{C_i}^S, X_{C_i}^T \right) \tag{20}$$
where $D_\theta^j(X_M^S, X_M^T)$ is the discrepancy of the marginal probability distribution in the $j$th network layer, and $D_\theta^i(X_{C_i}^S, X_{C_i}^T)$ is the discrepancy of the conditional probability distribution in the $i$th class. The numbers of network layers and classes are $N$ and $n$, respectively, and $\lambda$ is a real number less than 1. To further improve the performance of
feature adaptation, many variants based on the original MMD are
proposed by scholars. For instance, Yang et al. [33] constructed
a convolutional adaptation scheme by minimizing multi-kernel
MMD. A multi-layer MMD based feature adaptation framework
was presented by Li et al. [34] to identify bearing faults using a
few faulty samples.
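An MMD estimate with Gaussian kernels, and a weighted sum of marginal and class-wise discrepancies in the spirit of Eq. (20), can be sketched as follows. Several simplifications are assumed: features of a single layer only, a biased MMD estimate averaged over two arbitrary bandwidths, and known (or pseudo-) labels for the target samples. All names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """Kernel matrix exp(-gamma * ||a - b||^2) between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2(Xs, Xt, gammas=(0.1, 1.0)):
    """Biased estimate of the squared MMD, averaged over several bandwidths."""
    total = 0.0
    for g in gammas:
        total += (gaussian_kernel(Xs, Xs, g).mean()
                  + gaussian_kernel(Xt, Xt, g).mean()
                  - 2.0 * gaussian_kernel(Xs, Xt, g).mean())
    return total / len(gammas)

def balanced_discrepancy(Xs, ys, Xt, yt, lam=0.5):
    """lam * marginal MMD + (1 - lam) * sum of class-wise MMDs."""
    marginal = mmd2(Xs, Xt)
    shared = [c for c in np.unique(ys) if np.any(yt == c)]
    conditional = sum(mmd2(Xs[ys == c], Xt[yt == c]) for c in shared)
    return lam * marginal + (1.0 - lam) * conditional

rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(50, 2)), np.repeat([0, 1], 25)
Xt_near, yt = rng.normal(size=(20, 2)), np.repeat([0, 1], 10)
Xt_far = Xt_near + 3.0                      # same target data, strongly shifted

near = balanced_discrepancy(Xs, ys, Xt_near, yt)
far = balanced_discrepancy(Xs, ys, Xt_far, yt)
```

Each call builds full kernel matrices, so the cost grows quadratically with the number of samples.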
Although MMD is effective in measuring distribution discrepancy, its computational cost increases quickly as the number of samples increases. Compared with MMD, Wasserstein distance is a more reasonable distance metric when measuring distribution discrepancy, and it has also been used in feature adaptation tasks. Cheng et al. [111] used a deep feature
adaptation scheme for faults classification using a few labeled
target samples, in which Wasserstein distance was utilized to
calculate the discrepancy between the target and the source do-
main features. The proposed method was trained by minimizing
the Wasserstein distance between the features from the two
domains, which can be described as
$$\min_{\theta} \ \frac{1}{n_S} \sum_{i=1}^{n_S} f_L\left( f_\theta(x_i^S) \right) - \frac{1}{n_T} \sum_{i=1}^{n_T} f_L\left( f_\theta(x_i^T) \right) \tag{21}$$
where $f_\theta$ denotes the convolutional feature extractor, and $f_L$ is the Lipschitz function that satisfies the gradient constraint in calculating the Wasserstein distance. The numbers of samples in the source and the target domains are $n_S$ and $n_T$, respectively.
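The quantity inside Eq. (21) is the gap between the mean critic score on source features and on target features. The sketch below uses a linear critic whose weight vector is rescaled to norm at most 1, which is one simple way to keep it 1-Lipschitz. Practical implementations usually train a neural critic under a gradient or weight constraint, so this is only an assumption-laden illustration with made-up features.

```python
import numpy as np

def critic_gap(feat_s, feat_t, v):
    """Mean critic score on source features minus mean score on target features."""
    v = np.asarray(v, dtype=float)
    v = v / max(np.linalg.norm(v), 1.0)    # rescale so the linear critic is 1-Lipschitz
    return float(np.mean(feat_s @ v) - np.mean(feat_t @ v))

# Hypothetical extracted features: the two domains differ only in the first dimension.
feat_s = np.array([[1.0, 0.0], [1.0, 2.0]])
feat_t = np.array([[0.0, 0.0], [0.0, 2.0]])

gap = critic_gap(feat_s, feat_t, v=[2.0, 0.0])
```

A feature extractor trained to shrink this gap for the best available critic pulls the two feature distributions together.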
In addition to minimizing distance metric, another way for
feature adaptation using deep neural networks is adversarial
training. Inspired by GAN, adversarial training can also reduce the
distribution discrepancy of two distributions. For example, Han
et al. [103] constructed an adversarial transfer learning model for
wind turbine fault diagnosis using limited training samples. In
the presented work, the feature descriptor composed of multiple
convolutional layers extracted fault features from the samples in
the two domains. The discrepancy of the two feature distributions
was minimized by a discriminative classifier through adversar-
ial training. And the health conditions were output by a fault
classifier in the end. The proposed method was trained by
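$$\min_{\theta} \ \frac{1}{n_S} \sum_{i=1}^{n_S} J\left( y_i^S, \hat{y}_i^S \right) - \frac{1}{n_S} \sum_{i=1}^{n_S} \log D_\theta\left( x_i^S \right) + \frac{1}{n_T} \sum_{j=1}^{n_T} \log\left( 1 - D_\theta\left( x_j^T \right) \right) \tag{22}$$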
where the classification loss is the first term and the adversarial
loss between the two feature distributions is the second term.
After adversarial training, the diagnosis model could also serve
well in the target diagnosis tasks.
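The two ingredients of the objective in Eq. (22), a classification loss on labeled source samples and a domain-discrimination term, can be computed as sketched below. Cross-entropy is assumed for $J$, the discriminator outputs are made-up probabilities, and the sign and weighting with which the two parts are combined depend on which sub-network is being updated, so the sketch deliberately keeps them separate.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))

def domain_log_likelihood(d_src, d_tgt, eps=1e-12):
    """mean log D(source) + mean log(1 - D(target)) for outputs in (0, 1)."""
    return float(np.mean(np.log(d_src + eps)) + np.mean(np.log(1.0 - d_tgt + eps)))

# Hypothetical fault-classifier outputs for two labeled source samples.
probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
cls_loss = cross_entropy(probs, labels)

# Discriminator that separates the domains well versus one that cannot tell them apart.
separated = domain_log_likelihood(np.array([0.99, 0.98]), np.array([0.01, 0.02]))
confused = domain_log_likelihood(np.array([0.5, 0.5]), np.array([0.5, 0.5]))
```

When the features of the two domains are well aligned, the discriminator is pushed towards the `confused` case, where its outputs stay near 0.5 for both domains.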
Due to the strong data processing ability, deep neural
networks-based feature adaptation approaches can usually out-
put better diagnosis results than the traditional TCA and JDA.
Nevertheless, the feature adaptation ability sometimes depends on the choice of distance metric. Besides, deep neural networks-based feature adaptation schemes assume that the feature spaces of the two domains overlap to some extent. However, existing studies cannot tell whether such overlap exists. The diagnosis models may perform poorly on the target diagnosis task if the discrepancy of the feature distributions cannot be described explicitly.
4.4. Epilog
The achievements on S&I-IFD using feature learning-based
strategy are reviewed in this section, which are divided into
two classes. The first is to use regularized neural networks like
sparse ones to extract fault features from limited fault data di-
rectly. The second is feature adaptation with the help of other
related datasets based on transfer learning. Through feature adap-
tation, transferable fault features are expected to be learned by
diagnosis models to achieve accurate fault classification. How-
ever, the feature learning-based strategy also has shortcomings.
First, since the fault information provided by a small number
of fault data is always limited, the diagnosis performance im-
proved by the feature learning-based models is also limited.
Second, feature adaptation based on transfer learning requires
the similarity of feature distributions between different datasets.
However, in engineering scenarios, it is difficult to construct
an auxiliary transferable dataset. Moreover, feature adaptation
usually involves the selection of the distance metric, which makes
it hard to achieve the optimal diagnosis results.
5. Classifier design-based strategy for S&I-IFD
5.1. Motivation
In the process of intelligent fault diagnosis, fault identification
using a fault classifier is the last step. The classification perfor-
mance of the fault classifier is an important index to determine
the fault identification accuracy. In the case of lacking fault data,
the trained classifier is usually over-fitted and the classification
accuracy is low. If the fault classifier can be designed to have
strong generalization ability for small and imbalanced data, it
is hopeful to achieve accurate fault identification in the case
12
T. Zhang, J. Chen, F. Li et al. ISA Transactions xxx (xxxx) xxx
of lacking machine fault data. Scholars have also done a lot of
work on S&I-IFD from the perspective of fault classifier design.
According to whether the auxiliary datasets are used or not,
the design of the fault classifier follows two ideas. The first is
to use the small and imbalanced data to modify the original
fault classifier directly, such as constructing a cost-sensitive faults
classifier [38–40]. The second is to pre-train the classifier with
the help of other related datasets based on transfer learning to
achieve good classification performance [41,42,112].
5.2. Fault classifier design using small and imbalanced data
In this part, fault classifiers are designed based on small and
imbalanced data directly. As a specialized model for processing
small samples, support vector machine (SVM) [8] and its variants
can improve the faults classification accuracy with limited faulty
data samples [113]. Besides, cost-sensitive learning [19] is dedi-
cated to learning information from imbalanced data distributions
by applying the cost-sensitive loss function. The cost-sensitive
learning-based fault classifier can also provide an effective solu-
tion for S&I-IFD.
5.2.1. SVM-based methods
5.2.1.1. Introduction to SVM. SVM is a classical data classifier. As
given in Fig. 12, SVM aims at finding a hyperplane in the features
space, which is expected to correctly classify data samples as far
as possible.
For a training dataset $\{x_i, y_i\}_{i=1}^{M}$, $x_i$ is the $i$th sample and the
sample label is $y_i \in \{-1, 1\}$. The hyperplane $H(x)$ can be described as
\[ H(x) = w \cdot x + b = 0 \tag{23} \]
where the parameters of $H(x)$ are $w$ and $b$. Moreover, to classify
the data samples into two classes (the positive one and the
negative one), $H(x)$ should be subject to
\[ y_i H(x_i) = y_i (w \cdot x_i + b) \geq 1, \quad i = 1, 2, \ldots, M. \tag{24} \]
As given in Fig. 12, $H'(x)$ and $H''(x)$ are the two hyperplanes
satisfying the constraints in Eq. (24). The distance from $x_i$ to $H(x)$
can be calculated as $d_i$.
\[ d_i = \frac{y_i (w \cdot x_i + b)}{\lVert w \rVert}. \tag{25} \]
Therefore, the margin $\gamma$ between $H'(x)$ and $H''(x)$ is $2 / \lVert w \rVert$. As a
result, SVM will find the hyperplane $H(x)$ between $H'(x)$ and
$H''(x)$, which can maximize the margin $\gamma$ by optimizing the
objective loss function $L$.
\[ L = \arg\max_{w,b} \min_{i} \frac{y_i (w \cdot x_i + b)}{\lVert w \rVert} = \arg\max_{w,b} \frac{2}{\lVert w \rVert}. \tag{26} \]
For the convenience of calculation, the loss function $L$ is rewritten
as follows:
\[ L = \min_{w,b} \frac{1}{2} \lVert w \rVert^{2} \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \geq 1, \quad i = 1, 2, \ldots, M. \tag{27} \]
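As a small numerical illustration of the constraint in Eq. (24) and of the margin expression 2/||w||, the following sketch checks a hand-picked hyperplane on two toy samples. The data, the values of `w` and `b`, and all function names are hypothetical and chosen only for illustration; no optimization is performed here.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def functional_margins(w, b, samples, labels):
    """Return y_i * (w . x_i + b) for every sample (left side of Eq. (24))."""
    return [y * (dot(w, x) + b) for x, y in zip(samples, labels)]

def geometric_margin(w):
    """Width of the band between the two supporting hyperplanes: 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))

# Hypothetical toy data: one sample per class.
samples = [(2.0, 2.0), (0.0, 0.0)]
labels = [1, -1]
w, b = (0.5, 0.5), -1.0  # candidate hyperplane, chosen by hand

margins = functional_margins(w, b, samples, labels)
feasible = all(m >= 1.0 for m in margins)  # constraint of Eq. (24)
gamma = geometric_margin(w)                # margin to be maximized
```

In this toy case both samples lie exactly on the supporting hyperplanes, so the margin equals the distance between the two points.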
5.2.1.2. Applications of SVMs to fault classification. Some
researchers utilized SVM and its variants to classify limited fault
data [114–119]. For example, a K-means based SVM-tree and
SVM-forest were developed in [114], in which the K-means
algorithm was introduced into SVM to select sensitive samples
from an imbalanced dataset. The results indicated that the
presented method improved the diagnosis performance using a few
faulty data samples. Xi et al. [116] proposed a least-squares SVM
(LSSVM-CIL) with parameter regularization for the imbalanced
fault detection of aircraft engines, in which the number of
support vectors was reduced and the representative fault samples
were retained using a recursive strategy. The experimental results
showed that LSSVM-CIL was more effective than related methods
in imbalanced fault detection. Based on the traditional SVM,
He et al. [118] presented a nonlinear support tensor machine
containing a dynamic penalty factor (DC-NSTM) for fault
identification of machines in the limited faulty samples case. A tensor
kernel function was added to the DC-NSTM so that it could
handle nonlinearly separable problems and improve the overall
classification accuracy with small training samples.
Fig. 12. Illustration of SVM.
Generally, SVM-based fault classifiers are optimized by
minimizing the overall structural risk of training samples [8],
so they are better suited to limited fault
data than deep neural networks, which are optimized by
minimizing the empirical risk. However, two drawbacks restrict
the applications of SVM. First, the diagnosis accuracy of SVM is
sensitive to the setting of kernel parameters. How to choose a
set of high-quality kernel parameters is one of the core issues
when using an SVM-based fault classifier. Second, although SVM
is good at handling small sample problems, it is difficult to train
on massive monitoring data. With the development of data acquisition
technologies, the volume of machine monitoring data is growing
rapidly, which brings computing challenges to SVM-based
fault classifiers.
5.2.2. Cost-sensitive classifier-based methods
5.2.2.1. Introduction to cost-sensitive learning. As a learning
paradigm, cost-sensitive learning [19] assigns different
misclassification losses to the different classes in a classification
task. Cost-sensitive learning aims at reducing the total misclassification
cost on the whole dataset. In other words, cost-sensitive learning
pays more attention to the samples in the minority classes
to improve the overall classification performance on imbalanced
datasets.
Given a training dataset $\{x_i, y_i\}_{i=1}^{M}$ containing $M$ training samples,
the $i$th sample is $x_i$ and the $i$th sample label is
$y_i \in \{1, 2, \ldots, K\}$. Assume a misclassification loss $C_{u,v}$, which represents
the loss or the penalty of misclassifying the sample $x_i$ in
class $u$ to class $v$. For a classification task, the minimum
misclassification loss should be achieved when classifying the sample
$x_i$ into a class. Specifically, the misclassification loss $L(u|x_i)$ of
sample $x_i$ classified into class $u$ can be described as
\[ L(u|x_i) = \sum_{v=1}^{K} P(v|x_i)\, C_{u,v} \tag{28} \]
where $P(v|x_i)$ denotes the probability of classifying
the sample $x_i$ into the class $v$. Moreover, if $u = v$, $C_{u,v} = 0$, which
is the loss of classifying sample $x_i$ correctly. Therefore, the overall
expected misclassification cost on the training dataset $\{x_i, y_i\}_{i=1}^{M}$
can be described as
\[ L(C) = \sum_{i=1}^{M} \sum_{v=1}^{K} P(v|x_i)\, C_{u,v}. \tag{29} \]
Finally, the ideal classifier will make a decision by minimizing the
overall expected misclassification cost $L(C)$.
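The decision rule implied by Eq. (28) can be sketched in a few lines. The sketch assumes that `cost[u][v]` is read as the cost of deciding class `u` when the sample actually belongs to class `v`, which is the reading under which Eq. (28) is an expected cost; the posterior values and the cost matrix below are hypothetical.

```python
def expected_costs(posterior, cost):
    """L(u|x) = sum_v P(v|x) * C[u][v] for every candidate class u (Eq. (28))."""
    return [sum(p * c_uv for p, c_uv in zip(posterior, row)) for row in cost]

def min_cost_class(posterior, cost):
    """Index of the class whose expected misclassification cost is smallest."""
    losses = expected_costs(posterior, cost)
    return min(range(len(losses)), key=losses.__getitem__)

# Hypothetical two-class example: class 0 = normal, class 1 = fault.
posterior = [0.7, 0.3]   # P(v|x) from some classifier
cost = [
    [0.0, 10.0],  # deciding "normal": costly if the sample is really a fault
    [1.0, 0.0],   # deciding "fault": cheap if the sample is really normal
]

losses = expected_costs(posterior, cost)
decision = min_cost_class(posterior, cost)
```

Although the normal class has the higher posterior probability, the large penalty for missing a fault makes the fault class the minimum-cost decision in this example, which is the behavior cost-sensitive learning aims for on imbalanced data.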
5.2.2.2. Applications of cost-sensitive classifier to faults classification.
In S&I-IFD, how to design and assign the misclassification
loss $C_{u,v}$ is the key to the applications of cost-sensitive learning.
For an imbalanced dataset, the imbalance ratio is an important
index to measure the imbalance degree. For a training dataset
$\{x_i, y_i\}_{i=1}^{M}$, the $i$th sample is $x_i$ and the $i$th sample label is
$y_i \in \{1, 2, \ldots, K\}$. The imbalance ratio $r_{u,v}$ of the class $u$ to the class $v$
is defined as
\[ n(k) = \sum_{i=1}^{M} \mathbf{1}\{y_i = k\} \tag{30} \]
\[ r_{u,v} = \frac{n(v)}{n(u)} \tag{31} \]
where $n(k)$ represents the number of samples in class $k$ and $\mathbf{1}\{\cdot\}$
is an indicator function returning 1 if $y_i = k$ and 0 otherwise. It is
a common choice that the misclassification loss $C_{u,v}$ is designed
based on the data imbalance ratio, i.e., if $u \neq v$, $C_{u,v} = r_{u,v}$,
and if $u = v$, $C_{u,v} = 0$. By this design, the classification model
will pay more attention to the minority classes to improve the
identification accuracy of the minority classes. Many studies have
shown the effectiveness of applying the imbalance ratio into
the design of the cost-sensitive loss function [94,98,120]. For
example, Geng et al. [98] presented a diagnosis scheme using
deep residual feature learning, in which the imbalance-weighted
cross-entropy (IWCE) was used for imbalanced fault classification.
The original cross-entropy (CE) can be described as
\[ \mathrm{CE} = -\sum_{i=1}^{K} y_i \log \hat{P}_i \tag{32} \]
where the class number is $K$. $y_i$ is the one-hot vector representing
labels information and $\hat{P}_i$ denotes the output of the softmax
classifier. Based on the original CE, IWCE used the data imbalance
ratios to weight the minority classes to enhance the samples’
influence in the minority classes.
\[ \mathrm{IWCE} = -\sum_{i=1}^{K} w_i\, y_i \log \hat{P}_i \tag{33} \]
where $w_i$ is a function just related to the data imbalance ratios.
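To make the class counts, the imbalance ratio, and the weighted cross-entropy concrete, the following minimal sketch computes them for a hypothetical 90/10 dataset. The source does not specify the exact form of the weight in IWCE; the choice below (the ratio of the largest class size to each class size) is an assumption made only for illustration, as are all function names.

```python
import math
from collections import Counter

def class_counts(labels):
    """n(k): number of training samples in each class (Eq. (30))."""
    return Counter(labels)

def imbalance_ratio(counts, u, v):
    """r_{u,v} = n(v) / n(u) (Eq. (31))."""
    return counts[v] / counts[u]

def class_weights(counts):
    """Assumed weighting: ratio of the largest class size to each class size."""
    majority = max(counts, key=counts.get)
    return {k: imbalance_ratio(counts, k, majority) for k in counts}

def weighted_cross_entropy(one_hot, probs, weights):
    """IWCE-style loss for one sample: -sum_k w_k * y_k * log(p_k)."""
    return -sum(weights[k] * y * math.log(p)
                for k, (y, p) in enumerate(zip(one_hot, probs)) if y > 0)

# Hypothetical dataset: 90 normal samples (class 0), 10 fault samples (class 1).
labels = [0] * 90 + [1] * 10
counts = class_counts(labels)
weights = class_weights(counts)  # {0: 1.0, 1: 9.0}

# Same softmax output, different true class.
loss_fault = weighted_cross_entropy([0, 1], [0.5, 0.5], weights)
loss_normal = weighted_cross_entropy([1, 0], [0.5, 0.5], weights)
```

With this weighting, an equally uncertain prediction is penalized nine times more heavily when the true class is the minority fault class.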
Besides, some researchers have combined the real-time classification
results and the data imbalance ratios to design the cost-sensitive
loss function because the real-time training results are
thought to be able to indicate the updating of parameters [38,39,121].
For instance, Dong et al. [38] adopted a cost-adaptive network
structure for imbalanced mechanical data classification, in
which the cost-sensitive loss function $L$ was designed as follows
\[ L = -\sum_{i=1}^{K} t_i\, y_i \log \hat{P}_i \tag{34} \]
where $t_i$ is a function related to the data imbalance ratios, the
evaluation metric $G_{mean}$, and the Euclidean distance $E_d$.
\[ t_i = r_i \exp\left(\frac{G_{mean}}{2}\right) \exp\left(\frac{1}{2} E_d\right) \tag{35} \]
\[ G_{mean} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} \tag{36} \]
\[ E_d = \frac{1}{n(k)} \sum_{i=1}^{n(k)} \lVert y_i - \hat{P}_i \rVert_2 \tag{37} \]
where $r_i$ is the data imbalance ratio. True positive, false positive,
true negative and false negative are represented by $TP$, $FP$, $TN$,
and $FN$.
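The G-mean metric used above is the geometric mean of the true positive rate and the true negative rate. A minimal sketch with hypothetical confusion counts shows why it is more informative than plain accuracy on imbalanced data; the function names and the counts are illustrative assumptions.

```python
import math

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def accuracy(tp, fn, tn, fp):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)

# Hypothetical test set: 10 fault (positive) and 90 normal (negative) samples.
always_normal = dict(tp=0, fn=10, tn=90, fp=0)  # never predicts a fault
balanced = dict(tp=8, fn=2, tn=81, fp=9)        # detects most faults

acc_trivial, g_trivial = accuracy(**always_normal), g_mean(**always_normal)
acc_balanced, g_balanced = accuracy(**balanced), g_mean(**balanced)
```

The classifier that never predicts a fault has the higher accuracy of the two but a G-mean of zero, whereas the classifier that detects eight of the ten faults has a clearly higher G-mean.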
On the whole, cost-sensitive learning pays more attention to
the fault samples in the minority classes by assigning misclassification
losses, which helps to ensure the identification accuracy
of the minority fault samples. The output of a cost-sensitive
fault classifier is sensitive to the design of the cost-sensitive
loss function. Most current studies set the
cost-sensitive loss function based on the data imbalance ratios,
which is indeed effective, but how to update it to obtain better
results is still worth exploring. In the future, one possible
solution is to set the cost-sensitive loss function automatically
using the attention mechanism [122], which has been applied
successfully in sensitive information selection and adaptive
weights assignment.
5.3. Fault classifier design using transfer learning
In this part, fault classifiers are designed with the help of
other related datasets. In the transfer learning scenario, some
model parameters can be shared by the target and the source
domain data [86]. Based on this, scholars use parameter transfer-based
approaches to design the classifier. After pre-training with
the source domain data, the parameters of the fault classifier
are fine-tuned using a few target domain samples. As a result,
the fine-tuned fault classifier is expected to achieve high
classification accuracy in the diagnosis tasks.
5.3.1. Parameter transfer-based methods
In parameter transfer-based methods, the parameters of diagnosis
models are first pre-trained using sufficient source domain
data. After that, the classification layers of the pre-trained models
are fine-tuned using a few target domain samples. The idea of
parameter transfer-based approaches is relatively simple, but it
is widely used. For example, Kim et al. [43] and Li et al. [123]
constructed parameter transfer-based fault classifiers with a deep
convolutional neural network (DCNN) whose parameters were
pre-trained using an existing dataset. After pre-training, the Softmax
classifier in the last layer of the DCNN was fine-tuned using another
small dataset. The fine-tuned Softmax classifier was able
to classify data samples in the new dataset. The methods had
good diagnosis performance for bearings using small training
samples. Similarly, to identify gearbox faults using small training
samples, Cao et al. [124] and Wu et al. [125] applied parameter-based
transfer learning for fault classifier design with DCNN. The
experimental results showed that the pre-trained fault classifiers
could obtain high accuracy on target gearbox diagnosis tasks after
fine-tuning.
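The freeze-then-fine-tune procedure described above can be sketched without any deep learning library. In this deliberately simplified illustration, a fixed function stands in for the pre-trained feature layers and only a logistic classifier head is trained on four target samples. The feature map, the data, the learning rate, and the number of epochs are all hypothetical; a real application would use the DCNN models cited in the text.

```python
import math

def pretrained_features(x):
    """Stand-in for frozen, pre-trained feature layers (hypothetical)."""
    return (x[0] + x[1], x[0] - x[1])

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_head(samples, labels, lr=0.1, epochs=200):
    """Train only the classifier head; the feature extractor stays fixed."""
    w, b = [0.0, 0.0], 0.0
    feats = [pretrained_features(x) for x in samples]
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for f, y in zip(feats, labels):
            err = sigmoid(w[0] * f[0] + w[1] * f[1] + b) - y
            grad_w[0] += err * f[0]
            grad_w[1] += err * f[1]
            grad_b += err
        # Batch gradient descent step on the head parameters only.
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b

def predict(x, w, b):
    f = pretrained_features(x)
    return int(sigmoid(w[0] * f[0] + w[1] * f[1] + b) >= 0.5)

# Hypothetical target-domain data: only four labeled samples.
target_x = [(2.0, 1.0), (1.0, 2.0), (3.0, 1.0), (1.0, 3.0)]
target_y = [1, 0, 1, 0]
w, b = fine_tune_head(target_x, target_y)
predictions = [predict(x, w, b) for x in target_x]
```

Updating all parameters instead of only the head would correspond to also making the feature map trainable, at the price of more parameters to fit from the same few target samples.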
Besides, some scholars believe that updating all the model
parameters in the fine-tuning stage is more helpful for
accurate fault identification than updating only the classifier
layers. In this case, the fault classifier is obtained after
fine-tuning all the parameters. For example, He et al. [112]
applied a transfer learning model based on multi-wavelet deep
auto-encoder for the gearbox fault classifier design, in which all
the model parameters were pre-trained using vibration data from
one working condition and fine-tuned with vibration data from
another working condition. After fine-tuning, the obtained clas-
sifier could achieve high diagnosis accuracy in the new working
condition. Similarly, Li et al. [41] and He et al. [42] adopted deep
transfer auto-encoders to design fault classifiers for bearings,
and the fault classifiers were obtained after fine-tuning with a
few target domain samples.
In general, the size of the source domain dataset will influence
the classification accuracies of the fault classifiers obtained using
parameter transfer-based approaches. The larger the source do-
main dataset used for pre-training is, the better the performance
of the obtained fault classifier is. However, it is difficult to con-
struct an ideal pre-training dataset in practice, which is one of the
major problems in the application of parameter transfer-based
fault classifier design. If the source domain dataset is not large
enough, the fault classifier obtained in this way will have poor
diagnosis performance on the target diagnosis tasks.
5.4. Epilog
This section reviews the achievements in dealing with S&I-IFD
based on the classifier design strategy. Depending on whether
auxiliary datasets are used, fault classifier design-based
strategies take one of two routes. The first is to design fault classifiers
using small and imbalanced data directly, such as optimizing SVM
or developing cost-sensitive classifiers. This kind of method generally
depends on the engineering experience of the researchers,
especially in the design of cost-sensitive loss functions, so
optimal results are difficult to achieve. The second is to use
auxiliary datasets to pre-train diagnosis models and then fine-tune
the classifier with a few fault data to get the final fault
classifier. The performance of the fault classifier obtained in this
way depends on the quality of the auxiliary dataset. When the
auxiliary dataset is not large enough, the classification ability of
the fault classifier is usually not strong enough.
6. Future challenges and possible extensions for S&I-IFD
Finally, we discuss future challenges and suggest
some possible extensions for S&I-IFD based on the existing
research achievements.
6.1. How to improve the quality of the augmented samples in S&I-IFD?
Benefiting from new machine learning theories and technologies
like GAN and VAE, many existing studies have shown
that the performance of S&I-IFD can be improved by expanding
the training sample set using data generation and
over-sampling. However, a review of these studies shows that
existing research mainly focuses on expanding the size of the
fault data samples and pays little attention to
the quality of the samples. Specifically, when the training
set is too small, the samples generated by generative models
are too similar to the real samples, which means the fault
information gained in this way is very limited. For
over-sampling models like SMOTE, the synthesized fault samples
have a strong linear relationship with the training samples due
to the problem of distributional marginalization [83]. Although
these generated samples can expand the size of training samples,
it is not clear how much fault information they can provide
for the training of the diagnosis models. If they cannot provide
more fault information, the low-quality generated samples will
bring only a limited improvement in the diagnosis performance of the
intelligent diagnosis models.
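The linear relationship between SMOTE samples and the training samples follows directly from how SMOTE synthesizes data: each new sample is a point on the line segment between a minority sample and one of its minority-class neighbors. The sketch below is a simplified variant that always uses the single nearest neighbor (standard SMOTE picks randomly among k nearest neighbors); the data and function names are hypothetical.

```python
import random

def nearest_neighbor(x, others):
    """Return the element of `others` closest to x (squared Euclidean distance)."""
    return min(others, key=lambda o: sum((a - b) ** 2 for a, b in zip(x, o)))

def smote_sample(minority, rng):
    """Create one synthetic sample on the segment between a minority
    sample and its nearest minority-class neighbor."""
    i = rng.randrange(len(minority))
    x = minority[i]
    neighbor = nearest_neighbor(x, minority[:i] + minority[i + 1:])
    lam = rng.random()  # interpolation coefficient in [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

rng = random.Random(0)
# Hypothetical minority-class samples, all lying on the line x2 = x1.
minority = [[1.0, 1.0], [2.0, 2.0], [5.0, 5.0]]
synthetic = [smote_sample(minority, rng) for _ in range(5)]
```

Because every synthetic point is a convex combination of two existing samples, all synthetic points here stay on the same line as the originals, which illustrates why such samples may add little new fault information.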
In future research, the authors believe that researchers need
to pay attention not only to the size of the sample set but also to
the quality of the samples. First, in addition to data generation,
data over-sampling, and data reweighting, other data
augmentation approaches can be applied [126–129]. For example, Yu
et al. [126] tried seven kinds of data augmentation strategies
based on hand-crafted rules to augment the vibration signals of rolling
bearings, including local data reversing, local random reversing,
global data reversing, local data zooming, global data zooming,
local segment splicing, and noise addition. Compared with other
data augmentation strategies such as data generation, these
methods require less computing resources and less
computing time. Moreover, experimental results showed that
they could also improve the diagnosis
performance of S&I-IFD significantly. Besides, existing data
augmentation strategies are often tailor-made for each dataset
and cannot be easily reused on other datasets [17]. To address this,
scholars proposed AutoAugment [130], which automatically
learns a data augmentation strategy for a neural network. Inspired
by this, augmenting fault data samples through AutoAugment
may provide a good solution for S&I-IFD. In addition to the data
augmentation methods mentioned above, some researchers used
semi-supervised learning-based models to select data samples
with target labels from a large unlabeled dataset to expand
the target dataset directly [131]. In engineering scenarios,
unlabeled monitoring datasets are easier to collect and usually
larger than labeled datasets. Therefore, the use
of unlabeled datasets is also helpful to expand the limited target
datasets and improve the performance of diagnosis models.
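Rule-based augmentations of the kind listed above are cheap to implement. The following sketch shows four simple transforms of a one-dimensional signal. The exact definitions used in [126] are not given in the text, so the readings below (for example, "reversing" as reversing the time order and "zooming" as amplitude scaling) are assumptions made for illustration only.

```python
import random

def add_noise(signal, sigma, rng):
    """Noise addition: add zero-mean Gaussian noise to every point."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

def global_zoom(signal, factor):
    """Global zooming, read here as scaling the amplitude of the whole signal."""
    return [factor * s for s in signal]

def local_zoom(signal, start, stop, factor):
    """Local zooming, read here as scaling only the segment [start, stop)."""
    return [factor * s if start <= i < stop else s
            for i, s in enumerate(signal)]

def global_reverse(signal):
    """Global reversing, read here as reversing the time order."""
    return list(reversed(signal))

rng = random.Random(0)
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]  # hypothetical vibration snippet
augmented = [
    add_noise(signal, 0.05, rng),
    global_zoom(signal, 1.2),
    local_zoom(signal, 2, 5, 0.8),
    global_reverse(signal),
]
```

Each transform returns a new signal of the same length, so the augmented samples can be fed to the same diagnosis model as the originals.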
Second, how to establish quality evaluation indexes for the
samples is also an important issue. In [12] and [52], researchers
used the Pearson correlation coefficient to evaluate the similarity
of the generated data and the real data. However, excessive
similarity between the generated and the real data leads to
information redundancy, which brings very limited improvement in
the generalization ability of the diagnosis models. Therefore, it is
not appropriate to evaluate the quality of the generated samples only
by similarity. From the aspect of data augmentation, it is also
important to establish a relatively objective and reliable evaluation
index for the generated samples to improve the diagnosis
performance of S&I-IFD.
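The Pearson correlation coefficient mentioned above can be computed as follows. The signals are hypothetical; the sketch only illustrates that a near-copy of a real signal scores close to 1, which by itself says little about how much new information the generated sample carries.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical signals for illustration.
real = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
near_copy = [0.0, 0.9, 0.1, -1.0, 0.0, 1.1, 0.0, -0.9]   # almost identical
unrelated = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]   # phase-shifted

r_copy = pearson(real, near_copy)
r_unrelated = pearson(real, unrelated)
```

A generated sample like `near_copy` would be rated as high quality by similarity alone, even though it is nearly redundant with the real sample.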
6.2. How to prevent transfer learning-based approaches from negative transfer in S&I-IFD?
Among the three strategies, transfer learning-based approaches
account for a large proportion, so transfer learning is an important theory
for S&I-IFD. However, when negative transfer occurs, the transfer
learning-based models will perform poorly in the case of lacking
data samples. Negative transfer refers to the case in which the knowledge
extracted in the source domain harms the target task [86].
Negative transfer will occur if the distribution discrepancy between
the target and the source domain data is too large. For example,
when the source domain data are faulty samples of bearings
while the target data are faulty samples of gears, the knowledge
learned from the bearing samples is meaningless or even has
a negative impact on gear fault diagnosis. In addition, the
transferable components between the two domains, such as data
samples, data features, or model parameters, are the foundation
of transfer learning. In some cases, although the data distributions
in the two domains are similar, negative transfer may also occur
when the diagnosis model fails to find the components that can
be transferred. For example, the physical structures of motors and
generators are similar and their fault data distributions are also
similar. However, if the transfer learning-based diagnosis models
cannot find the components that can be transferred, the diagnosis
knowledge learned from the motor fault data is useless for
generator fault diagnosis.
Avoiding negative transfer is a big challenge for S&I-IFD.
First, to describe the discrepancies of the data distributions
in the target and the source domains, reasonable measurement
rules need to be developed. In existing studies, most researchers
rely on engineering experience to judge the similarity
of the data distributions in the two domains, and a unified
and effective standard is lacking. Therefore, developing a distribution
similarity metric is worth exploring in future research.
Second, to build effective diagnosis models, the idea of transitive
transfer learning is worth trying [132,133]. Different from
traditional transfer learning methods, which involve only two
domains, transitive transfer learning connects multiple related
domains and updates the learned knowledge in a transitive manner,
which may provide a feasible idea for the construction of
general transfer learning-based diagnosis models for S&I-IFD.
6.3. Meta-learning theory and its possible applications in S&I-IFD
Meta-learning, or learning to learn, is an emerging
machine learning theory. The purpose of meta-learning is to raise
the level of learning from data to tasks and enable algorithms
to obtain transferable knowledge from multiple tasks [134]. By
training on various related tasks with few data, knowledge can be
accumulated over several training episodes and applied to a new
but related task without fine-tuning [135], which makes
meta-learning-based methods suitable for dealing with small sample
problems.
Generally speaking, meta-learning has three categories:
optimization-based methods, model-based methods, and metric-
based methods [136]. Among them, optimization-based models
aim at the learning of the meta-knowledge, which is the initializa-
tion parameters of the network, and then iterate them with a few
training samples to get good classifiers. Model-Agnostic Meta-
Learning (MAML) [137] is the most famous meta-learning method
based on optimization. Model-based methods are good at data-
efficient few-shot learning [138]. They can embed the current
training dataset into the activation condition and predict the test
data based on this condition. Recurrent neural network [139],
convolutional neural network [140], and hyper-network [141] are
the typical architectures of the model-based meta-learning. Fi-
nally, metric-based methods are trained by comparing the train-
ing datasets with the validation datasets. Siamese network [142],
matching network [143], prototypical network [144], and relation
network [145] are typical meta-learning models based on metric.
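As an illustration of the metric-based family, the decision rule of a prototypical network can be sketched as nearest-prototype classification: each class is represented by the mean of its support embeddings, and a query is assigned to the class with the closest prototype. The sketch below omits the learned embedding network and works on hypothetical two-dimensional embeddings; class names and values are illustrative assumptions.

```python
import math

def prototypes(support):
    """Mean embedding per class; `support` maps label -> list of vectors."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors)
                         for d in range(dim)]
    return protos

def classify(query, protos):
    """Assign the query to the class with the nearest prototype (Euclidean)."""
    return min(protos, key=lambda label: math.dist(query, protos[label]))

# Hypothetical 2-way 2-shot support set of already-embedded samples.
support = {
    "normal": [[0.0, 0.0], [0.2, 0.1]],
    "inner_race": [[1.0, 1.0], [0.8, 1.2]],
}
protos = prototypes(support)
predicted = classify([0.9, 0.9], protos)
```

Only two labeled samples per class are needed to form the prototypes, which is why this kind of rule is attractive when fault samples are scarce.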
On the whole, meta-learning-based models have two obvious
characteristics. The first is that meta-learning-based models are
trained on "N-way K-shot" tasks, where the
number of classes is N and the number of training samples in each
class is K. Generally, K is small, which means meta-learning
is suitable for the case of lacking fault samples in engineering
scenarios. The second is that meta-learning-based models
have strong generalization ability. Some models like the matching
network [146] can perform well in a classification task even when
it contains data from new classes that were not seen in the
training stage, which means meta-learning is good at dealing with
actual problems in engineering scenarios.
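An "N-way K-shot" task (often called an episode) can be sampled from a labeled dataset as sketched below. The dataset layout, the additional query set size `n_query`, and all names are hypothetical choices made for illustration.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode.

    `dataset` maps a class label to a list of samples. Returns two dicts,
    (support, query), each mapping the chosen labels to disjoint samples.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = {}, {}
    for label in classes:
        picked = rng.sample(dataset[label], k_shot + n_query)
        support[label] = picked[:k_shot]
        query[label] = picked[k_shot:]
    return support, query

rng = random.Random(0)
# Hypothetical dataset: 5 fault classes with 20 samples each.
dataset = {
    f"fault_{c}": [f"fault_{c}_sample_{i}" for i in range(20)]
    for c in range(5)
}
support, query = sample_episode(dataset, n_way=3, k_shot=2, n_query=4, rng=rng)
```

Training over many such episodes, each with different classes, is what lets a meta-learning model practice the small-sample setting it will face at test time.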
It is worth noting that some scholars have tried to apply
meta-learning theory to the S&I-IFD problem and some preliminary
results have been achieved. For example, Chang et al. [20]
presented a fault identification scheme for bearings in a satellite
communication antenna, in which a meta-learning module based
on the relation network was applied to measure the correlation degree
of vibration data so as to identify bearing faults
using small samples. In [125], a meta-learning framework based
on the meta-relation net was presented for machine fault diagnosis.
The experimental results showed that this meta-relation
net-based model was suitable for fault classification with a few
training samples.
At present, intelligent diagnosis models using meta-learning
theory have not been deeply developed. The existing research
results mainly build diagnosis models on the relation network,
whereas the Siamese network, matching network, and
prototypical network have not been applied yet. In addition to
metric-based approaches, optimization-based and model-based
approaches can achieve good results in image classification in
the small sample case [138]. How to use them to build intelligent
diagnosis models is worthy of further exploration. Overall,
meta-learning theory has great potential to solve the problem
of S&I-IFD, so it is one of the important directions for future
research.
6.4. Zero-shot learning theory and its possible applications in S&I-IFD
Zero-shot learning [147] may bring research breakthroughs
in S&I-IFD. Zero-shot learning trains on seen data, which have been
collected in practice, and recognizes
unseen data, which have not been collected. In engineering
scenarios, most collected data are under normal conditions, and fault
data are rare. In extreme cases, researchers cannot obtain fault
signals for a certain fault type or under a certain working
condition, which means diagnosis models do not have training
samples from unseen data classes. In intelligent fault diagnosis,
the recognition of unseen data classes is a hard task
that is difficult to accomplish using common diagnosis models.
Zero-shot learning is a feasible way to recognize unseen data
and is therefore a valuable direction for further research in S&I-IFD.
Zero-shot learning realizes the recognition of unseen classes
by inferring from seen classes to unseen classes [148], which
has been applied in image recognition widely. Zero-shot learning
mainly includes model embedding [149] and feature genera-
tion [150], etc. Through training on seen classes, the model can
learn the mapping relationship between the data features and
their attributes, while the correlativity between the attributes
and the data labels is predefined. Based on the learned map-
ping relationships between the features and the attributes, the
model could infer the attributes of unseen classes in the testing
stage and realize the recognition of unseen classes through the
correlativity between the attributes and the data labels.
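The attribute-based inference described above can be sketched as a two-step lookup: a model trained on seen classes predicts an attribute vector for a test sample, and the sample is assigned to the class whose predefined attribute signature is closest, including classes with no training samples. The attribute signatures, class names, and the predicted vector below are hypothetical placeholders; the attribute predictor itself is not implemented.

```python
import math

# Predefined class-attribute table (hypothetical, for illustration only).
# Each class is described by three binary attributes.
CLASS_ATTRIBUTES = {
    "normal": (0, 0, 0),    # seen during training
    "fault_A": (1, 0, 1),   # seen during training
    "fault_B": (0, 1, 0),   # seen during training
    "fault_C": (1, 1, 1),   # unseen: no training samples available
}

def recognize(predicted_attributes, class_attributes):
    """Return the class whose attribute signature is nearest (Euclidean)
    to the attribute vector predicted for a test sample."""
    return min(class_attributes,
               key=lambda c: math.dist(predicted_attributes, class_attributes[c]))

# Attribute scores that a feature-to-attribute model (trained on seen classes
# only) might output for a test sample; the values are assumed here.
predicted = (0.9, 0.8, 0.7)
label = recognize(predicted, CLASS_ATTRIBUTES)
```

Because the unseen class is reachable through its attribute signature alone, the sample can be assigned to `fault_C` even though no `fault_C` data were used for training.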
In intelligent fault diagnosis, scholars have begun the prelim-
inary research on the zero-shot data classification. A zero-shot
diagnosis model using contractive auto-encoder was presented
in [151] to identify machine faults without faulty samples. Feng
et al. [152] used a faults description model based on the attribute
transfer strategy for the zero-sample fault classification of com-
plex mechanical systems. Lv et al. [153] proposed a conditional
adversarial de-noising auto-encoder for machine fault identifica-
tion without fault data, which generated unseen classes with the
hybrid attribute as conditions.
At present, the research on intelligent diagnosis using zero-shot
learning theory has obtained preliminary achievements from
the perspectives of data attribute description and data feature
generation. In machine fault identification, the attributes of machine
monitoring data are related to the monitoring object and
the data type. For example, due to differences in fault form
and fault mechanism, the attributes of induction motor monitoring
data are different from those of generator monitoring data.
Moreover, for some complex equipment, such as aero-engines,
the monitoring data include pressure data, temperature data,
flow data, vibration data, and so on. These different types of
monitoring data have different data attributes. Therefore, how to
effectively describe data attributes according to different monitoring
objects and data types is one of the key research directions
in the future, which is of great value to the application of zero-shot
learning-based diagnosis models. In addition, how to learn
and generate general data features is an important basis for zero-shot
learning. The existing research results mainly use
auto-encoders to generate data features in unseen classes [151,153].
In the future, how to use other models such as GAN [154]
to achieve feature learning and generation is a necessary research
direction. Generally speaking, zero-shot learning theory has
strong application value for machine fault diagnosis under small
sample conditions. Although there have been some preliminary
research results, we think there is still broad room for research.
Therefore, how to design effective diagnosis models based on
zero-shot learning is an important research direction for S&I-IFD
in the future.
7. Conclusions
S&I-IFD has attracted the attention of scholars for a long time.
In this paper, we review the research achievements on S&I-IFD,
which can be classified into three categories: data augmentation-based
strategy, feature learning-based strategy, and classifier
design-based strategy. Specifically, the data augmentation-based
strategy improves the diagnosis performance on small & imbalanced
data by generating, over-sampling, or reweighting the
training data samples. The feature learning-based strategy learns
fault features from small & imbalanced data using regularized
neural networks or feature adaptation. The classifier design-based
strategy achieves high diagnosis accuracy by designing a fault
classifier suitable for small & imbalanced data classification.
For future research, how to enhance the quality of the augmented
samples is a problem that needs more attention. Besides,
how to prevent transfer learning-based diagnosis schemes
from negative transfer is a challenge for further applications
in engineering scenarios. Finally, meta-learning theory and zero-shot
learning theory have great potential in dealing with the
S&I-IFD problem, which may bring research breakthroughs in the
future.
Declaration of competing interest
The authors declare that they have no known competing finan-
cial interests or personal relationships that could have appeared
to influence the work reported in this paper.
Acknowledgments
The authors would like to sincerely thank all the anonymous
reviewers for the valuable comments that greatly helped to im-
prove the manuscript.
This research is supported financially by the National Natu-
ral Science Foundation of China (No. 91960106, No. 51875436,
No. U1933101, No. 61633001, No. 51421004, No. 51965013),
China Postdoctoral Science Foundation (No. 2020T130509, No.
2018M631145) and Shaanxi Natural Science Foundation, China
(No. 2019JM-041).
References
[1] Pan J, Zi Y, Chen J, Zhou Z, Wang B. LiftingNet: A Novel deep learning
network with layerwise feature learning from noisy mechanical data
for fault classification. IEEE Trans Ind Electron 2018;65:4973–82. http:
//dx.doi.org/10.1109/TIE.2017.2767540.
[2] Jiang W, Zhou J, Liu H, Shan Y. A multi-step progressive fault diagnosis
method for rolling element bearing based on energy entropy theory and
hybrid ensemble auto-encoder. ISA Trans 2019;87:235–50. http://dx.doi.
org/10.1016/j.isatra.2018.11.044.
[3] Xiang Z, Zhang X, Zhang W, Xia X. Fault diagnosis of rolling bearing
under fluctuating speed and variable load based on TCO spectrum and
stacking auto-encoder. Meas J Int Meas Confed 2019;138:162–74. http:
//dx.doi.org/10.1016/j.measurement.2019.01.063.
[4] Zhang K, Chen J, Zhang T, Zhou Z. A compact convolutional neural
network augmented with multiscale feature extraction of acquired mon-
itoring data for mechanical intelligent fault diagnosis. J Manuf Syst 2020.
http://dx.doi.org/10.1016/j.jmsy.2020.04.016.
[5] Chang Y, Chen J, Qu C, Pan T. Intelligent fault diagnosis of wind
turbines via a deep learning network using parallel convolution layers
with multi-scale kernels. Renew Energy 2020. http://dx.doi.org/10.1016/
j.renene.2020.02.004.
[6] Pan T, Chen J, Zhou Z, Wang C, He S. A novel deep learning network
via multiscale inner product with locally connected feature extraction
for intelligent fault detection. IEEE Trans Ind Informatics 2019. http:
//dx.doi.org/10.1109/tii.2019.2896665.
[7] Pan T, Chen J, Pan J, Zhou Z. A deep learning network via shunt-wound
restricted Boltzmann machines using raw data for fault detection. IEEE
Trans Instrum Meas 2020. http://dx.doi.org/10.1109/TIM.2019.2953436.
[8] Lei Y, Yang B, Jiang X, Jia F, Li N, Nandi AK. Applications of machine
learning to machine fault diagnosis: A review and roadmap. Mech Syst
Signal Process 2020;138:106587. http://dx.doi.org/10.1016/j.ymssp.2019.
106587.
[9] Hashmi MB, Majid MAA, Lemma TA. Combined effect of inlet air cooling
and fouling on performance of variable geometry industrial gas turbines.
Alexandria Eng J 2020. http://dx.doi.org/10.1016/j.aej.2020.04.050.
[10] Martin-Diaz I, Morinigo-Sotelo D, Duque-Perez O, De Romero-Troncoso RJ.
Early fault detection in induction motors using adaboost with im-
balanced small data and optimized sampling. IEEE Trans Ind Appl
2017;53:3066–75. http://dx.doi.org/10.1109/TIA.2016.2618756.
[11] Gao L, Ren Z, Tang W, Wang H, Chen P. Intelligent gearbox di-
agnosis methods based on SVM, wavelet lifting and RBR. Sensors
2010;10:4602–21. http://dx.doi.org/10.3390/s100504602.
[12] Zhang T, Chen J, Li F, Pan T. A small sample focused intelligent fault
diagnosis scheme of machines via multi-modules learning with gradient
penalized generative adversarial networks, vol. 0046. 2020, http://dx.doi.
org/10.1109/TIE.2020.3028821.
[13] Xiao D, Huang Y, Qin C, Liu Z, Li Y, Liu C. Transfer learning with convolu-
tional neural networks for small sample size problem in machinery fault
diagnosis. Proc Inst Mech Eng Part C J Mech Eng Sci 2019;233:5131–43.
http://dx.doi.org/10.1177/0954406219840381.
[14] Zhao R, Yan R, Chen Z, Mao K, Wang P, Gao RX. Deep learning and
its applications to machine health monitoring. Mech Syst Signal Process
2019. http://dx.doi.org/10.1016/j.ymssp.2018.05.050.
[15] Gangsar P, Tiwari R. Signal based condition monitoring techniques for
fault detection and diagnosis of induction motors: A state-of-the-art
review. Mech Syst Signal Process 2020. http://dx.doi.org/10.1016/j.ymssp.
2020.106908.
[16] Haixiang G, Yijing L, Shang J, Mingyun G, Yuanyue H, Bing G. Learning
from class-imbalanced data: Review of methods and applications. Expert
Syst Appl 2017;73:220–39. http://dx.doi.org/10.1016/j.eswa.2016.12.035.
[17] Wang Y, Yao Q, Kwok JT, Ni LM. Generalizing from a few examples: A
survey on few-shot learning. ACM Comput Surv 2020;53. http://dx.doi.
org/10.1145/3386252.
[18] Sun Y, Wong AKC, Kamel MS. Classification of imbalanced data: A review.
Int J Pattern Recognit Artif Intell 2009;23:687–719. http://dx.doi.org/10.
1142/S0218001409007326.
[19] He H, Garcia EA. Learning from imbalanced data. IEEE Trans Knowl Data
Eng 2009. http://dx.doi.org/10.1109/TKDE.2008.239.
[20] Chang YH, Chen JL, He SL. Intelligent fault diagnosis of satellite com-
munication antenna via a novel meta-learning network combining with
attention mechanism. J Phys Conf Ser 2020;1510. http://dx.doi.org/10.
1088/1742-6596/1510/1/012026.
[21] Pan T, Chen J, Qu C, Zhou Z. A method for mechanical fault recognition
with unseen classes via unsupervised convolutional adversarial auto-
encoder. Meas Sci Technol 2020. http://dx.doi.org/10.1088/1361-6501/
abb38.
[22] Govindan K, Jepsen MB. ELECTRE: A comprehensive literature review on
methodologies and applications. Eur J Oper Res 2016;250:1–29. http:
//dx.doi.org/10.1016/j.ejor.2015.07.019.
[23] Yang J, Xie G, Yang Y. An improved ensemble fusion autoencoder model
for fault diagnosis from imbalanced and incomplete data. Control Eng
Pract 2020;98. http://dx.doi.org/10.1016/j.conengprac.2020.104358.
[24] Zhao X, Jia M, Lin M. Deep Laplacian auto-encoder and its application into
imbalanced fault diagnosis of rotating machinery. Meas J Int Meas Confed
2020;152:107320. http://dx.doi.org/10.1016/j.measurement.2019.107320.
[25] Dixit S, Verma NK. Intelligent condition based monitoring of rotary
machines with few samples. IEEE Sens J 2020. http://dx.doi.org/
10.1109/jsen.2020.3008177.
[26] Liu J, Qu F, Hong X, Zhang H. A small-sample wind turbine fault detection.
IEEE Trans Ind Informatics 2019;15:3877–88.
[27] Liu Q, Ma G, Cheng C. Data fusion generative adversarial network for
multi-class imbalanced fault diagnosis of rotating machinery. IEEE Access
2020;8:70111–24. http://dx.doi.org/10.1109/ACCESS.2020.2986356.
[28] Hang Q, Yang J, Xing L. Diagnosis of rolling bearing based on classification
for high dimensional unbalanced data. IEEE Access 2019;7:79159–72.
http://dx.doi.org/10.1109/ACCESS.2019.2919406.
[29] Shen F, Chen C, Yan R, Gao RX. Bearing fault diagnosis based on SVD
feature extraction and transfer learning classification. In: Proc. 2015
progn. syst. heal. manag. conf. 2016, http://dx.doi.org/10.1109/PHM.2015.
7380088.
[30] Zeng Y, Wu X, Chen J. Bearing fault diagnosis with denoising autoencoders
in few labeled sample case. In: 2020 5th IEEE int conf big data anal. 2020,
p. 349–53. http://dx.doi.org/10.1109/ICBDA49040.2020.9101321.
[31] Saufi SR, Bin ZA, Leong MS, Lim MH. Gearbox fault diagnosis using a
deep learning model with limited data sample. IEEE Trans Ind Informatics
2020;16:6263–71. http://dx.doi.org/10.1109/TII.2020.2967822.
[32] Li Q, Tang B, Deng L, Wu Y, Wang Y. Deep balanced domain adaptation
neural networks for fault diagnosis of planetary gearboxes with limited
labeled data. Meas J Int Meas Confed 2020;156:107570. http://dx.doi.org/
10.1016/j.measurement.2020.107570.
[33] Yang B, Lei Y, Jia F, Xing S. A transfer learning method for intelligent
fault diagnosis from laboratory machines to real-case machines. In: Proc.
- 2018 int. conf. sensing, diagnostics, progn. control. 2019, http://dx.doi.
org/10.1109/SDPC.2018.8664814.
[34] Li X, Zhang W, Ding Q, Sun JQ. Multi-layer domain adaptation method
for rolling bearing fault diagnosis. Signal Process 2019. http://dx.doi.org/
10.1016/j.sigpro.2018.12.005.
[35] Chen F, Tang B, Chen R. A novel fault diagnosis model for gearbox based
on wavelet support vector machine with immune genetic algorithm.
Meas J Int Meas Confed 2013;46:220–32. http://dx.doi.org/10.1016/j.
measurement.2012.06.009.
[36] Deng S, Lin SY, Chang WL. Application of multiclass support vector
machines for fault diagnosis of field air defense gun. Expert Syst Appl
2011;38:6007–13. http://dx.doi.org/10.1016/j.eswa.2010.11.020.
[37] Chen F, Tang B, Song T, Li L. Multi-fault diagnosis study on roller
bearing based on multi-kernel support vector machine with chaotic
particle swarm optimization. Meas J Int Meas Confed 2014;47:576–90.
http://dx.doi.org/10.1016/j.measurement.2013.08.021.
[38] Dong X, Gao H, Guo L, Li K, Duan A. Deep cost adaptive convo-
lutional network: A classification method for imbalanced mechanical
data. IEEE Access 2020;8:71486–96. http://dx.doi.org/10.1109/ACCESS.
2020.2986419.
[39] Zhang C, Tan KC, Li H, Hong GS. A cost-sensitive deep belief network
for imbalanced classification. IEEE Trans Neural Networks Learn Syst
2019;30:109–22. http://dx.doi.org/10.1109/TNNLS.2018.2832648.
[40] Peng P, Zhang W, Zhang Y, Xu Y, Wang H, Zhang H. Cost sensitive
active learning using bidirectional gated recurrent neural networks for
imbalanced fault diagnosis. Neurocomputing 2020;407:232–45. http://dx.
doi.org/10.1016/j.neucom.2020.04.075.
[41] Li X, Jiang H, Zhao K, Wang R. A deep transfer nonnegativity-constraint
sparse autoencoder for rolling bearing fault diagnosis with few labeled
data. IEEE Access 2019;7:91216–24. http://dx.doi.org/10.1109/ACCESS.
2019.2926234.
[42] He Z, Shao H, Zhang X, Cheng J, Yang Y. Improved deep transfer auto-
encoder for fault diagnosis of gearbox under variable working conditions
with small training samples. IEEE Access 2019;7:115368–77. http://dx.
doi.org/10.1109/access.2019.2936243.
[43] Kim H, Youn BD. A new parameter repurposing method for parameter
transfer with small dataset and its application in fault diagnosis of
rolling element bearings. IEEE Access 2019;7:46917–30. http://dx.doi.org/
10.1109/ACCESS.2019.2906273.
[44] Chen J, Chang Y, Qu C, Zhang M, Li F, Pan J. Intelligent impulse finder:
A boosting multi-kernel learning network using raw data for mechanical
fault identification in big data era. ISA Trans 2020. http://dx.doi.org/10.
1016/j.isatra.2020.07.039.
[45] Yu Y, Tang B, Lin R, Han S, Tang T, Chen M. CWGAN: Conditional
Wasserstein generative adversarial nets for fault data generation. In: IEEE
int conf robot biomimetics. 2019, p. 2713–8. http://dx.doi.org/10.1109/
ROBIO49542.2019.8961501.
[46] Pan T, Chen J, Xie J, Zhou Z, He S. Deep feature generating network:
A new method for intelligent fault detection of mechanical systems
under class imbalance. IEEE Trans Ind Informatics 2020. http:
//dx.doi.org/10.1109/tii.2020.3030967.
[47] Wu Z, Lin W, Fu B, Guo J, Ji Y, Pecht M. A local adaptive minority
selection and oversampling method for class-imbalanced fault diagnostics
in industrial systems. IEEE Trans Reliab 2019;1–12. http://dx.doi.org/10.
1109/TR.2019.2942049.
[48] Zhang Y, Li X, Gao L, Wang L, Wen L. Imbalanced data fault diagnosis of
rotating machinery using synthetic oversampling and feature learning. J
Manuf Syst 2018;48:34–50. http://dx.doi.org/10.1016/j.jmsy.2018.04.005.
[49] Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S,
et al. Generative adversarial nets. Adv Neural Inf Process Syst 2014.
[50] Kingma DP, Welling M. Auto-encoding variational bayes. In: 2nd int. conf.
learn. represent. ICLR 2014 - conf. track proc. 2014.
[51] Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B,
Bharath AA. Generative adversarial networks: An overview. IEEE Signal
Process Mag 2018. http://dx.doi.org/10.1109/MSP.2017.2765202.
[52] Shao S, Wang P, Yan R. Generative adversarial networks for data aug-
mentation in machine fault diagnosis. Comput Ind 2019;106:85–93. http:
//dx.doi.org/10.1016/j.compind.2019.01.001.
[53] Radford A, Metz L, Chintala S. Unsupervised representation learning with
deep convolutional generative adversarial networks. In: 4th int. conf.
learn. represent. ICLR 2016 - conf. track proc. 2016.
[54] Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial
networks. In: 34th int. conf. mach. learn. 2017.
[55] Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville A. Improved
training of Wasserstein GANs. Adv Neural Inf Process Syst 2017.
[56] Mirza M, Osindero S. Conditional generative adversarial nets. 2014, p.
1–7, ArXiv.
[57] Odena A, Olah C, Shlens J. Conditional image synthesis with auxiliary
classifier GANs. In: 34th int. conf. mach. learn. 2017.
[58] Salimans T, Goodfellow I, Zaremba W, Cheung V, Radford A, Chen X.
Improved techniques for training GANs. Adv Neural Inf Process Syst
2016.
[59] Chen X, Duan Y, Houthooft R, Schulman J, Sutskever I, Abbeel P. Info-
GAN: Interpretable representation learning by information maximizing
generative adversarial nets. Adv Neural Inf Process Syst 2016.
[60] Yin H, Li Z, Zuo J, Liu H, Yang K, Li F. Wasserstein generative adversarial
network and convolutional neural network (WG-CNN) for bearing fault
diagnosis. Math Probl Eng 2020;2020. http://dx.doi.org/10.1155/2020/
2604191.
[61] Gao X, Deng F, Yue X. Data augmentation in fault diagnosis based on the
Wasserstein generative adversarial network with gradient penalty. Neurocomputing
2020;396:487–94. http://dx.doi.org/10.1016/j.neucom.2018.
10.109.
[62] Zhang W, Li X, Jia XD, Ma H, Luo Z, Li X. Machinery fault diagnosis
with imbalanced data using deep generative adversarial networks. Meas J
Int Meas Confed 2020;152:107377. http://dx.doi.org/10.1016/j.measurement.
2019.107377.
[63] Zhang T, Chen J, Xie J, Pan T. SASLN: Signals augmented self-taught
learning networks for mechanical fault diagnosis under small sample
condition. IEEE Trans Instrum Meas 2020. http://dx.doi.org/10.1109/TIM.2020.3043098.
[64] Wu J, Zhao Z, Sun C, Yan R, Chen X. SS-InfoGAN for class-imbalance
classification of bearing faults. Procedia Manuf 2020;49:99–104. http:
//dx.doi.org/10.1016/j.promfg.2020.07.003.
[65] Wang Z, Wang J, Wang Y. An intelligent diagnosis scheme based on
generative adversarial learning deep neural networks and its applica-
tion to planetary gearbox fault pattern recognition. Neurocomputing
2018;310:213–22. http://dx.doi.org/10.1016/j.neucom.2018.05.024.
[66] Zou L, Li Y, Xu F. An adversarial denoising convolutional neural network
for fault diagnosis of rotating machinery under noisy environment and
limited sample size case. Neurocomputing 2020;407:105–20. http://dx.
doi.org/10.1016/j.neucom.2020.04.074.
[67] Wang J, Li S, Han B, An Z, Bao H, Ji S. Generalization of deep neural
networks for imbalanced fault classification of machinery using genera-
tive adversarial networks. IEEE Access 2019;7:111168–80. http://dx.doi.
org/10.1109/access.2019.2924003.
[68] Ding Y, Ma L, Ma J, Wang C, Lu C. A generative adversarial network-
based intelligent fault diagnosis method for rotating machinery under
small sample size conditions. IEEE Access 2019;7:149736–49. http://dx.
doi.org/10.1109/ACCESS.2019.2947194.
[69] Mao W, Liu Y, Ding L, Li Y. Imbalanced fault diagnosis of rolling bearing
based on generative adversarial network: A comparative study. IEEE
Access 2019;7:9515–30. http://dx.doi.org/10.1109/ACCESS.2018.2890693.
[70] Wang YR, Sun GD, Jin Q. Imbalanced sample fault diagnosis of
rotating machinery using conditional variational auto-encoder generative
adversarial network. Appl Soft Comput J 2020;92:106333. http://dx.doi.
org/10.1016/j.asoc.2020.106333.
[71] Zheng T, Song L, Guo B, Liang H, Guo L. An efficient method based on con-
ditional generative adversarial networks for imbalanced fault diagnosis of
rolling bearing. In: 2019 progn syst heal manag conf PHM-Qingdao 2019.
2019, http://dx.doi.org/10.1109/PHM-Qingdao46334.2019.8942906.
[72] Zheng T, Song L, Wang J, Teng W, Xu X, Ma C. Data synthesis using
dual discriminator conditional generative adversarial networks for im-
balanced fault diagnosis of rolling bearings. Meas J Int Meas Confed
2020;158:107741. http://dx.doi.org/10.1016/j.measurement.2020.107741.
[73] Li Z, Zheng T, Wang Y, Cao Z, Guo Z, Fu H. A novel method for imbalanced
fault diagnosis of rotating machinery based on generative adversarial
networks. IEEE Trans Instrum Meas 2020. http://dx.doi.org/10.
1109/tim.2020.3009343.
[74] Zhou F, Yang S, Fujita H, Chen D, Wen C. Deep learning fault diagnosis
method based on global optimization GAN for unbalanced data. Knowl-
Based Syst 2020;187:104837. http://dx.doi.org/10.1016/j.knosys.2019.07.
008.
[75] Cabrera D, Sancho F, Long J, Sanchez RV, Zhang S, Cerrada M, et al. Gener-
ative adversarial networks selection approach for extremely imbalanced
fault diagnosis of reciprocating machinery. IEEE Access 2019;7:70643–53.
http://dx.doi.org/10.1109/ACCESS.2019.2917604.
[76] Liang P, Deng C, Wu J, Yang Z, Zhu J, Zhang Z. Single and simultaneous
fault diagnosis of gearbox via a semi-supervised and high-accuracy ad-
versarial learning framework. Knowl-Based Syst 2020;198:105895. http:
//dx.doi.org/10.1016/j.knosys.2020.105895.
[77] Pan T, Chen J, Xie J, Chang Y, Zhou Z. Intelligent fault identification
for industrial automation system via multi-scale convolutional generative
adversarial network with partially labeled samples. ISA Trans 2020. http:
//dx.doi.org/10.1016/j.isatra.2020.01.014.
[78] Zhao D, Liu S, Gu D, Sun X, Wang L, Wei Y, et al. Enhanced data-
driven fault diagnosis for machines with small and unbalanced data based
on variational auto-encoder. Meas Sci Technol 2019. http://dx.doi.org/10.
1088/1361-6501/ab55f8.
[79] Larsen ABL, Sønderby SK, Larochelle H, Winther O. Autoencoding beyond
pixels using a learned similarity metric. In: 33rd int conf mach learn ICML
2016, vol. 4. 2016, p. 2341–9.
[80] Huang H, Yu PS, Wang C. An introduction to image synthesis with
generative adversarial nets. 2018, p. 1–17, ArXiv.
[81] Elreedy D, Atiya AF. A comprehensive analysis of synthetic minority
oversampling technique (SMOTE) for handling class imbalance. Inf Sci
(NY) 2019;505:32–64. http://dx.doi.org/10.1016/j.ins.2019.07.070.
[82] Wu Z, Lin W, Ji Y. An integrated ensemble learning model for imbalanced
fault diagnostics and prognostics. IEEE Access 2018;6:8394–402. http:
//dx.doi.org/10.1109/ACCESS.2018.2807121.
[83] Soltanzadeh P, Hashemzadeh M. RCSMOTE: Range-controlled synthetic
minority over-sampling technique for handling the class imbalance problem.
Inf Sci (NY) 2021;542:92–111. http://dx.doi.org/10.1016/j.ins.2020.
07.014.
[84] Wu Z, Jiang H, Lu T, Zhao K. A deep transfer maximum classifier discrep-
ancy method for rolling bearing fault diagnosis under few labeled data.
Knowl-Based Syst 2020;196:105814. http://dx.doi.org/10.1016/j.knosys.
2020.105814.
[85] Hoang DT, Kang HJ. A bearing fault diagnosis method using transfer learn-
ing and Dempster-Shafer evidence theory. In: ACM int conf proceeding
ser. 2019, p. 33–8. http://dx.doi.org/10.1145/3388218.3388220.
[86] Pan SJ, Yang Q. A survey on transfer learning. IEEE Trans Knowl Data Eng
2010. http://dx.doi.org/10.1109/TKDE.2009.191.
[87] Dai W, Yang Q, Xue GR, Yu Y. Boosting for transfer learning. In: ACM int.
conf. proceeding ser. 2007, http://dx.doi.org/10.1145/1273496.1273521.
[88] Zhang A, Li S, Cui Y, Yang W, Dong R, Hu J. Limited data rolling bearing
fault diagnosis with few-shot learning. IEEE Access 2019;7:110895–904.
http://dx.doi.org/10.1109/access.2019.2934233.
[89] Hu Y, Gao J, Zhou Q, Fan Z. Bearing fault diagnosis based on deep semisu-
pervised small sample classifier. In: 2019 progn syst heal manag conf
PHM-Qingdao 2019. 2019, http://dx.doi.org/10.1109/PHM-Qingdao46334.
2019.8943025.
[90] Wang T, Wang J, Wu Y, Sheng X. A fault diagnosis model based on
weighted extension neural network for turbo-generator sets on small
samples with noise. Chinese J Aeronaut 2020. http://dx.doi.org/10.1016/
j.cja.2020.06.024.
[91] Dong L, Liu S, Zhang H. A method of anomaly detection and fault diagnosis
with online adaptive learning under small training samples. Pattern
Recognit 2017;64:374–85. http://dx.doi.org/10.1016/j.patcog.2016.11.026.
[92] Jiao J, Zhao M, Lin J, Liang K. A comprehensive review on convolutional
neural network in machine fault diagnosis. Neurocomputing 2020. http:
//dx.doi.org/10.1016/j.neucom.2020.07.088.
[93] Zhao B, Zhang X, Li H, Yang Z. Intelligent fault diagnosis of rolling
bearings based on normalized CNN considering data imbalance and
variable working conditions. Knowl-Based Syst 2020;199:105971. http:
//dx.doi.org/10.1016/j.knosys.2020.105971.
[94] Jia F, Lei Y, Lu N, Xing S. Deep normalized convolutional neural network
for imbalanced fault classification of machinery and its understanding
via visualization. Mech Syst Signal Process 2018;110:349–67. http://dx.
doi.org/10.1016/j.ymssp.2018.03.025.
[95] Ren Z, Zhu Y, Yan K, Chen K, Kang W, Yue Y, et al. A novel model with
the ability of few-shot learning and quick updating for intelligent fault
diagnosis. Mech Syst Signal Process 2020;138. http://dx.doi.org/10.1016/
j.ymssp.2019.106608.
[96] Jia F, Li S, Zuo H, Shen J. Deep neural network ensemble for the
intelligent fault diagnosis of machines under imbalanced data. IEEE Access
2020;8:120974–82. http://dx.doi.org/10.1109/ACCESS.2020.3006895.
[97] Xu K, Li S, Jiang X, An Z, Wang J, Yu T. A renewable fusion fault
diagnosis network for the variable speed conditions under unbalanced
samples. Neurocomputing 2020;379:12–29. http://dx.doi.org/10.1016/j.
neucom.2019.08.099.
[98] Geng Y, Wang Z, Jia L, Qin Y, Chen X. Bogie fault diagnosis under
variable operating conditions based on fast kurtogram and deep residual
learning towards imbalanced data. Meas J Int Meas Confed 2020;166.
http://dx.doi.org/10.1016/j.measurement.2020.108191.
[99] Liu S, Sun Y, Zhang L. A novel fault diagnosis method based on
noise-assisted MEMD and functional neural fuzzy network for rolling el-
ement bearings. IEEE Access 2018;6:27048–68. http://dx.doi.org/10.1109/
ACCESS.2018.2833851.
[100] Qian W, Li S. A novel class imbalance-robust network for bearing
fault diagnosis utilizing raw vibration signals. Meas J Int Meas Confed
2020;156:107567. http://dx.doi.org/10.1016/j.measurement.2020.107567.
[101] Pan SJ, Tsang IW, Kwok JT, Yang Q. Domain adaptation via transfer
component analysis. IEEE Trans Neural Netw 2011. http://dx.doi.org/10.
1109/TNN.2010.2091281.
[102] Long M, Wang J, Ding G, Sun J, Yu PS. Transfer feature learning with
joint distribution adaptation. In: Proc. IEEE int. conf. comput. vis. 2013,
http://dx.doi.org/10.1109/ICCV.2013.274.
[103] Han T, Liu C, Yang W, Jiang D. A novel adversarial learning framework in
deep convolutional neural network for intelligent diagnosis of mechanical
faults. Knowl-Based Syst 2019;165:474–87. http://dx.doi.org/10.1016/j.
knosys.2018.12.019.
[104] Chen C, Li Z, Yang J, Liang B. A cross domain feature extraction method
based on transfer component analysis for rolling bearing fault diagnosis.
In: Proc. 29th Chinese control decis. conf. 2017, http://dx.doi.org/10.1109/
CCDC.2017.7978168.
[105] Xie J, Zhang L, Duan L, Wang J. On cross-domain feature fusion in gearbox
fault diagnosis under various operating conditions based on transfer
component analysis. In: 2016 IEEE int. conf. progn. heal. manag. 2016,
http://dx.doi.org/10.1109/ICPHM.2016.7542845.
[106] Duan L, Xie J, Wang K, Wang J. Gearbox diagnosis based on auxiliary mon-
itoring datasets of different working conditions. Zhendong Yu Chongji/J
Vib Shock 2017. http://dx.doi.org/10.13465/j.cnki.jvs.2017.10.017.
[107] Han T, Liu C, Yang W, Jiang D. Deep transfer network with joint distribu-
tion adaptation: A new intelligent fault diagnosis framework for industry
application. ISA Trans 2020. http://dx.doi.org/10.1016/j.isatra.2019.08.012.
[108] Qian W, Li S, Yi P, Zhang K. A novel transfer learning method for robust
fault diagnosis of rotating machines under variable working conditions.
Meas J Int Meas Confed 2019. http://dx.doi.org/10.1016/j.measurement.
2019.02.073.
[109] Qian W, Li S, Jiang X. Deep transfer network for rotating machine fault
analysis. Pattern Recognit 2019;96. http://dx.doi.org/10.1016/j.patcog.
2019.106993.
[110] Zhang Z, Chen H, Li S, An Z. Unsupervised domain adaptation via
enhanced transfer joint matching for bearing fault diagnosis. Meas J Int
Meas Confed 2020;165:108071. http://dx.doi.org/10.1016/j.measurement.
2020.108071.
[111] Cheng C, Zhou B, Ma G, Wu D, Yuan Y. Wasserstein distance based
deep adversarial transfer learning for intelligent fault diagnosis with
unlabeled or insufficient labeled data. Neurocomputing 2020;409:35–45.
http://dx.doi.org/10.1016/j.neucom.2020.05.040.
[112] He Z, Shao H, Wang P, Lin J, Cheng J, Yang Y. Deep transfer multi-wavelet
auto-encoder for intelligent fault diagnosis of gearbox with few target
training samples. Knowl-Based Syst 2020;191:105313. http://dx.doi.org/
10.1016/j.knosys.2019.105313.
[113] Zhang R, Liu Y. Research on development and application of support vec-
tor machine - Transformer fault diagnosis. In: ACM Int Conf Proceeding
Ser. 2018, p. 262–8. http://dx.doi.org/10.1145/3305275.3305328.
[114] Chen G, Ge Z. SVM-tree and SVM-forest algorithms for imbalanced fault
classification in industrial processes. IFAC J Syst Control 2019;8:100052.
http://dx.doi.org/10.1016/j.ifacsc.2019.100052.
[115] Wagner C, Saalmann P, Hellingrath B. Machine condition monitoring
and fault diagnostics with imbalanced data sets based on the KDD
process. IFAC-PapersOnLine 2016;49:296–301. http://dx.doi.org/10.1016/
j.ifacol.2016.11.151.
[116] Xi PP, Zhao YP, Wang PX, Li ZQ, Pan YT, Song FQ. Least squares support
vector machine for class imbalance learning and their applications to
fault detection of aircraft engine. Aerosp Sci Technol 2019;84:56–74.
http://dx.doi.org/10.1016/j.ast.2018.08.042.
[117] Malik H, Mishra S. Proximal support vector machine (PSVM) based
imbalance fault diagnosis of wind turbine using generator current
signals. Energy Procedia 2016;90:593–603. http://dx.doi.org/10.1016/j.
egypro.2016.11.228.
[118] He Z, Shao H, Cheng J, Zhao X, Yang Y. Support tensor machine with
dynamic penalty factors and its application to the fault diagnosis of
rotating machinery with unbalanced data. Mech Syst Signal Process
2020;141:106441. http://dx.doi.org/10.1016/j.ymssp.2019.106441.
[119] Duan L, Xie M, Bai T, Wang J. A new support vector data description
method for machinery fault diagnosis with unbalanced datasets. Expert
Syst Appl 2016;64:239–46. http://dx.doi.org/10.1016/j.eswa.2016.07.039.
[120] Mathew J, Pang CK, Luo M, Leong WH. Classification of imbalanced data
by oversampling in kernel space of support vector machines. IEEE Trans
Neural Networks Learn Syst 2018;29:4065–76. http://dx.doi.org/10.1109/
TNNLS.2017.2751612.
[121] Duan A, Guo L, Gao H, Wu X, Dong X. Deep focus parallel convolutional
neural network for imbalanced classification of machinery fault diagnostics.
IEEE Trans Instrum Meas 2020. http://dx.doi.org/10.1109/tim.
2020.2998233.
[122] Chaudhari S, Polatkan G, Ramanath R, Mithal V. An attentive survey of
attention models. 2019, ArXiv.
[123] Li F, Chen J, Pan J, Pan T. Cross-domain learning in rotating machinery
fault diagnosis under various operating conditions based on parameter
transfer. Meas Sci Technol 2020. http://dx.doi.org/10.1088/1361-6501/
ab6ade.
[124] Cao P, Zhang S, Tang J. Preprocessing-free gear fault diagnosis using
small datasets with deep convolutional neural network-based transfer
learning. IEEE Access 2018;6:26241–53. http://dx.doi.org/10.1109/ACCESS.
2018.2837621.
[125] Wu J, Zhao Z, Sun C, Yan R, Chen X. Few-shot transfer learning
for intelligent fault diagnosis of machine. Meas J Int Meas Confed
2020;166:108202. http://dx.doi.org/10.1016/j.measurement.2020.108202.
[126] Yu K, Lin TR, Ma H, Li X, Li X. A multi-stage semi-supervised learning
approach for intelligent fault diagnosis of rolling bearing using data
augmentation and metric learning. Mech Syst Signal Process 2021;146.
http://dx.doi.org/10.1016/j.ymssp.2020.107043.
[127] Li X, Zhang W, Ding Q, Sun JQ. Intelligent rotating machinery fault
diagnosis based on deep learning using data augmentation. J Intell Manuf
2020;31:433–52. http://dx.doi.org/10.1007/s10845-018-1456-1.
[128] Lv H, Chen J, Zhang T, Hou R, Pan T, Zhou Z. SDA: Regularization with cut-
flip and mix-normal for machinery fault diagnosis under small dataset.
ISA Trans 2020. http://dx.doi.org/10.1016/j.isatra.2020.11.005.
[129] Han S, Oh J, Jeong J. Bearing fault detection with data augmentation based
on 2-d CNN and 1-d CNN. In: ACM int conf proceeding ser. 2020, p. 20–3.
http://dx.doi.org/10.1145/3421537.3421546.
[130] Cubuk ED, Zoph B, Mane D, Vasudevan V, Le QV. Autoaugment: Learning
augmentation strategies from data. In: Proc. IEEE comput. soc. conf.
comput. vis. pattern recognit. 2019, http://dx.doi.org/10.1109/CVPR.2019.
00020.
[131] Tao X, Ren C, Li Q, Guo W, Liu R, He Q, et al. Bearing defect diagnosis
based on semi-supervised kernel local Fisher discriminant analysis using
pseudo labels. ISA Trans 2020. http://dx.doi.org/10.1016/j.isatra.2020.10.
033.
[132] Tan B, Song Y, Zhong E, Yang Q. Transitive transfer learning. In: Proc.
ACM SIGKDD int. conf. knowl. discov. data min. 2015, http://dx.doi.org/
10.1145/2783258.2783295.
[133] Tan B, Zhang Y, Pan SJ, Yang Q. Distant domain transfer learning. In: 31st
AAAI conf. artif. intell. 2017.
[134] Mai S, Hu H, Xu J. Attentive matching network for few-shot learning.
Comput Vis Image Underst 2019;187:102781. http://dx.doi.org/10.1016/j.
cviu.2019.07.001.
[135] Ali AR, Gabrys B, Budka M. Cross-domain meta-learning for time-series
forecasting. Procedia Comput Sci 2018;126:9–18. http://dx.doi.org/10.
1016/j.procs.2018.07.204.
[136] Lee Y, Choi S. Gradient-based meta-learning with learned layerwise
metric and subspace. In: 35th int. conf. mach. learn. 2018.
[137] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast
adaptation of deep networks. In: 34th int. conf. mach. learn. 2017.
[138] Hospedales T, Antoniou A, Micaelli P, Storkey A. Meta-learning in neural
networks: A survey. 2020, p. 1–20, ArXiv.
[139] Ravi S, Larochelle H. Optimization as a model for few-shot learning. In:
Proc. 5th int. conf. learn. represent. 2017.
[140] Mishra N, Rohaninejad M, Chen X, Abbeel P. A simple neural attentive
meta-learner. In: 6th int. conf. learn. represent. ICLR 2018 - Conf. Track
Proc. 2018.
[141] Qiao S, Liu C, Shen W, Yuille A. Few-shot image recognition by predicting
parameters from activations. In: Proc. IEEE comput. soc. conf. comput. vis.
pattern recognit. 2018, http://dx.doi.org/10.1109/CVPR.2018.00755.
[142] Koch G, Zemel R, Salakhutdinov R. Siamese neural networks for one-shot
image recognition. In: ICML - deep learn work 2015.
[143] Vinyals O, Blundell C, Lillicrap T, Kavukcuoglu K, Wierstra D. Matching
networks for one shot learning. Adv Neural Inf Process Syst 2016.
[144] Snell J, Swersky K, Zemel R. Prototypical networks for few-shot learning.
Adv Neural Inf Process Syst 2017.
[145] Sung F, Yang Y, Zhang L, Xiang T, Torr PHS, Hospedales TM. Learning to
compare: Relation network for few-shot learning. In: Proc. IEEE comput.
soc. conf. comput. vis. pattern recognit. 2018, http://dx.doi.org/10.1109/
CVPR.2018.00131.
[146] Zhang K, Chen J, Zhang T, He S, Pan T, Zhou Z. Intelligent fault diagnosis
of mechanical equipment under varying working condition via itera-
tive matching network augmented with selective signal reuse strategy.
J Manuf Syst 2020;57:400–15. http://dx.doi.org/10.1016/j.jmsy.2020.10.
007.
[147] Lampert CH, Nickisch H, Harmeling S. Learning to detect unseen object
classes by between-class attribute transfer. In: 2009 IEEE comput. soc.
conf. comput. vis. pattern recognit. work. CVPR work. 2009. 2009, http:
//dx.doi.org/10.1109/CVPRW.2009.5206594.
[148] Romera-Paredes B, Torr PHS. An embarrassingly simple approach to
zero-shot learning. In: 32nd int. conf. mach. learn. 2015.
[149] Changpinyo S, Chao WL, Sha F. Predicting visual exemplars of unseen
classes for zero-shot learning. In: Proc. IEEE int. conf. comput. vis. 2017,
http://dx.doi.org/10.1109/ICCV.2017.376.
[150] Xian Y, Lorenz T, Schiele B, Akata Z. Feature generating networks for
zero-shot learning. In: Proc. IEEE comput. soc. conf. comput. vis. pattern
recognit. 2018, http://dx.doi.org/10.1109/CVPR.2018.00581.
[151] Gao Y, Gao L, Li X, Zheng Y. A zero-shot learning method for fault
diagnosis under unknown working loads. J Intell Manuf 2020. http:
//dx.doi.org/10.1007/s10845-019-01485-w.
[152] Feng L, Zhao C. Fault description based attribute transfer for zero-sample
industrial fault diagnosis. IEEE Trans Ind Informatics 2020. http://dx.doi.
org/10.1109/tii.2020.2988208.
[153] Lv H, Chen J, Pan T, Zhou Z. Hybrid attribute conditional adversarial de-
noising autoencoder for zero-shot classification of mechanical intelligent
fault diagnosis. Appl Soft Comput J 2020. http://dx.doi.org/10.1016/j.asoc.
2020.106577.
[154] Xian Y, Lorenz T, Schiele B, Akata Z. Feature generating networks for
zero-shot learning. In: Proc. IEEE comput. soc. conf. comput. vis. pattern
recognit. 2018, http://dx.doi.org/10.1109/CVPR.2018.00581.
... The valuable motor output signals provided by the motor and suitable for condition monitoring are acquired by means of appropriate sensors placed on the motor or on its rotor (most frequently vibration sensors [1][2][3][4][5][6][7][8][9][10][11][12], temperature sensors [2][3][4][5][6]10,[13][14][15][16][17][18], instantaneous rotation speed sensors [6,9,14,16,[18][19][20][21] acoustic sensors [5,8,11,12,14], and rarely magnetic field and flux sensors [20,22,23] or stray-flux sensors [6,24]). 2 The valuable motor input signals are often acquired from the electrical supply system using appropriate instantaneous voltage and current sensors. The most commonly used input signal in motor condition monitoring is the description of the absorbed current (e. g. by Motor Current Signature Analysis, MCSA [13]) which has been widely used in scientific research [2,4,[6][7][8][9]11,12,14,15,17,18,20,[24][25][26][27][28][29][30][31][32][33][34][35]. The use of voltage sensors is indirect and rarely used on its own, e. g. in unbalancing detection of an AC power supply and overvoltage detection [26], detection of broken bars in stator [13], to prevent the phase-loss [36] or to describe voltage waveform anomalies [2]). ...
... More commonly, the Wavelet Transform is used to detect and describe short (transient) sequences (based on local spectral information) of components within these signals [2, 8, 9, 13, 14, 17, 18, 20, 26, 27, 28, 37, 38 and 39]. Actually, the use of artificial intelligence techniques (e.g. based on neural networks [2,5,11,13,18,26,3840], support vector machines [2,13], machine learning algorithms [3,5,8,10,14,29,36], deep learning algorithms [11]) or the use of IoT [16,38] for early detection [1,4,10,14,15], possibly online [8,9,17], in the detections of motors states that may develop abnormally has become increasingly attractive. ...
Preprint
Full-text available
This paper experimentally reveals some of the resources offered by the evolution of the instantaneous active electric power in describing the state of three-phase AC induction asynchronous electric motors (with squirrel-cage rotor) operating under no-load conditions. A mechanical power is required to rotate the rotor at no-load and this mechanical power is satisfactorily reflected in the constant and variable part of instantaneous active electric power. The variable part of this electrical power should necessarily have a periodic component with the same period as the period of rotation of the rotor. The paper proposes a procedure for extracting this periodic component description (as a pattern, by means of a selective averaging of instantaneous active electrical power) and analysis. The time origin of this pattern is defined by the time of a selected first passage through the origin of an angular marker placed on the rotor, detectable by a proximity sensor (e.g. a laser sensor). The usefulness of the pattern in describing the state of the motor rotor has been demonstrated by several simple experiments which show that a slight change in the no-load running conditions of the motor (e.g. by placing a dynamically unbalanced mass on the rotor) has clear effects in changing the shape of the pattern.
... While such data imbalance has already started to pose challenges in applications like machine fault diagnosis in Industry 4.0, where gathering extensive data from faulty machines for algorithm training is often infeasible due to safety and economic considerations, Industry 5.0 is envisioned to intensify this data curation challenge. Consequently, data balancing techniques, which aim to produce a uniform representation of all classes in the dataset, are becoming crucial in Industry 5.0 (Zhang, Chen, et al., 2022). ...
Chapter
This chapter is arranged as follows: Section 10.2 will delve deeper into the latest advances in data denoising, data annotation, and data balancing facilitated by DL techniques. Section 10.3 will present the manufacturing applications of these methodologies, followed by a discussion on the remaining challenges and opportunities in Section 10.4, and conclusions in Section 10.5.
... Common elements in kinematic chains include induction motors, pulleys, gearboxes, and mechanical loads. In addition to these main elements, kinematic chains contain other secondary but no less important elements, such as bearings [1], gears [2] and elements inside the motors themselves [3], which are also susceptible to failure. To maintain reliable operation, kinematic chains often employ strategies such as preventive maintenance and early fault detection [4]. ...
Article
Full-text available
Kinematic chains are crucial in numerous industrial settings, playing a key role in various processes. Over recent years, several methods have been developed to monitor and maintain these systems effectively. One notable method is the analysis of infrared thermal images, which serves as a non-invasive and effective approach for identifying various electromechanical issues. Additionally, Virtual Reality (VR) is a burgeoning technology that, despite its limited use in industrial contexts, offers a cost-effective and accessible solution for the training and education of industrial workers on specialized engineering subjects. Nevertheless, most virtual environments are based on numerical simulations. This paper presents the design and development of a Virtual Reality training module for the detection of fourteen electromechanical fault cases in a kinematic chain. The VR training tool developed is based on actual thermographic data derived from experiments conducted on an authentic kinematic chain. During these experiments, thermal images were captured using a low-cost infrared sensor. The thermographic images were processed by calculating the histogram and fifteen statistical indicators, which served to differentiate fault cases in the VR application. A comprehensive evaluation was carried out with a group of vocational students specialized in electrical and automation installations to determine the effectiveness and practicality of the VR training module.
... In many practical applications, the data available for FD are extremely scarce because the bearing is usually in normal working condition [18]. This insufficiency of fault data poses a challenge to training deep learning-based FD models: it may impair the model's ability to learn effectively, resulting in reduced generalization capability and FD precision. ...
Article
Full-text available
In actual production, bearings are usually in a normal working state, which results in a lack of data for fault diagnosis (FD). Yet, the majority of existing studies on FD of rolling bearings focus on scenarios with ample fault data, while research on diagnosing small-sample bearings remains scarce. Therefore, this study presents an FD method for small-sample bearings, employing variational-mode decomposition and Symmetric Dot Pattern, combined with a pre-trained and fine-tuned Residual Network18 (VSDP-TLResNet18). The approach utilizes variational-mode decomposition (VMD) to break down the signal, determining the k value and the best Intrinsic-Mode Function (IMF) component based on center frequency and kurtosis criteria. Following this, the chosen IMF component is converted into a two-dimensional image using the Symmetric Dot Pattern (SDP) transform. In order to maximize the discrimination between two-dimensional fault images, Pearson correlation analysis is carried out on the parameters of SDP to select the optimal parameters. Finally, we use the pre-trained and fine-tuned method combined with ResNet18 for small-sample FD to improve the diagnosis accuracy of the model. Relative to alternative approaches, the suggested method demonstrates strong performance when dealing with small-sample FD.
... In recent years, deep neural networks such as autoencoders and convolutional neural networks have been widely used to construct end-to-end intelligent diagnostic models, which reduces the dependence on manual labor and expert knowledge and greatly promotes the development of intelligent fault diagnosis [8]. The end-to-end diagnostic model directly inputs raw data into the constructed machine learning model for training and combines it with classification methods to realize fault diagnosis without relying on any a priori knowledge [9]. ...
Article
Full-text available
The gearbox is a key component of mechanical equipment; it has a complex structure, operates under harsh working conditions, and has a higher probability of failure. Therefore, gearbox fault diagnosis is vital to ensure the efficient operation of the whole mechanical equipment. However, traditional gearbox fault diagnosis methods mainly rely on manual feature extraction. To address this issue, a novel end-to-end fault diagnosis model that combines a convolutional neural network (CNN), a long short-term memory network (LSTM) and an attention mechanism (AM) is proposed. First, spatial features are extracted from the original input data using CNNs, and then temporal features are extracted from the spatial features using LSTMs. The attention mechanism improves the network's attention to the global key features by assigning weights, and finally, the fault diagnosis results are derived from the fully connected layer and Softmax classifier. The experimental results show that the method is able to adaptively extract features and ensure higher diagnostic accuracy, validating the effectiveness of the proposed method.
... The data-driven deep learning methods mentioned above are usually constructed with datasets that are set to be class-balanced. However, when dealing with unbalanced datasets, these models usually focus on the majority category and may ignore the minority category samples, resulting in low diagnostic accuracy of the minority fault samples [11,12]. ...
Article
Full-text available
To address the difficulty existing methods have in extracting fault features effectively and the instability of model training on unbalanced data, this paper proposes a new fault diagnosis method for rolling bearings based on a Markov Transition Field (MTF) and a Mixed Attention Residual Network (MARN). The acquired vibration signals are transformed into two-dimensional MTF feature images as network inputs to avoid the loss of the original signal information while retaining the temporal correlation; then, the mixed attention mechanism is inserted into the residual structure to enhance the feature extraction capability; finally, the network is trained and outputs diagnostic results. To validate the feasibility of the MARN, other popular deep learning (DL) methods are compared on balanced and unbalanced datasets derived from the CWRU bearing fault dataset, and the proposed method shows superior performance. Ultimately, the proposed method achieves an average recognition accuracy of 99.5% and 99.2% on the two categories of divided datasets, respectively.
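The signal-to-image step mentioned in this abstract can be illustrated with a minimal sketch of a generic Markov Transition Field. This is not the authors' implementation: the function name `markov_transition_field`, the rank-based quantile binning, and the default of four bins are assumptions made only for illustration, and the abstract does not state the paper's bin count or image size.

```python
def markov_transition_field(signal, n_bins=4):
    """Generic MTF sketch: entry (i, j) is the first-order transition
    probability from the bin of sample i to the bin of sample j."""
    n = len(signal)
    if n < 2 or n_bins < 1:
        raise ValueError("need at least two samples and one bin")

    # Assumption: rank-based quantile binning, so bins hold roughly equal
    # numbers of samples. Tied values may land in neighbouring bins.
    order = sorted(range(n), key=lambda i: signal[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // n

    # Count transitions between the bins of consecutive samples.
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in zip(bins, bins[1:]):
        counts[a][b] += 1

    # Row-normalize the counts into transition probabilities.
    trans = []
    for row in counts:
        total = sum(row)
        trans.append([c / total if total else 0.0 for c in row])

    # Spread the transition probabilities over all pairs of time steps.
    return [[trans[bins[i]][bins[j]] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    image = markov_transition_field([0.0, 1.0, 0.0, 1.0], n_bins=2)
    for row in image:
        print(row)
```

The resulting n × n matrix can be rendered as a grayscale image; in practice it is usually downsampled before being fed to a network, a step omitted here.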
Article
The fault diagnosis of permanent magnet synchronous motors is of vital importance in industrial fields to ensure user safety and minimize economic losses from accidents. However, recent fault diagnosis methods, particularly those using deep learning, require a massive amount of labeled data, which may not be available in industrial fields. Few-shot learning has recently been applied to fault diagnosis of rotating machinery to alleviate data deficiency and/or to enable diagnosis of unseen faults. However, two major obstacles remain: (i) the limited ability of the models to generalize to new operating conditions and (ii) insufficiently discriminative features to precisely diagnose fault types. To address these limitations, this study proposes a Prototype-assisted dual-Contrastive learning with Depthwise separable Convolutional neural network (PCDC) for few-shot fault diagnosis of permanent magnet synchronous motors under new working conditions. Operation-robust fault features are extracted to reinforce generalization of PCDC under new operating conditions by extracting fault-induced amplitude and frequency modulation features and by eliminating the influence of operating conditions from the motor stator current signals. Prototype-assisted dual-contrastive learning is proposed to clearly distinguish the fault categories even when the fault features are similar to each other by learning both local- and global-similarity features, which increases the instance-discrimination ability while alleviating overfitting. Experimental results show that the proposed PCDC outperforms the comparison models in few-shot fault diagnosis tasks under new operating conditions.
Article
Full-text available
The implementation of condition monitoring and fault diagnosis is of special importance for ensuring that wind turbines (WTs) operate safely and stably. In practice, however, WT fault data are limited, which makes it hard to identify WT faults accurately using existing intelligent diagnosis methods. To address this, signals augmented self-taught learning networks (SASLN) are proposed for the fault diagnosis of the generator, which is one of the most important parts of a WT. In SASLN, fault signal samples are generated by Wasserstein distance guided generative adversarial networks to expand the limited training dataset. The generated signal samples are used to pre-train the self-taught learning networks (SLN) to enhance the generalization ability of SLN. Then, the weights of SLN are fine-tuned using a small number of real signal samples for accurate fault classification. The effectiveness of SASLN is verified on two bearing vibration datasets. The results show that SASLN can achieve fairly high fault classification accuracy using small training samples. In addition, SASLN is robust in noisy working environments and can identify faults even under variable loads and variable rotating speeds, which makes it meaningful for decreasing the running costs and improving the maintenance management of WTs.
Article
Full-text available
Data-driven intelligent diagnosis models play a key role in the monitoring and maintenance of mechanical equipment. However, due to practical limitations, fault data are difficult to obtain, which makes model training unsatisfactory and results in poor testing performance. Based on the characteristics of 1-D mechanical vibration signals, this paper proposes Supervised Data Augmentation (SDA) as a regularization method to provide more effective training samples; it includes Cut-Flip and Mix-Normal. Cut-Flip is applied directly to the raw sample without parameter selection. Mix-Normal mixes the data and labels of a random sample with those of a random normal sample at a certain ratio. The proposed SDA is verified on two bearing datasets with some popular intelligent diagnosis networks. In addition, we design a Batch Normalization CNN (BNCNN) to learn from the small dataset. Results show that SDA can significantly improve the classification accuracy of BNCNN by 10%–30% with 1–8 samples per class. The proposed method also shows competitive performance compared with existing advanced methods. Finally, we further discuss each data augmentation method through a series of ablation experiments and summarize the advantages and disadvantages of the proposed SDA.
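The Mix-Normal idea described in this abstract (blending a sample and its label with a normal-condition sample at some ratio) can be sketched as follows. The abstract does not give the exact formula, so the mixup-style convex combination, the one-hot label encoding, and the names `mix_normal` and `ratio` are assumptions for illustration rather than the authors' definition.

```python
def mix_normal(sample, label, normal_sample, normal_label, ratio):
    """Blend one training signal with one normal-condition signal.

    Assumption: a convex combination with weight `ratio` on the original
    sample, applied identically to the signal and its one-hot label.
    """
    if len(sample) != len(normal_sample):
        raise ValueError("signals must have the same length")
    if len(label) != len(normal_label):
        raise ValueError("labels must have the same length")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")

    mixed_signal = [ratio * s + (1.0 - ratio) * n
                    for s, n in zip(sample, normal_sample)]
    mixed_label = [ratio * a + (1.0 - ratio) * b
                   for a, b in zip(label, normal_label)]
    return mixed_signal, mixed_label


if __name__ == "__main__":
    # Hypothetical two-class setup: index 0 = normal, index 1 = fault.
    fault_signal, fault_label = [1.0, 2.0], [0.0, 1.0]
    normal_signal, normal_label = [0.0, 0.0], [1.0, 0.0]
    print(mix_normal(fault_signal, fault_label,
                     normal_signal, normal_label, ratio=0.75))
```

In a training loop, the normal sample and possibly the ratio would be drawn at random for each augmented sample; that sampling is left out here to keep the sketch deterministic.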
Article
Full-text available
Changes in working conditions lead to discrepancies in the domain distribution of equipment vibration signals. This discrepancy poses an obstacle to the application of deep learning methods in fault diagnosis of wind turbines. Without domain adaptation ability, the diagnostic accuracy of a deep learning method applied to an unseen condition decreases significantly. To solve this problem, an iterative matching network augmented with a selective sample reuse strategy is proposed. By generating pseudo labels for unlabeled signals from the unseen condition and reusing these signals to iteratively update parameters, the embedding space of the matching network reduces the discrepancy in domain distribution between different working conditions. This makes the model more adaptable to unseen conditions. A specially designed filter is proposed for selecting pseudo-labeled signals to increase the proportion of correctly labeled signals in each iteration. By combining these two points, the proposed algorithm can be updated iteratively based on the selected pseudo-labeled signals and achieves higher accuracy when analyzing signals from unseen working conditions. A multiscale feature extractor is used to extract features at different scales and form the embedding space. The effectiveness of the proposed algorithm is verified on four datasets. Experiments show that this algorithm not only performs well under varying load and speed conditions but also surpasses other domain adaptation methods.
Article
The attention model has now become an important concept in neural networks and has been researched within diverse application domains. This survey provides a structured and comprehensive overview of the developments in modeling attention. In particular, we propose a taxonomy that groups existing techniques into coherent categories. We review salient neural architectures in which attention has been incorporated and discuss applications in which modeling attention has shown a significant impact. We also describe how attention has been used to improve the interpretability of neural networks. Finally, we discuss some future research directions in attention. We hope this survey will provide a succinct introduction to attention models and guide practitioners while developing approaches for their applications.
Article
In this paper, we study a novel transfer learning problem termed Distant Domain Transfer Learning (DDTL). Unlike existing transfer learning problems, which assume a close relation between the source domain and the target domain, in the DDTL problem the target domain can be totally different from the source domain. For example, the source domain classifies face images but the target domain distinguishes plane images. Inspired by the human cognitive process, in which two seemingly unrelated concepts can be connected by gradually learning intermediate concepts, we propose a Selective Learning Algorithm (SLA) to solve the DDTL problem, with a supervised autoencoder or supervised convolutional autoencoder as a base model for handling different types of inputs. Intuitively, the SLA algorithm gradually selects useful unlabeled data from intermediate domains as a bridge to break the large distribution gap for transferring knowledge between two distant domains. Empirical studies on image classification problems demonstrate the effectiveness of the proposed algorithm, and on some tasks the improvement in classification accuracy is up to 17% over “non-transfer” methods.
Article
Class imbalance has been a major problem in mechanical fault detection; it exists when the number of instances present in one class is significantly smaller than that in another class. To address this problem, a two-stage zero-shot fault recognition method is proposed. First, inspired by the conditional generative adversarial network, a novel feature generating network with shortcut connections, consisting of a feature extractor, a discriminator and a generator, is designed to capture the distribution of normal samples. The generator is then used to generate abundant pseudo fault features by adding a random sequence to the condition. Second, an improved deep neural network is trained with these synthetic pseudo fault features as the classifier. Specifically, a condition index is designed to represent different fault classes so that it can recognize and cluster the unseen fault samples. The effectiveness and superiority of the proposed method are verified.
Article
In bearing defect diagnosis applications, information fusion has been widely used to improve identification accuracy for different types of faults, but it may lead to high dimensionality and information redundancy in the data and thus degrade classification performance. Therefore, extracting optimal features from high-dimensional and redundant data for classification is a major challenge for machinery fault diagnosis. In addition, to guarantee the performance of fault diagnosis, conventional supervised methods usually require a large amount of labeled data for learning. However, it is extremely difficult, costly and time-consuming to collect labeled faulty samples with class information, especially for expensive and critical machines, which often results in only a few labeled data being available alongside a large amount of unlabeled data. In this paper, we propose a novel bearing defect diagnosis model based on semi-supervised kernel local Fisher Discriminant Analysis (SSKLFDA) using pseudo labels, which can effectively extract optimal features for classification and simultaneously utilize unlabeled data to regularize the supervised dimensionality reduction. The proposed SSKLFDA first adopts the Density Peak Clustering technique to generate pseudo cluster labels for the labeled and unlabeled data and then regularizes the between-class scatter and within-class scatter according to two corresponding regularization strategies associated with the generated pseudo cluster labels. This regularization can further improve the discriminant performance of the extracted features and also makes the method suitable for cases with multimodality and noise. To accommodate non-linear feature extraction, a kernel version of the proposed method is also provided by introducing the kernel trick. The experimental results under different feature dimensions, numbers of labeled data, and subsequent classifiers demonstrate that the proposed SSKLFDA-based bearing fault diagnosis model achieves higher classification performance than models based on other existing dimensionality reduction methods.
Article
Fault classification plays a central role in process monitoring and fault diagnosis in complex industrial processes. Plenty of fault classification methods have been proposed under the assumption that the sizes of different fault classes are similar. However, in practical industrial processes, it is common that large amounts of normal data (majority) and only a few fault data (minority) are collected. In other words, most existing fault classification problems arise in an imbalanced data scenario, which restricts the performance of traditional classification algorithms. In this paper, a K-means based SVM-tree algorithm is proposed to deal with the nonlinear multi-class classification problem with imbalanced data. Meanwhile, an SVM-forest scheme is further developed for sensitive data selection and performance enhancement when the degree of imbalance among different classes is larger. The effectiveness of the proposed method is verified on the Tennessee Eastman (TE) benchmark process.