IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 19, NO. 2, FEBRUARY 2023 1559
Few-Shot Learning for Fault Diagnosis With a
Dual Graph Neural Network
Han Wang, Jingwei Wang, Yukai Zhao, Qing Liu, Min Liu, and Weiming Shen, Fellow, IEEE
Abstract—Mechanical fault diagnosis is crucial to ensure the safe operation of equipment in intelligent manufacturing systems. Deep learning-based methods have recently been developed for fault diagnosis because of their advantages in feature representation. However, most of these methods fail to learn relations between samples and thus perform poorly without sufficient labeled data. In this article, we propose a new few-shot learning method named dual graph neural network (DGNNet) with residual blocks to address fault diagnosis problems with limited data. First, the residual module learns sample features from images transformed from the original signals. Second, two complete graphs built on the sample features are used to extract the instance-level and distribution-level relations between samples. In particular, an alternate update policy between the instance and distribution graphs integrates the multilevel relations to propagate the label information of a few labeled samples to unlabeled samples. This technique leverages both labeled and unlabeled samples to identify unseen faults, making DGNNet effective in fault diagnosis tasks with very few labeled samples. Extensive results on various datasets show that DGNNet achieves excellent performance in supervised fault diagnosis tasks and outperforms baselines by a large margin in semisupervised cases.
Index Terms—Distribution learning, fault diagnosis, few-shot learning (FSL), graph neural network (GNN), semisupervised learning.
I. INTRODUCTION
THE increasing availability of big data on manufacturing equipment offers unprecedented opportunities to explore methods and tools for predictive maintenance of machinery. Predictive maintenance aims to prevent mechanical failures, or to detect them before they occur, in order to reduce losses. Over the last decades, researchers have devoted much attention to fault diagnosis, especially for rotating machinery [1]. Because rotating machinery, such as engines, turbines, and compressors, is a critical component of most industrial facilities, many mathematical models based on mechanical characteristics have been developed to identify its faults. Yet these methods rely heavily on prior knowledge and expert experience and are difficult to apply in real-world manufacturing environments, which produce enormous amounts of highly noisy data. Requirements for real-time, high-performance diagnosis have driven researchers toward data-driven fault diagnosis methods, such as deep learning (DL) [2], [3].
Manuscript received 29 December 2021; revised 26 July 2022; accepted 4 September 2022. Date of publication 9 September 2022; date of current version 13 December 2022. This work was supported by the National Key R&D Program of China under Grant 2019YFB1704700 and NSFC under Grant 62273261. Paper no. TII-21-5812. (Corresponding author: Min Liu.)
Han Wang, Jingwei Wang, Yukai Zhao, Qing Liu, and Min Liu are with the College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China (e-mail: 469702227@qq.com; jwwang@tongji.edu.cn; zhaoyukaijake@tongji.edu.cn; 2010142@tongji.edu.cn; lmin@tongji.edu.cn).
Weiming Shen is with the State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: wshen@ieee.org).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2022.3205373.
Digital Object Identifier 10.1109/TII.2022.3205373
Recent studies have employed DL methods, such as deep belief networks [4], deep autoencoders [5], and convolutional neural networks (CNNs) [6], [7], for fault diagnosis of rotating machinery [8]. For example, Han et al. [9] proposed an improved deep belief network for gear fault detection and obtained high diagnostic accuracy. Ren et al. [10] designed a deep autoencoder that automatically learns a nonlinear mapping of the input fault data. Kiranyaz et al. [11] presented an adaptive CNN for real-time fault detection and obtained excellent classification performance. However, these methods require sufficient labeled samples to train the DL models; otherwise, they cannot achieve high fault diagnosis performance [12]. In real industrial scenarios, rotating machines usually operate under normal conditions and seldom malfunction, so fault data are rare [13]. Consequently, DL methods lose their advantages in industrial fault diagnosis with small data.
Few-shot learning (FSL) is an emerging paradigm for training a DL model with limited labeled samples. Briefly, an FSL task is divided into many small subtasks, each consisting of two sets: a support set (containing a few labeled samples) and a query set (containing one or several unlabeled samples). FSL-based fault diagnosis aims to obtain DL models that can identify the fault type of unseen samples in the query set using the limited labeled samples in the support set [14]. A few researchers have proposed FSL-based methods for fault diagnosis with limited data [15]. For instance, Zhang et al. [16] used a Siamese neural network to tackle fault data scarcity. Feng et al. [17] proposed a semisupervised meta-learning network with an attention mechanism that extracts distinct features from support samples to generate prototypes for classifying query samples. However, these methods ignore the relations between support and query samples, including pairwise relations between two samples (i.e., instance-level relations) and high-order relations among all samples (i.e., distribution-level relations). Because these relations help distinguish samples of different fault classes, ignoring them limits the performance of previous FSL-based diagnosis methods.
1551-3203 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: TONGJI UNIVERSITY. Downloaded on January 04,2023 at 00:33:48 UTC from IEEE Xplore. Restrictions apply.
In this article, we propose a novel FSL method to solve fault diagnosis problems with limited data. First, we transform the vibration signals of rotating machinery into images, with each image representing a sample, and split them into support sets and query sets. In each FSL subtask, we extract sample features using a residual network [18]. We then use graphs to abstract the relations between samples, where each sample is a node. In particular, we construct two graphs over all samples in a subtask: an instance graph and a distribution graph. In the instance graph, the sample feature is the node feature (instance feature), and the feature similarity of two samples is the edge weight, which represents their instance-level relation. In the distribution graph, the distribution feature (the similarity between a sample and the other samples) is the node feature, and the edge weight, which represents the distribution-level relation, is computed from the distribution features of the two nodes. On this basis, we design a dual graph neural network (DGNNet) that learns the instance-level and distribution-level relations in these graphs for fault diagnosis. Specifically, DGNNet uses an alternate update policy to propagate the label information of labeled samples to unlabeled samples, leveraging the relations between samples at different levels. Moreover, this learning strategy allows the support set to contain unlabeled samples, enabling DGNNet to solve semisupervised fault diagnosis problems.
The contributions of this article are summarized as follows.
1) We propose a novel FSL method named DGNNet that integrates instance-level and distribution-level relation learning for fault diagnosis. DGNNet learns relations of query samples at different levels from limited data.
2) The alternate update policy between the instance graph and the distribution graph propagates the label information of scarce labeled samples to unlabeled samples within several updates, enabling DGNNet to address semisupervised fault diagnosis.
3) Extensive experiments are conducted to evaluate DGNNet on two benchmark datasets and a real-world dataset. Our results show that DGNNet outperforms baselines by 3%–10% in supervised fault diagnosis tasks and by 10%–12% in semisupervised cases.
The rest of this article is organized as follows. Section II
presents the preliminaries of this study. The proposed method
is detailed in Section III. In Section IV, the effectiveness of
DGNNet is evaluated with various case studies. Ablation studies
and discussion are presented in Section V. Finally, Section VI
concludes this article and presents future work.
II. PRELIMINARIES
A. Fault Diagnosis
DL-based fault diagnosis methods typically transform vibration signals into three-channel images. The segmentation length of the signals should be adjusted to the specific scenario so that each segment contains a complete period, which benefits classification. Each 84 × 84 image is built from 4096 signal data points using a sliding window of length 84; because an image holds 84 × 84 = 7056 values, more than the 4096 available points, consecutive windows overlap and some data points are reused [2]. The three-channel image is generated as follows: the value of each data point is normalized, each pixel is filled with the corresponding data point, and the signal segments fill the rows of the image in sequence.
Fig. 1. Episodic paradigm of FSL (5-way 1-shot).
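The conversion can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: the text does not give the window stride, the normalization formula, or how the three channels differ, so the sketch assumes min-max normalization to [0, 1], 84 window starts spread evenly over the 4096-point segment (a stride of roughly 48 points), and the same grayscale matrix copied into all three channels. The function name `signal_to_image` is hypothetical.

```python
import numpy as np


def signal_to_image(segment, size=84):
    """Convert a 1-D vibration segment into a (3, size, size) image.

    Assumptions (not specified in the source): min-max normalization,
    evenly spaced window starts, and identical channels.
    """
    segment = np.asarray(segment, dtype=np.float64)
    if segment.size < size:
        raise ValueError("segment is shorter than one window")
    lo, hi = segment.min(), segment.max()
    norm = (segment - lo) / (hi - lo) if hi > lo else np.zeros_like(segment)
    # `size` window starts between 0 and len - size; for 4096 points the
    # stride is about 48 < 84, so neighbouring rows share data points.
    starts = np.linspace(0, segment.size - size, num=size).astype(int)
    rows = np.stack([norm[s:s + size] for s in starts])  # one window per row
    return np.repeat(rows[np.newaxis], 3, axis=0)         # copy to 3 channels


rng = np.random.default_rng(0)
image = signal_to_image(rng.standard_normal(4096))
print(image.shape)  # (3, 84, 84)
```

Distinct per-channel encodings (for example, different transforms per channel) would only change the last line of the function.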
The input of DGNNet consists of the embedded features extracted from these images by a residual network. The labeled and unlabeled samples together form the support set, which is denoted as

$$ S = \bar{L} \cup \bar{U} = \{(x_1, y_1), \ldots, (x_{n_{\bar{l}}}, y_{n_{\bar{l}}})\} \cup \{x_{n_{\bar{l}}+1}, \ldots, x_{n_{\bar{l}}+n_{\bar{u}}}\} \tag{1} $$

where $\bar{L}$ is the labeled data and $\bar{U}$ is the unlabeled data, and $n_{\bar{l}}$ and $n_{\bar{u}}$ are the numbers of labeled and unlabeled samples, respectively. The input is defined as

$$ X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{n \times 3 \times D_I} \tag{2} $$

where $D_I = 84 \times 84$, $n = n_{\bar{l}} + n_{\bar{u}} + n_{\bar{q}}$ is the number of all samples, and $n_{\bar{q}}$ denotes the number of samples in the query set $Q$. All query samples are classified by the FSL-based models, and the prediction for a query sample can be formulated as

$$ y_i = \arg\max \left\{ p_\phi(y = c \mid x_i \in Q) \right\}_{c=1}^{N} \tag{3} $$

where $p_\phi(y = c \mid x_i \in Q)$ is the probability that the query sample $x_i$ is predicted as class $c$.
B. Few-Shot Learning (FSL)
FSL aims to learn a model that recognizes unseen samples after training with limited labeled examples [19]. Many researchers use the episodic paradigm from meta-learning to solve FSL problems [20]. In an episode (a subtask), a support set S includes N classes with K samples per class, and a query set Q contains $\tilde{T}$ samples. The goal of a subtask is to classify the query samples using the support samples; such a classification problem is called an N-way K-shot problem. Fig. 1 shows an example of the 5-way 1-shot FSL problem. We construct 15 000 training subtasks and 15 000 test subtasks. In each subtask, the support set contains five different classes with one image per class, and the query set contains one unlabeled image to be classified. The overall classification accuracy is the percentage of subtasks in which the unlabeled query sample is correctly classified.
WANG et al.: FEW-SHOT LEARNING FOR FAULT DIAGNOSIS WITH A DUAL GRAPH NEURAL NETWORK 1561
Fig. 2. Overall framework of DGNNet. The unlabeled samples are only used in semisupervised FSL for fault diagnosis.
In this article, we discuss both the supervised and semisuper-
vised FSL for fault diagnosis. In the supervised cases, all support
samples are labeled, whereas the support set in the semisuper-
vised cases contains both labeled and unlabeled samples. Note
that query samples in both cases are unlabeled.
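A minimal sketch of how one episode could be drawn, assuming the samples are grouped by class in a Python dict. `sample_episode` and its parameters are hypothetical, not part of the original work; for the semisupervised case it also draws `n_unlabeled` unlabeled support samples per class, and each query sample comes from one of the episode's classes and never overlaps the support set.

```python
import random


def sample_episode(data_by_class, n_way=5, k_shot=1, n_unlabeled=0,
                   n_query=1, rng=None):
    """Draw one N-way K-shot episode (hypothetical helper).

    data_by_class maps a class label to a list of samples. Returns the
    labeled support pairs, the unlabeled support samples (semisupervised
    case only), and the query pairs; query labels are kept only for scoring.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(data_by_class), n_way)
    query_classes = set(rng.sample(classes, n_query))  # assumes n_query <= n_way
    support, unlabeled, query = [], [], []
    for c in classes:
        extra = 1 if c in query_classes else 0
        # draw without replacement so support, unlabeled, and query never overlap
        picks = rng.sample(data_by_class[c], k_shot + n_unlabeled + extra)
        support += [(x, c) for x in picks[:k_shot]]
        unlabeled += picks[k_shot:k_shot + n_unlabeled]
        if extra:
            query.append((picks[-1], c))
    return support, unlabeled, query


data = {c: [f"{c}{i}" for i in range(20)] for c in "ABCDEFGHIJKL"}  # 12 toy classes
support, unlabeled, query = sample_episode(data, n_way=5, k_shot=1,
                                           n_unlabeled=1, rng=random.Random(0))
print(len(support), len(unlabeled), len(query))  # 5 5 1
```

With `n_unlabeled=0` the draw corresponds to the supervised 5-way 1-shot setting; repeating it many times yields a pool of subtasks, and accuracy is then the fraction of episodes whose query is classified correctly.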
C. Graph Neural Networks (GNNs)
GNNs are DL methods for solving tasks on graph-structured data [21]. Recently, GNNs have been widely applied in semisupervised learning and FSL [22], [23]. GNNs typically include two processes: the node update and the edge update. For an undirected graph G = (V, E) with node set V and edge set E, the adjacency matrix $A \in \mathbb{R}^{n \times n}$ is defined as

$$ A_{ij} = \begin{cases} 1, & \text{if nodes } (v_i, v_j) \text{ are connected} \\ 0, & \text{if nodes } (v_i, v_j) \text{ are not connected.} \end{cases} \tag{4} $$

The updates of the node $v_i$ and the edge $e_k$ are formulated as

$$ e_k = \phi^{e}(\bar{v}_i, e_k), \qquad \bar{e}_k = \rho^{e \rightarrow v}(E) \tag{5} $$
$$ v_i = \phi^{v}(\bar{e}_k, v_i), \qquad \bar{v}_i = \rho^{v \rightarrow e}(V) \tag{6} $$

where $E = \{e_k\}_{k=1:N_e}$ and $V = \{v_i\}_{i=1:N_v}$ are the sets of edges and nodes (of cardinality $N_e$ and $N_v$), and $e_k$ and $v_i$ are the attributes of an edge and a node, respectively. $\phi^{e}$ is mapped across edges to compute the per-edge update, $\phi^{v}$ is mapped across nodes to compute the per-node update, and the $\rho$ functions reduce a set to a single element.
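To make the two processes concrete, the sketch below performs one edge update followed by one node update on a complete graph in NumPy. It is a generic illustration, not DGNNet: the Gaussian edge function, the averaging node function, and the weighted-sum reduction are assumptions standing in for $\phi^e$, $\phi^v$, and $\rho$, and the helper names are hypothetical.

```python
import numpy as np


def complete_adjacency(n):
    """Eq. (4) for a complete graph without self-loops: A_ij = 1 for i != j."""
    return np.ones((n, n)) - np.eye(n)


def gnn_step(nodes, edge_fn, node_fn):
    """One edge update followed by one node update on a complete graph.

    edge_fn(v_i, v_j) -> scalar edge attribute (the role of phi^e);
    the reduction rho is a weighted sum of neighbour features;
    node_fn(v_i, aggregate) -> new node feature (the role of phi^v).
    """
    n = nodes.shape[0]
    adj = complete_adjacency(n)
    edges = np.array([[edge_fn(nodes[i], nodes[j]) for j in range(n)]
                      for i in range(n)]) * adj
    aggregate = edges @ nodes  # row i: sum over connected j of e_ij * v_j
    new_nodes = np.stack([node_fn(nodes[i], aggregate[i]) for i in range(n)])
    return new_nodes, edges


nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
new_nodes, edges = gnn_step(
    nodes,
    edge_fn=lambda a, b: np.exp(-np.sum((a - b) ** 2)),  # Gaussian similarity
    node_fn=lambda v, agg: 0.5 * (v + agg),
)
print(edges.round(3))
```

In a trained GNN, both functions would be learned networks; the control flow (edges from node pairs, then nodes from aggregated edges) stays the same.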
III. DUAL GRAPH NEURAL NETWORK (DGNNET)
In this section, we first introduce the proposed DGNNet framework and briefly describe feature learning. We then elaborate on instance-level and distribution-level relation learning, followed by the optimization of DGNNet.
A. Framework
The DGNNet framework is shown in Fig. 2. It contains four parts: the feature learning module (residual network), the instance-level relation learning module (instance graph), the distribution-level relation learning module (distribution graph), and the synergistic optimization. The residual network extracts feature vectors from the transformed images. The instance graph learns the instance features of all samples and the instance-level relations between samples. The distribution graph learns the distribution features and the distribution-level relations. The synergistic optimization combines the instance-level and distribution-level losses to optimize DGNNet. Finally, the instance graph at the last generation produces the fault diagnosis result.
B. Feature Learning
The residual network comprises several residual blocks (Resblocks) and fully connected (FC) layers that extract embedded features from the transformed images. Each Resblock comprises four 2-D convolutional (Conv2D) layers, four batch normalization (BN) layers, one activation function $\zeta_{LReLU}$, and one max-pooling operation. In each Resblock, BN layers are used to reduce internal covariate shift and smooth the loss surface [24]. BN is formulated as

$$ f_{BN} = \gamma_i \frac{x_i - \mu(X)}{\sqrt{\sigma(X)^2 + \varepsilon}} + \beta_i \tag{7} $$

where $x_i$ is the $i$th sample and $X$ denotes all input samples; $\gamma_i$ and $\beta_i$ are learnable parameters initialized to 1 and 0, respectively; and $\varepsilon = 1 \times 10^{-5}$ is a small constant that prevents division by zero. The mini-batch mean $\mu(X)$ and standard deviation $\sigma(X)$ in (7) are

$$ \mu(X) = \frac{1}{n_B} \sum_{i=1}^{n_B} x_i \tag{8} $$
$$ \sigma(X) = \sqrt{\frac{1}{n_B} \sum_{i=1}^{n_B} \big(x_i - \mu(X)\big)^2} \tag{9} $$

where $n_B$ is the mini-batch size, set to 50 in this article. The feature vector of sample $x_i$ is obtained through the residual network as

$$ v_{f_i} = f_{resnet}(x_i) \tag{10} $$

where $v_{f_i} \in \mathbb{R}^m$ and $f_{resnet}$ is the residual network.
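Equations (7)-(9) can be checked numerically with a short NumPy sketch. It normalizes flat feature vectors along the batch axis and uses scalar $\gamma = 1$ and $\beta = 0$ (their initial values); in a Conv2D network, BN statistics are usually computed per channel over the batch and spatial positions with per-channel $\gamma$ and $\beta$, which this simplified sketch does not model. The function name `batch_norm` is hypothetical.

```python
import numpy as np


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Eqs. (7)-(9): normalize a mini-batch x of shape (n_B, d) per feature."""
    mu = x.mean(axis=0)                                   # eq. (8)
    sigma = np.sqrt(((x - mu) ** 2).mean(axis=0))         # eq. (9), 1/n_B
    return gamma * (x - mu) / np.sqrt(sigma ** 2 + eps) + beta  # eq. (7)


rng = np.random.default_rng(0)
batch = rng.normal(loc=3.0, scale=2.0, size=(50, 8))  # n_B = 50 as in the text
out = batch_norm(batch)
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))
```

After normalization each feature has mean close to 0 and standard deviation close to 1; the tiny shortfall from 1 comes from $\varepsilon$.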
C. Instance-Level Relation Learning
First, we concatenate the feature vector of each sample and the one-hot encoding of its label into a merged vector (i.e., the instance feature). For an unlabeled sample, all elements of its one-hot encoding are zero. This merged vector serves as the initial instance feature of a sample:

$$ v^{ins}_{0,i} = v_{f_i} \,\Vert\, \operatorname{onehot}(y_i) \tag{11} $$

where $v^{ins}_{0,i} \in \mathbb{R}^m$, $\Vert$ is the concatenation operator, and $\operatorname{onehot}(y_i)$ is the one-hot encoding of the sample label $y_i$.

The instance similarity between any two samples (i.e., the instance-level relation) is computed from their instance features as

$$ e^{ins}_{0,ij} = f_{G^{ins}_0}\!\left(\big(v^{ins}_{0,i} - v^{ins}_{0,j}\big)^2\right) \tag{12} $$

where $e^{ins}_{0,ij} \in \mathbb{R}$, the square is taken element-wise, and $f_{G^{ins}}: \mathbb{R}^m \to \mathbb{R}$ is an encoding network that converts the difference between instance features into a scalar similarity. As shown in Fig. 3(a), the encoding network contains two Conv layers, two BN layers, two LReLU activation layers, and a sigmoid layer.
Fig. 3. Network details about the instance graph and distribution graph. (a) Encoding network. (b) MLP$_{d2i}$. (c) MLP$_{i2d}$.
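Equations (11) and (12) can be sketched in NumPy as follows. This is a minimal sketch under assumptions: `toy_encoder` is a fixed function chosen for illustration, whereas the paper's $f_G$ is the trained Conv/BN/LReLU/sigmoid network of Fig. 3(a); unlabeled samples are marked with `None`; and in this sketch the concatenated instance feature has m + N entries (feature length plus number of classes). All function names are hypothetical.

```python
import numpy as np


def one_hot(label, n_classes):
    """One-hot vector; all zeros for an unlabeled sample (label=None)."""
    vec = np.zeros(n_classes)
    if label is not None:
        vec[label] = 1.0
    return vec


def toy_encoder(diff_sq):
    """Hypothetical stand-in for the learned encoding network f_G: maps each
    element-wise squared difference vector to a scalar in (0, 1], with 1 for
    identical features."""
    return 2.0 / (1.0 + np.exp(diff_sq.sum(axis=-1)))


def initial_instance_graph(features, labels, n_classes, encoder=toy_encoder):
    """Eqs. (11)-(12): generation-0 node features and edge weights."""
    nodes = np.stack([np.concatenate([f, one_hot(y, n_classes)])
                      for f, y in zip(features, labels)])
    diff_sq = (nodes[:, None, :] - nodes[None, :, :]) ** 2  # shape (n, n, dim)
    return nodes, encoder(diff_sq)


rng = np.random.default_rng(1)
features = rng.normal(size=(4, 3))  # 3 labeled support samples + 1 query
nodes, edges = initial_instance_graph(features, [0, 1, 2, None], n_classes=3)
print(nodes.shape, edges.shape)  # (4, 6) (4, 4)
```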
Then, we construct a fully connected instance graph in which every node is connected to all other nodes. Each sample is a node whose attribute is its instance feature, and the instance similarity between two samples is the weight of the edge between them. The instance graph is defined as $G^{ins}_l = (V^{ins}_l, E^{ins}_l)$, where $V^{ins}_l = \{v^{ins}_{l,i}\}$ and $E^{ins}_l = \{e^{ins}_{l,ij}\}$; $v^{ins}_{l,i} \in \mathbb{R}^m$ is the instance feature of node $i$ at the $l$th generation, and $e^{ins}_{l,ij} \in \mathbb{R}$ is the instance similarity of edge $(i, j)$ at the $l$th generation. The node and edge features are updated, respectively, by

$$ v^{ins}_{l,i} = f_{MLP_{d2i}}\!\left(v^{ins}_{l-1,i},\ \sum_{j=1}^{T} e^{dis}_{l,ij} \cdot v^{ins}_{l-1,j}\right) \tag{13} $$
$$ e^{ins}_{l,ij} = f_{G^{ins}_l}\!\left(\big(v^{ins}_{l-1,i} - v^{ins}_{l-1,j}\big)^2\right) \cdot e^{ins}_{l-1,ij} \tag{14} $$

where $e^{dis}_{l,ij}$ is the distribution similarity of samples $i$ and $j$ in the distribution graph (detailed in the next section), and $f_{MLP_{d2i}}: (\mathbb{R}^m, \mathbb{R}^m) \to \mathbb{R}^m$ is a multilayer perceptron (MLP) that maps the instance feature and the distribution-similarity-weighted aggregate of the other instance features into a new instance feature of node $i$. As shown in Fig. 3(b), MLP$_{d2i}$ contains two Conv layers, two BN layers, and two LReLU activation layers. The instance similarities in the final instance graph are used for N-way K-shot fault diagnosis, and the probability that $x_i$ belongs to class $c$ is

$$ P_\phi(y_i = c \mid x_i \in Q) = \zeta_s\!\left(\sum_{j=1}^{NK} e^{ins}_{l_f,ij} \cdot \operatorname{onehot}(y_j)\right) \tag{15} $$

where $e^{ins}_{l_f,ij}$ is the instance similarity in the instance graph at the final generation $l_f$, and $\zeta_s$ is the softmax function.
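The readout in (15), and the argmax over it used for the final prediction, can be sketched for a single query node as below. The edge values are made-up toy numbers, not results from the paper, and the helper names are hypothetical; the sketch only shows how the similarities to the NK labeled support nodes are pooled per class before the softmax.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def predict_query(edge_row, support_labels, n_classes):
    """Eq. (15): class probabilities for one query from its final-generation
    instance similarities to the N*K labeled support samples."""
    onehots = np.eye(n_classes)[support_labels]     # (N*K, N) one-hot rows
    return softmax(np.asarray(edge_row) @ onehots)  # sum_j e_ij * onehot(y_j)


support_labels = [0, 0, 1, 1, 2, 2]          # 3-way 2-shot toy episode
edge_row = [0.2, 0.1, 0.9, 0.8, 0.3, 0.2]    # query looks most like class 1
probs = predict_query(edge_row, support_labels, n_classes=3)
print(probs.round(3), int(np.argmax(probs)))  # the argmax gives the predicted class
```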
D. Distribution-Level Relation Learning
For $l = 0$, the distribution feature and the distribution similarity (i.e., the distribution-level relation) are initialized as

$$ v^{dis}_{0,i} = \begin{cases} \Big\Vert_{j=1}^{NK} \delta(y_i, y_j), & \text{if labeled} \\ \left[\frac{1}{NK}, \ldots, \frac{1}{NK}\right], & \text{if unlabeled} \end{cases} \tag{16} $$
$$ e^{dis}_{0,ij} = f_{G^{dis}_0}\!\left(\big(v^{dis}_{0,i} - v^{dis}_{0,j}\big)^2\right) \tag{17} $$

where $v^{dis}_{0,i} \in \mathbb{R}^{NK}$, $e^{dis}_{0,ij} \in \mathbb{R}$, and $f_{G^{dis}}: \mathbb{R}^{NK} \to \mathbb{R}$ is an encoding network that converts the difference between distribution features into a scalar similarity. $\delta$ is the Kronecker delta, which is 1 if its two arguments are equal and 0 otherwise, so the initial distribution feature of a labeled node marks the labeled support samples of its own class. The encoding network is shown in Fig. 3(a).
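The initialization in (16) can be sketched in NumPy, assuming that unlabeled nodes are marked with `None` and that the labels of the NK labeled support samples are passed separately. The function name is hypothetical. The edge weights in (17) would then be obtained by passing the element-wise squared differences of these rows through the encoding network.

```python
import numpy as np


def initial_distribution_features(labels, support_labels):
    """Eq. (16): generation-0 distribution features.

    labels holds one entry per node (None for unlabeled nodes);
    support_labels holds the labels of the N*K labeled support samples.
    """
    support = np.asarray(support_labels)
    nk = support.size
    rows = []
    for y in labels:
        if y is None:
            rows.append(np.full(nk, 1.0 / nk))          # uniform over the support
        else:
            rows.append((support == y).astype(float))  # Kronecker delta per entry
    return np.stack(rows)


v_dis = initial_distribution_features([0, 1, 2, None], support_labels=[0, 1, 2])
print(v_dis)
```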
Then, we build a fully connected distribution graph $G^{dis}_l = (V^{dis}_l, E^{dis}_l)$ to learn the distribution features and distribution-level relations, where $V^{dis}_l = \{v^{dis}_{l,i}\}$ and $E^{dis}_l = \{e^{dis}_{l,ij}\}$; $v^{dis}_{l,i} \in \mathbb{R}^{NK}$ is the distribution feature of node $i$ at the $l$th generation, and $e^{dis}_{l,ij} \in \mathbb{R}$ is the distribution similarity of edge $(i, j)$ at the $l$th generation. The representation of node $i$ in the distribution graph is obtained as

$$ v^{dis}_{l,i} = f_{MLP_{i2d}}\!\left(v^{dis}_{l-1,i},\ \Big\Vert_{j=1}^{NK} e^{ins}_{l,ij}\right) \tag{18} $$

where $f_{MLP_{i2d}}: (\mathbb{R}^{NK}, \mathbb{R}^{NK}) \to \mathbb{R}^{NK}$ is an MLP that maps the instance similarities and the distribution feature into a new distribution feature of node $i$; MLP$_{i2d}$ is shown in Fig. 3(c). The weight of edge $(i, j)$ in the distribution graph at the $l$th generation is

$$ e^{dis}_{l,ij} = f_{G^{dis}_l}\!\left(\big(v^{dis}_{l,i} - v^{dis}_{l,j}\big)^2\right) \cdot e^{dis}_{l-1,ij} \tag{19} $$

For $l \geq 1$, DGNNet learns the instance and distribution features with an alternate update strategy, as shown in Fig. 2. In particular, a complete update of the $l$th generation is

$$ E^{ins}_l \xrightarrow{MLP_{i2d}} V^{dis}_l \xrightarrow{G^{dis}_l} E^{dis}_l \xrightarrow{MLP_{d2i}} V^{ins}_l \xrightarrow{G^{ins}_l} E^{ins}_{l+1}. $$
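The alternating order can be traced with a toy NumPy loop. This is a sketch under strong simplifying assumptions, not the trained DGNNet: the learned encoders $f_G$ are replaced by a fixed `similarity` function, MLP$_{i2d}$ and MLP$_{d2i}$ by fixed convex mixes with weight `alpha`, the aggregation in (13) is row-normalized, and the first `nk` nodes are assumed to be the labeled support samples. All names are hypothetical; only the update order and the generation indices of (13), (14), (18), and (19) are meant to match the text.

```python
import numpy as np


def similarity(x):
    """Hypothetical stand-in for f_G: 2 * sigmoid(-||x_i - x_j||^2), in (0, 1]."""
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    return 2.0 / (1.0 + np.exp(d))


def alternate_updates(v_ins, v_dis, e_ins, e_dis, nk, n_gen=3, alpha=0.5):
    """Run generations l = 1..n_gen of E_ins -> V_dis -> E_dis -> V_ins.

    Inputs are the generation-0 quantities; the first nk nodes are assumed
    to be the labeled support samples.
    """
    for _ in range(n_gen):
        # (14): E_ins at generation l from V_ins at generation l - 1
        e_ins = similarity(v_ins) * e_ins
        # (18): distribution features from instance similarities to the support nodes
        v_dis = alpha * v_dis + (1 - alpha) * e_ins[:, :nk]
        # (19): distribution edges, rescaled by the previous generation
        e_dis = similarity(v_dis) * e_dis
        # (13): aggregate instance features weighted by distribution similarity
        weights = e_dis / e_dis.sum(axis=1, keepdims=True)  # row-normalized (assumption)
        v_ins = alpha * v_ins + (1 - alpha) * (weights @ v_ins)
    return v_ins, v_dis, e_ins, e_dis


rng = np.random.default_rng(2)
v_ins = rng.normal(size=(4, 5))                     # 3 support nodes + 1 query
v_dis = np.vstack([np.eye(3), np.full(3, 1 / 3)])   # initialization of eq. (16)
e_ins, e_dis = similarity(v_ins), similarity(v_dis)
v_ins, v_dis, e_ins, e_dis = alternate_updates(v_ins, v_dis, e_ins, e_dis, nk=3)
print(e_ins.round(3))
```

Because each edge update multiplies by the previous generation's edges, similarities can only shrink or stay at 1, which keeps the toy edges in [0, 1].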
E. Synergistic Optimization
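The class prediction for a query sample $x_i \in Q$ in N-way K-shot fault classification is reformulated as

$$ y_i = \arg\max \left\{ \zeta_s\!\left(\sum_{j=1}^{NK} e^{ins}_{l_f,ij} \cdot \operatorname{onehot}(y_j)\right) \right\}_{c=1}^{N}. \tag{20} $$

The loss of the GNN for instance-level relation learning at the $l$th generation is defined as

$$ \mathcal{L}^{ins}_l = \mathcal{L}_{ce}\!\left(\zeta_s\!\left(\sum_{j=1}^{NK} e^{ins}_{l,ij} \cdot \operatorname{onehot}(y_j)\right),\ y_i\right) \tag{21} $$

where $\mathcal{L}_{ce}$ is the cross-entropy loss function and $\zeta_s$ is the softmax activation function. The loss for distribution-level relation learning at the $l$th generation is defined as

$$ \mathcal{L}^{dis}_l = \mathcal{L}_{ce}\!\left(\zeta_s\!\left(\sum_{j=1}^{NK} e^{dis}_{l,ij} \cdot \operatorname{onehot}(y_j)\right),\ y_i\right). \tag{22} $$

The loss function of DGNNet then combines the instance-level and distribution-level losses over all generations:

$$ \mathcal{L} = \sum_{l=1}^{l_f} \left(\lambda_{ins} \mathcal{L}^{ins}_l + \xi_{dis} \mathcal{L}^{dis}_l\right) \tag{23} $$

where $l_f$ is the number of generations, and $\lambda_{ins}$ and $\xi_{dis}$ are weight parameters set to 1.0 and 0.1, respectively, in this article.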
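A per-query sketch of (21)-(23) in NumPy follows. The default weights mirror $\lambda_{ins} = 1.0$ and $\xi_{dis} = 0.1$ stated above; the edge rows are toy values, and batching over queries and episodes, as well as gradient computation, are left out. The function names are hypothetical. In training, such per-query losses would typically be averaged over the query nodes of each episode before an optimizer step.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def query_loss(ins_rows, dis_rows, support_labels, query_label, n_classes,
               lam_ins=1.0, xi_dis=0.1):
    """Eqs. (21)-(23) for a single query node.

    ins_rows[l] / dis_rows[l]: the query's instance / distribution
    similarities to the N*K labeled support samples at generation l + 1.
    """
    onehots = np.eye(n_classes)[support_labels]
    total = 0.0
    for e_ins, e_dis in zip(ins_rows, dis_rows):
        p_ins = softmax(np.asarray(e_ins) @ onehots)    # prediction in eq. (21)
        p_dis = softmax(np.asarray(e_dis) @ onehots)    # prediction in eq. (22)
        total += lam_ins * -np.log(p_ins[query_label])  # cross-entropy terms
        total += xi_dis * -np.log(p_dis[query_label])
    return total


labels = [0, 1, 2]                             # 3-way 1-shot support labels
good = [[0.9, 0.1, 0.1], [0.95, 0.05, 0.1]]    # two generations favouring class 0
bad = [[0.1, 0.9, 0.1], [0.05, 0.95, 0.1]]     # two generations favouring class 1
print(query_loss(good, good, labels, 0, 3), query_loss(bad, bad, labels, 0, 3))
```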
We use Ranger21 [25], a recently proposed optimizer, to opti-
mize the proposed DGNNet. Ranger21 integrates AdamW [26],
Lookahead [27], and gradient centralization [28]. It can reduce
the variance of the training loss and improve the convergence
performance.
Fig. 4. Results of supervised 5-way case on the CWRU dataset. WDCNN∗ indicates that WDCNN is trained in FSL.
IV. CASE STUDIES
To verify the capability of DGNNet for FSL-based fault diag-
nosis, we conduct case studies on various real bearing datasets.
In this section, we compare DGNNet with four related methods: wide deep CNN (WDCNN) [29], FSL-based fault diagnosis (FLFD) [16], GNN [18], and the self-supervised joint learning (SSJL) method [30]. GNN serves as the baseline model; its components are identical to those of DGNNet except that it lacks the distribution graph. We implement and evaluate all methods on two benchmark datasets from Case Western Reserve University (CWRU) and Machinery Failure Prevention Technology (MFPT). Furthermore, we use a real-world industrial case to verify the generalization performance of DGNNet. All models are trained on a 3090Ti GPU, and we report the average result of 20 experiments in all case studies.
A. Same-Load Fault Diagnosis
1) Description of CWRU Dataset: The CWRU dataset has been widely used to verify intelligent fault diagnosis methods. The vibration data are collected by the accelerometer at the drive end at a sampling frequency of 12 kHz and cover four bearing health conditions: normal state (NS), inner race failure (IF), outer race failure (OF), and ball element failure (BF). Each fault type is introduced into the drive-end bearing with fault diameters of 0.18, 0.36, 0.53, and 0.71 mm. Thus, the bearing vibration data without motor load can be divided into 12 fault categories. The resulting meta dataset contains 3840 samples, with 100 training samples, 200 testing samples, and 20 validation samples per class.
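As a consistency check on these counts (simple arithmetic on the figures above, assuming the per-class split applies to all 12 categories), the dataset size follows from

$$ 12 \times (100 + 200 + 20) = 12 \times 320 = 3840 \ \text{samples}. $$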
2) Supervised Fault Diagnosis: We evaluate DGNNet in comparison experiments by randomly selecting 6, 10, 20, 30, 60, and 100 samples per class. Running the 5-way 1-shot and 5-way 5-shot experiments for 20 epochs took almost 6 h and 12 h, respectively.
Fig. 4 shows the accuracy of all methods in supervised fault diagnosis tasks. DGNNet maintains high performance even with six samples per class and outperforms the other methods in both 1- and 5-shot experiments. As Table I shows, with six samples per class, DGNNet reaches a 1-shot accuracy of 99.55% and a 5-shot accuracy of 99.81% in the fully supervised case. In the same 5-way 1-shot scenario, its accuracy is almost 11% higher than that of GNN and 20% higher than those of FLFD and WDCNN, suggesting that the CNN-based methods learn only part of the fault features. Notably, GNN shows greater identification ability than WDCNN and FLFD, which indicates that learning pairwise relations between samples gives GNN an advantage over the CNN-based methods. DGNNet identifies all faults with 100.00% accuracy in the 5-shot case, whereas GNN and FLFD do not. The 5-shot paradigm generally performs better than the 1-shot paradigm, possibly because more support samples help the models find a better convergence direction.
TABLE I
SUPERVISED FAULT DIAGNOSIS RESULTS ON CWRU DATASET
TABLE II
SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON CWRU DATASET
Fig. 5. t-SNE visualization for semisupervised 5-shot classification. (a) WDCNN. (b) SSJL. (c) GNN. (d) DGNNet.
3) Semisupervised Fault Diagnosis: To further evaluate the effectiveness of DGNNet for semisupervised fault diagnosis, we train DGNNet on support sets with different numbers of labeled and unlabeled samples. Training for 25 epochs took 7 h and 13 h for the 5-way 1- and 5-shot classification tasks, respectively.
The results of the semisupervised experiments are shown in Table II. All semisupervised methods are trained for 25 epochs. The number of unlabeled samples equals the number of labeled samples in each support set, i.e., $n_{\bar{u}} \in \{1, 5\}$ per class; the analysis of the number of unlabeled samples later in this section supports this choice. DGNNet outperforms the baselines in both 1- and 5-shot cases. Fig. 5 shows the 2-D t-SNE visualization of the extracted high-dimensional features for each method. Remarkably, DGNNet clusters samples of the same class together and keeps samples of different classes as separate as possible.
Fig. 6. Effect of the number of unlabeled samples on 1-shot (left) and 5-shot (right) classification.
Comparing the supervised (see Table I) and semisupervised (see Table II) results shows that DGNNet achieves better classification when unlabeled data are used. Although the improvement is small, note that models are difficult to train to convergence when very few labeled samples are available in the supervised scenario. DGNNet still achieves high classification accuracy when only three labeled samples per class are combined with unlabeled samples.
As the comparison experiments show, using unlabeled samples in training improves fault diagnosis performance. As shown in Fig. 6, the accuracy of DGNNet improves when a proper number of unlabeled samples is given. However, the performance of DGNNet drops dramatically when the number of unlabeled samples exceeds that of labeled samples. This suggests that classification is best when the number of unlabeled samples approximately equals that of labeled samples in each subtask, i.e., $n_{\bar{u}} = K$. Accordingly, we use the setting $n_{\bar{u}} \in \{1, 5\}$ for all FSL experiments.
B. Multiload Fault Diagnosis
1) Description of MFPT Dataset: The vibration signals in CWRU are all collected at the same motor speed, 1797 rpm, so classification models can perform well in FSL-based fault diagnosis under this single operating condition. To evaluate the validity of DGNNet in multiload scenarios, we conduct fault classification experiments on the MFPT dataset, which contains three sets of multiload bearing signals. The normal (N) baseline set is recorded at a load of 270 lbs with a sampling rate of 97 656 samples per second (SPS) for 6 s. The inner race (IR) and outer race (OR) fault signals are obtained from the bearing test rig at a sampling rate of 48 828 SPS for 3 s under six load conditions: 50, 100, 150, 200, 250, and 300 lbs. We construct the meta dataset for MFPT with the signal preprocessing method described in Section II-A; each load condition contains 100 training samples, 200 testing samples, and 20 validation samples. Running the 3-way 1-shot and 3-way 5-shot experiments for 20 epochs took 4 h and 8 h, respectively.
2) Supervised Fault Diagnosis: In the MFPT dataset, each fault class contains the same number of samples under each of the six loads. That is, {1, 2, 3} samples are selected from each load condition, so each class contains 6 × {1, 2, 3} = {6, 12, 18} samples. We conduct experiments with these numbers of training samples, keeping the number of samples per load condition exactly equal.
TABLE III
SUPERVISED FAULT DIAGNOSIS ON MFPT DATASET
Fig. 7. t-SNE visualization for supervised 5-shot classification. (a) WDCNN. (b) GNN. (c) DGNNet.
TABLE IV
SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON MFPT DATASET
As shown in Table III, with 12 or 18 training samples per class for few-shot classification, DGNNet identifies all faults accurately. Its accuracy reaches 100%, which is 13.38% higher than that of GNN and 25.64% higher than that of WDCNN. To compare the methods on multiload samples, the extracted fault features are visualized by t-SNE in Fig. 7. WDCNN can identify the vibration signals in the normal state but can hardly separate the fault samples under multiple loads. DGNNet learns better relations between samples from limited data and accurately identifies all bearing faults under multiload conditions, keeping the features of each class close together and the features of different classes well separated.
3) Semisupervised Fault Diagnosis: To further evaluate the semisupervised capability of DGNNet, we carry out a series of multiload experiments for 20 epochs. The number of unlabeled samples equals that of labeled samples for each class, i.e., K. As shown in Table IV, under the same experimental setting, DGNNet outperforms the baseline methods. DGNNet classifies all faults accurately even with one labeled sample per load condition, that is, with six labeled samples and one unlabeled sample per class in the 1-shot experiment. With an appropriate number of unlabeled samples, GNN improves its accuracy by almost 5%, whereas DGNNet additionally constructs the distribution graph to propagate labels and therefore makes better use of unlabeled data than GNN. These results suggest that DGNNet has greater potential for semisupervised FSL problems in multiload fault diagnosis.
Fig. 8. Illustration of the OCB dataset.
TABLE V
DESCRIPTION OF OCB DATASET
TABLE VI
SUPERVISED FAULT DIAGNOSIS RESULTS ON OCB DATASET
C. Industrial Scenario Fault Diagnosis
1) Description of OCB Dataset: Most existing methods perform well on vibration data collected from test rigs but are rarely applied to real-world data. In this section, all compared methods are evaluated on a real-world oxygen compressor bearing (OCB) dataset from a smart factory. As shown in Fig. 8, the bearing data are provided by the largest copper smelter in the world, measured by an accelerometer on the oxygen compressor, and composed of three bearing health conditions: normal condition (NC), inner race fault (IRF), and outer race fault (ORF). The accelerometer records bearing signals every five seconds and stores them in a database, from which we took six days of historical data to construct the dataset. The OCB dataset is described in Table V.
2) Supervised Fault Diagnosis: To analyze the performance of all methods in the real-world scenario, we carry out a 3-way fault diagnosis experiment on OCB. Training DGNNet on OCB for 20 epochs took almost 4 h for the 1-shot experiment and 8 h for the 5-shot experiment. Table VI shows the performance of all methods in the supervised scenario. DGNNet performs well for fault diagnosis in this real-world scenario. With only six samples available, DGNNet obtains an accuracy of 99.48% in the 1-shot experiment, which is 2.89% higher than that of GNN and 35.56% higher than that of WDCNN. DGNNet can reach an accuracy of 100.00% with only six samples per class. In the 1-shot experiment, DGNNet effectively distinguishes different faults, whereas GNN and WDCNN identify normal samples well but easily confuse the fault classes.
TABLE VII
SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON OCB DATASET
TABLE VIII
DESCRIPTION FOR THREE CASES IN ABLATION STUDIES
3) Semisupervised Fault Diagnosis: We further verify the performance of DGNNet in the real-world scenario under semisupervised settings. The semisupervised experiments are carried out for 20 epochs. DGNNet further improves its diagnostic ability when a few unlabeled samples are provided. As shown in Table VII, the baseline methods obtain excellent diagnostic results only when 20 samples are used for training. In the 1-shot experiments, DGNNet obtains an accuracy of 100.00%, which is much higher than those of the baseline methods. DGNNet shows superior diagnostic performance in all semisupervised experiments. These results suggest that DGNNet has great potential for industrial fault diagnosis applications, where unlabeled data from rotating machinery are much easier to collect than labeled data.
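DGNNet learns how to propagate label information through its distribution graph. As a simple, non-learned point of comparison (a swapped-in classic technique, not the authors' method), graph-based label spreading iterates F ← αSF + (1 − α)Y, where S is a normalized similarity matrix and Y holds the few known labels, so that unlabeled samples inherit labels from labeled samples they are connected to. The Gaussian affinity, the parameters `alpha` and `sigma`, and the function name below are illustrative assumptions.

```python
import math


def label_propagation(points, labels, num_classes, alpha=0.9, sigma=1.0, iters=50):
    """Spread a few known labels to unlabeled points over an affinity graph.

    labels[i] is a class index for labeled points and None for unlabeled ones.
    Iterates F <- alpha * S F + (1 - alpha) * Y, where S is the symmetrically
    normalized Gaussian affinity matrix, and returns the argmax class per point.
    """
    n = len(points)
    # Gaussian affinities between points, with no self-loops.
    w = [[0.0 if i == j else
          math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / sigma)
          for j in range(n)] for i in range(n)]
    # S = D^{-1/2} W D^{-1/2}; isolated points get all-zero rows.
    inv_sqrt_deg = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in map(sum, w)]
    s = [[w[i][j] * inv_sqrt_deg[i] * inv_sqrt_deg[j] for j in range(n)]
         for i in range(n)]
    # One-hot seed matrix Y: all-zero rows for unlabeled points.
    y = [[1.0 if lab == c else 0.0 for c in range(num_classes)] for lab in labels]
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(s[i][j] * f[j][c] for j in range(n)) + (1 - alpha) * y[i][c]
              for c in range(num_classes)] for i in range(n)]
    return [max(range(num_classes), key=row.__getitem__) for row in f]
```

With one labeled sample per cluster, `label_propagation([[0.0], [0.5], [1.0], [5.0], [5.5], [6.0]], [0, None, None, None, None, 1], 2)` assigns the four unlabeled points to the cluster of their nearest labeled neighbor. DGNNet replaces the fixed kernel and update rule with learned instance-level and distribution-level relations.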
V. ABLATION STUDIES AND DISCUSSIONS
In this section, we use three ablation cases to explore the effect of the number of ways N and the number of generations of DGNNet on the diagnostic results. The three cases are (a) the same-load case on CWRU, (b) the multiload case on MFPT, and (c) the industrial case on OCB; Table VIII describes them in detail. Finally, we analyze the convergence of DGNNet on CWRU; the other cases behave similarly.
A. N-Way Fault Diagnosis
The effect of the number of ways is further explored for N-way fault diagnosis in various scenarios. As shown in Fig. 9, DGNNet obtains an accuracy of 99.83% in 2-way 1-shot fault diagnosis on the CWRU dataset, which is 0.33% higher than in 5-way classification. In the 2-way industrial scenario (case c), DGNNet's diagnostic performance is even more favorable. Note that the experiments in Section IV use more difficult settings, which allow a better evaluation of the semisupervised learning effectiveness of DGNNet.
Fig. 9. Influence of N in N-way 1-shot classification (25 epochs).
Fig. 10. Generation numbers in DGNNet on three cases (25 epochs).
Fig. 11. Evolution of the test accuracy under different optimizers.
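N-way K-shot experiments such as these are usually built from randomly sampled episodes. The sketch below samples one episode under stated assumptions: the dataset is a list of (sample, label) pairs, labels are re-indexed to 0..N−1 within each episode, and an optional pool of unlabeled samples per class is drawn for semisupervised runs. The function name and the per-class `n_query` and `n_unlabeled` counts are hypothetical; this section does not specify the paper's exact episode construction.

```python
import random
from collections import defaultdict


def sample_episode(dataset, n_way, k_shot, n_query, n_unlabeled=0, rng=None):
    """Sample one N-way K-shot episode from a list of (sample, label) pairs.

    Returns (support, query, unlabeled). Support and query items are
    (sample, episode_label) pairs with labels re-indexed to 0..n_way-1;
    unlabeled items are bare samples from the same classes, labels dropped.
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for sample, label in dataset:
        by_class[label].append(sample)
    # Pick N distinct classes (sorted first so a seeded rng is reproducible).
    classes = rng.sample(sorted(by_class), n_way)
    support, query, unlabeled = [], [], []
    for episode_label, cls in enumerate(classes):
        # Draw disjoint support, query, and unlabeled samples for this class.
        picked = rng.sample(by_class[cls], k_shot + n_query + n_unlabeled)
        support += [(x, episode_label) for x in picked[:k_shot]]
        query += [(x, episode_label) for x in picked[k_shot:k_shot + n_query]]
        unlabeled += picked[k_shot + n_query:]
    return support, query, unlabeled
```

For example, `sample_episode(data, n_way=3, k_shot=1, n_query=2, n_unlabeled=3)` over NC/IRF/ORF-labeled samples would produce a 3-way 1-shot episode with three unlabeled samples per class; changing `n_way` gives the kind of sweep shown in Fig. 9.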
B. Generation Numbers
For datasets of various scales and complexities, the number of alternate updating generations affects both the test accuracy and the convergence speed of DGNNet. As shown in Fig. 10, the test accuracy improves greatly as the generation number increases from 0 to 2 and fluctuates within a small range when the generation number is between 2 and 8. The larger the generation number, the longer the convergence time. In this article, we set the generation number to 4 as a tradeoff between test accuracy and convergence speed.
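To make the role of the generation number concrete, the following is a simplified, non-learned sketch of alternating instance-graph and distribution-graph updates on toy feature vectors. It is not the authors' DGNNet implementation: DGNNet learns its edge and node updates with neural networks, whereas this sketch substitutes fixed Gaussian kernels. The kernel widths `sigma` and `sigma_d`, the blending factor `mix`, the initial distribution features, and all function names are assumptions chosen for illustration; only the default of 4 generations follows the text.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kernel(vectors, sigma):
    """Gaussian similarity matrix (stand-in for a learned edge function)."""
    return [[math.exp(-sq_dist(u, v) / sigma) for v in vectors] for u in vectors]


def normalize(row):
    """Scale a non-negative vector to sum to 1 (uniform if it is all zeros)."""
    total = sum(row)
    return [w / total for w in row] if total > 0 else [1.0 / len(row)] * len(row)


def dual_graph_predict(features, support_labels, generations=4,
                       sigma=1.0, sigma_d=0.1, mix=0.5):
    """Alternate instance/distribution graph updates, then label every node.

    The first len(support_labels) feature vectors are the labeled support
    nodes; the remaining vectors are query or unlabeled nodes.
    """
    n_support = len(support_labels)
    num_classes = max(support_labels) + 1
    nodes = [list(f) for f in features]
    # Hypothetical initial distribution features: label agreement with the
    # support set for labeled nodes, uniform for all other nodes.
    dist = []
    for i in range(len(nodes)):
        if i < n_support:
            dist.append([1.0 if support_labels[i] == y else 0.0 for y in support_labels])
        else:
            dist.append([1.0 / n_support] * n_support)
    for _ in range(generations):
        # Instance graph: pairwise similarity of the current node features.
        inst = kernel(nodes, sigma)
        # Distribution features: blend the previous distribution with each
        # node's normalized similarity to the support nodes.
        dist = [[mix * p + (1 - mix) * q
                 for p, q in zip(dist[i], normalize(inst[i][:n_support]))]
                for i in range(len(nodes))]
        # Distribution graph: row-normalized similarity between distributions.
        weights = [normalize(row) for row in kernel(dist, sigma_d)]
        # Instance update: mix each node with a distribution-weighted average.
        nodes = [[mix * v + (1 - mix) * sum(w * nodes[j][k] for j, w in enumerate(weights[i]))
                  for k, v in enumerate(nodes[i])]
                 for i in range(len(nodes))]
    # Score each node by its summed instance similarity to each class's support.
    inst = kernel(nodes, sigma)
    preds = []
    for i in range(len(nodes)):
        scores = [0.0] * num_classes
        for s, label in enumerate(support_labels):
            scores[label] += inst[i][s]
        preds.append(max(range(num_classes), key=scores.__getitem__))
    return preds
```

Each extra generation in this sketch repeats two O(n²) similarity passes over the episode, which mirrors, in a simplified form, why a larger generation number lengthens convergence time.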
C. Ranger21 Optimizer
A recently proposed optimizer, Ranger21, is used in our study. Fig. 11 shows the evolution of the test accuracy under different optimizers over 60 epochs. The learning rate of Adam and AdamW is set to 3e−3. For Ranger21, the learning rate is linearly warmed up during the first two epochs and then set to 3e−3. Linear warm-down starts at epoch 40, and the learning rate decays to 3e−5. Ranger21 and AdamW perform better than Adam at the early stage, and after 20 epochs, Ranger21 consistently outperforms AdamW. Notably, Ranger21 accelerates model learning and obtains high accuracy without compromising generalization.
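The Ranger21 learning-rate schedule described above can be written as a small function of the epoch. This is a sketch under assumptions: warm-up is taken to start from a learning rate of 0, the warm-down is assumed to reach 3e−5 exactly at epoch 60 (the end of the run in Fig. 11), and the schedule is expressed per epoch, although an optimizer would typically apply it per step. The function name and parameters are hypothetical and are not part of the Ranger21 API.

```python
def ranger21_style_lr(epoch, peak_lr=3e-3, min_lr=3e-5, warmup_epochs=2,
                      warmdown_start=40, total_epochs=60):
    """Learning rate at a (possibly fractional) epoch: warm-up, hold, warm-down."""
    if epoch < warmup_epochs:
        # Linear warm-up from 0 (assumed starting value) to peak_lr.
        return peak_lr * epoch / warmup_epochs
    if epoch < warmdown_start:
        # Constant phase at the peak learning rate.
        return peak_lr
    # Linear warm-down from peak_lr to min_lr, clamped after total_epochs.
    progress = min(1.0, (epoch - warmdown_start) / (total_epochs - warmdown_start))
    return peak_lr + (min_lr - peak_lr) * progress
```

For instance, the function gives 1.5e−3 halfway through warm-up (epoch 1), 3e−3 throughout epochs 2 to 40, and about 1.515e−3 halfway through warm-down (epoch 50).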
VI. CONCLUSION AND FUTURE WORK
In this article, we proposed an FSL method for fault diag-
nosis, namely DGNNet. DGNNet uses an instance graph and
a distribution graph to learn the pairwise relation between two
samples and high-order relations between all samples, respec-
tively. The learned multilevel relations help DGNNet to classify
unseen samples. In particular, the distribution graph propagates
universal label information from a few labeled samples to un-
labeled samples, enabling DGNNet to address semisupervised
problems. Extensive experiments were conducted to evaluate the performance of DGNNet for supervised and semisupervised fault diagnosis. The results show that DGNNet achieves high fault classification accuracy and good generalization in various scenarios. In the future, we will try to apply DGNNet to other data-scarcity problems and use it to label new samples in real-world applications. In addition, we will investigate the diagnosis of unseen fault types using FSL.
REFERENCES
[1] H. Shao, M. Xia, G. Han, Y. Zhang, and J. Wan, “Intelligent fault diagnosis
of rotor-bearing system under varying working conditions with modified
transfer convolutional neural network and thermal images,” IEEE Trans.
Ind. Inform., vol. 17, no. 5, pp. 3488–3496, May 2021.
[2] G. Xu, M. Liu, Z. Jiang, W. Shen, and C. Huang, “Online fault diagnosis
method based on transfer convolutional neural networks,” IEEE Trans.
Instrum. Meas., vol. 69, no. 2, pp. 509–520, Feb. 2020.
[3] G. Liu, W. Shen, L. Gao, and A. Kusiak, “Predictive modeling with an
adaptive unsupervised broad transfer algorithm,” IEEE Trans. Instrum.
Meas., vol. 70, 2021, Art. no. 3520212.
[4] S. Xing, Y. Lei, S. Wang, and F. Jia, “Distribution-invariant deep belief
network for intelligent fault diagnosis of machines under new working
conditions,” IEEE Trans. Ind. Electron., vol. 68, no. 3, pp. 2617–2625,
Mar. 2021.
[5] C. Sun, M. Ma, Z. Zhao, S. Tian, R. Yan, and X. Chen, “Deep transfer
learning based on sparse autoencoder for remaining useful life prediction
of tool in manufacturing,” IEEE Trans. Ind. Inform., vol. 15, no. 4,
pp. 2416–2425, Apr. 2019.
[6] W. Song, W. Shen, L. Gao, and X. Li, “An early fault detection method
of rotating machines based on unsupervised sequence segmentation con-
volutional neural network,” IEEE Trans. Instrum. Meas., vol. 71, 2022,
Art. no. 3504712.
[7] G. W. Xu, M. Liu, Z. F. Jiang, D. Soffker, and W. M. Shen, “Bearing fault
diagnosis method based on deep convolutional neural network and random
forest ensemble learning,” Sensors, vol. 19, no. 5, Mar. 2019, Art. no. 1088.
[8] C. Zhao, G. K. Liu, and W. M. Shen, “A dual-view alignment-based domain
adaptation network for fault diagnosis,” Meas. Sci. Technol., vol. 32, no. 11,
Nov. 2021, Art. no. 115102.
[9] D. Han, X. Guo, and P. Shi, “An intelligent fault diagnosis method of
variable condition gearbox based on improved DBN combined with WPEE
and MPE,” IEEE Access, vol. 8, pp. 131299–131309, 2020.
[10] Z. Ren, W. Zhang, and Z. Zhang, “A deep nonnegative matrix factorization
approach via autoencoder for nonlinear fault detection,” IEEE Trans. Ind.
Inform., vol. 16, no. 8, pp. 5042–5052, Aug. 2020.
[11] S. Kiranyaz, A. Gastli, L. Ben-Brahim, N. Al-Emadi, and M. Gabbouj,
“Real-time fault detection and identification for MMC using 1-D con-
volutional neural networks,” IEEE Trans. Ind. Electron., vol. 66, no. 11,
pp. 8760–8771, Nov. 2019.
[12] C. Zhao, G. K. Liu, W. M. Shen, and L. Gao, “A multi-representation-based
domain adaptation network for fault diagnosis,” Measurement, vol. 182,
Sep. 2021, Art. no. 109650.
[13] T. C. Zhang et al., “Intelligent fault diagnosis of machines with small &
imbalanced data: A state-of-the-art review and possible extensions,” ISA
Trans., vol. 119, pp. 152–171, Jan. 2021.
[14] L. Feng and C. Zhao, “Fault description based attribute transfer for zero-
sample industrial fault diagnosis,” IEEE Trans. Ind. Inform., vol. 17, no. 3,
pp. 1852–1862, Mar. 2021.
[15] Y. Hu, R. Liu, X. Li, D. Chen, and Q. Hu, “Task-sequencing meta learning
for intelligent few-shot fault diagnosis with limited data,” IEEE Trans. Ind.
Inform., vol. 18, no. 6, pp. 3894–3904, Jun. 2022.
[16] A. Zhang, S. Li, Y. Cui, W. Yang, R. Dong, and J. Hu, “Limited data
rolling bearing fault diagnosis with few-shot learning,” IEEE Access,
vol. 7, pp. 110895–110904, 2019.
[17] Y. Feng, J. Chen, T. Zhang, S. He, E. Xu, and Z. Zhou, “Semi-supervised
meta-learning networks with squeeze-and-excitation attention for few-shot
fault diagnosis,” ISA Trans., vol. 120, pp. 383–401, Jan. 2022.
[18] V. G. Satorras and J. Bruna, “Few-shot learning with graph neural networks,” in Proc. Int. Conf. Learn. Representations, 2018, pp. 1–13.
[19] Y. Xiao, Y. Jin, and K. Hao, “Adaptive prototypical networks with la-
bel words and joint representation learning for few-shot relation clas-
sification,” IEEE Trans. Neural Netw. Learn. Syst., pp. 1–12, 2021,
doi: 10.1109/TNNLS.2021.3105377.
[20] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Meta-learning
in neural networks: A survey,” IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 44, no. 9, pp. 5149–5169, Sep. 2022.
[21] R. Zhou, X. Chang, L. Shi, Y.-D. Shen, Y. Yang, and F. Nie, “Person
reidentification via multi-feature fusion with adaptive graph learning,”
IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 5, pp. 1592–1601,
May 2020.
[22] M. Luo, X. Chang, L. Nie, Y. Yang, A. G. Hauptmann, and Q. Zheng, “An
adaptive semisupervised feature analysis for video semantic recognition,”
IEEE Trans. Cybern., vol. 48, no. 2, pp. 648–660, Feb. 2018.
[23] Z. Li, F. Nie, X. Chang, Y. Yang, C. Zhang, and N. Sebe, “Dynamic
affinity graph construction for spectral clustering using multiple features,”
IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 12, pp. 6323–6332,
Dec. 2018.
[24] S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry, “How does batch
normalization help optimization?,” Adv. Neural Inform. Process. Syst.,
vol. 31, pp. 2488–2498, 2018.
[25] L. Wright and N. Demeure, “Ranger21: A synergistic deep learning
optimizer,” 2021, arXiv:2106.13731.
[26] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in
Proc. Int. Conf. Learn. Representations, 2019, pp. 1–18.
[27] M. R. Zhang, J. Lucas, G. Hinton, and J. Ba, “Lookahead optimizer: k
steps forward, 1 step back,” Adv. Neural Inform. Process. Syst., vol. 32,
pp. 9593–9604, 2019.
[28] W. Fuhl and E. Kasneci, “Weight and gradient centralization in deep neural
networks,” in Artificial Neural Networks and Machine Learning (Lecture
Notes in Computer Science), vol. 12894. Cham, Switzerland: Springer,
2021, pp. 227–239.
[29] W. Zhang, G. L. Peng, C. H. Li, Y. H. Chen, and Z. J. Zhang, “A
new deep learning model for fault diagnosis with good anti-noise and
domain adaptation ability on raw vibration signals,” Sensors, vol. 17, no. 2,
Feb. 2017, Art. no. 425.
[30] W. W. Zhang, D. J. Chen, and Y. Kong, “Self-supervised joint learning
fault diagnosis method based on three-channel vibration images,” Sensors,
vol. 21, no. 14, Jul. 2021, Art. no. 4774.
Han Wang received the B.E. degree in automa-
tion in 2019 from Tongji University, Shanghai,
China, where he is currently working toward the
Ph.D. degree in control science and engineering
with the College of Electronics and Information
Engineering.
His research interests include fault diagnosis,
few-shot learning, and graph deep learning.
Jingwei Wang received the B.E. degree in con-
trol science and engineering from Shandong
University, Jinan, China, in 2016. He is currently
working toward the Ph.D. degree in control sci-
ence and engineering with the College of Elec-
tronics and Information Engineering, Tongji Uni-
versity, Shanghai, China, in 2022.
His research interests include data mining,
network science, and machine learning.
Yukai Zhao received the B.E. degree in au-
tomation from Nanjing Institute of Technology,
Nanjing, China, in 2019. He is currently work-
ing toward the Ph.D. degree in control science
and engineering with the College of Electronics
and Information Engineering, Tongji University,
Shanghai, China.
His research interests include action recogni-
tion, graph data mining, and graph deep learn-
ing.
Qing Liu received the B.E. degree in automa-
tion from Changshu Institute of Technology,
Changshu, China, in 2014, and the M.E. degree
in control science and engineering in 2017 from
Tongji University, Shanghai, China, where he is
currently working toward the Ph.D. degree in
control science and engineering with the College of Electronics and Information Engineering.
His research interests include machine learn-
ing, deep learning, and their applications in in-
telligent manufacturing.
Min Liu received the B.E. degree in mechan-
ical engineering from the China University of
Geosciences, Wuhan, China, in 1993, and the
M.E. degree in mechanics and the Ph.D. degree
in mechanical engineering and automation from
Zhejiang University, Hangzhou, China, in 1996
and 1999, respectively.
He is currently a Professor with the College of
Electronics and Information Engineering, Tongji
University, Shanghai, China. He has been working on computer science, systems engineering, collaborative MRO, and intelligent manufacturing for about 14 years. He authored or coauthored more than 100 papers in scientific
journals and international conferences in related areas. His research
interests include deep learning, fault diagnosis and prediction, and intel-
ligent maintenance.
Weiming Shen (Fellow, IEEE) received the
B.E. and M.S. degrees in mechanical engineer-
ing from Northern Jiaotong University, Beijing,
China, in 1983 and 1986, respectively, and the
Ph.D. degree in system control from the Univer-
sity of Technology of Compiegne, Compiegne,
France, in 1996.
He is currently a Professor with the Huazhong
University of Science and Technology (HUST),
Wuhan, China, and an Adjunct Professor with
the University of Western Ontario, London, ON,
Canada. Before joining HUST in 2019, he was a Principal Research
Officer at the National Research Council Canada. His work has been
cited more than 16 000 times with an h-index of 61. He authored
or coauthored several books and more than 560 articles in scientific
journals and international conferences in related areas. His research
interests include agent-based collaboration technologies and applica-
tions, collaborative intelligent manufacturing, the Internet of Things, and
Big Data analytics.