
Few-Shot Learning for Fault Diagnosis With a Dual Graph Neural Network


Abstract

Mechanical fault diagnosis is crucial to ensuring the safe operation of equipment in intelligent manufacturing systems. Deep learning-based methods have recently been developed for fault diagnosis because of their advantages in feature representation. However, most of these methods fail to learn relations between samples and thus perform poorly without sufficient labeled data. In this paper, we propose a new few-shot learning method named Dual Graph Neural network (DGNNet) with residual blocks to address fault diagnosis problems with limited data. First, the residual module learns sample features from images converted from the original signals. Second, two complete graphs built on the sample features are used to extract the instance-level and distribution-level relations between samples. In particular, an alternate update policy between the instance and distribution graphs integrates these multilevel relations to propagate the label information of a few labeled samples to unlabeled samples. This technique leverages both labeled and unlabeled samples to identify unseen faults, making DGNNet competent in fault diagnosis tasks with very few labeled samples. Extensive results on various datasets show that DGNNet achieves excellent performance in supervised fault diagnosis tasks and outperforms baselines by a large margin in semi-supervised cases.
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 19, NO. 2, FEBRUARY 2023 1559
Han Wang, Jingwei Wang, Yukai Zhao, Qing Liu, Min Liu, and Weiming Shen, Fellow, IEEE
Index Terms: Distribution learning, fault diagnosis, few-shot learning (FSL), graph neural network (GNN), semisupervised learning.
I. INTRODUCTION
The increasing availability of big data on manufacturing equipment offers unprecedented opportunities to explore methods and tools for the predictive maintenance of machinery. Predictive maintenance aims to prevent mechanical failures, or to detect them before they occur, in order to reduce losses. In recent decades, researchers have devoted much attention to fault diagnosis, especially for rotating machinery [1]. Because rotating machinery, such as engines, turbines, and compressors, is a critical component of most industrial facilities, many mathematical models based on mechanical characteristics have been developed to identify its faults. Yet these methods rely heavily on prior knowledge and expert experience, which makes them difficult to apply in real-world manufacturing environments that generate enormous amounts of highly noisy data. The demand for real-time, high-performance diagnosis drives researchers toward data-driven fault diagnosis methods, such as deep learning (DL) [2], [3].
Manuscript received 29 December 2021; revised 26 July 2022; accepted 4 September 2022. Date of publication 9 September 2022; date of current version 13 December 2022. This work was supported by the National Key R&D Program of China under Grant 2019YFB1704700 and NSFC under Grant 62273261. Paper no. TII-21-5812. (Corresponding author: Min Liu.)
Han Wang, Jingwei Wang, Yukai Zhao, Qing Liu, and Min Liu are with the College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China (e-mail: 469702227@qq.com; jwwang@tongji.edu.cn; zhaoyukaijake@tongji.edu.cn; 2010142@tongji.edu.cn; lmin@tongji.edu.cn).
Weiming Shen is with the State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China (e-mail: wshen@ieee.org).
Digital Object Identifier 10.1109/TII.2022.3205373
Recent studies have employed DL methods, such as deep belief networks [4], deep autoencoders [5], and convolutional neural networks (CNNs) [6], [7], for fault diagnosis of rotating machinery [8]. For example, Han et al. [9] proposed an improved deep belief network for gear fault detection and obtained high diagnostic accuracy. Ren et al. [10] designed a deep autoencoder that automatically learns a nonlinear mapping of the input fault data. Kiranyaz et al. [11] presented an adaptive CNN for real-time fault detection and obtained excellent classification performance. However, these methods require sufficient labeled samples to train DL models; otherwise, they cannot achieve high fault diagnosis performance [12]. In real industrial scenarios, rotating machines usually operate under normal conditions and seldom malfunction, so fault data are rare [13]. Consequently, DL methods fail to maintain their advantages in industrial fault diagnosis with small data.
Few-shot learning (FSL) is an emerging paradigm for training a DL model with limited labeled samples. Briefly, an FSL task is divided into many small subtasks, and each subtask consists of two sets: a support set (containing a few labeled samples) and a query set (containing one or several unlabeled samples). FSL-based fault diagnosis aims to obtain DL models that can identify the fault type of unseen samples in the query set using the limited labeled samples in the support set [14]. A few researchers have proposed FSL-based methods for fault diagnosis with limited data [15]. For instance, Zhang et al. [16] utilized a Siamese neural network to tackle fault data scarcity. Feng et al. [17] proposed a semisupervised meta-learning network with an attention mechanism that extracts distinct features from support samples to generate prototypes for classifying query samples. However, these methods ignore the relations between support and query samples, including pairwise relations between two samples (i.e., the instance-level relation) and high-order relations among all samples (i.e., the distribution-level relation). Ignoring these relations, which distinguish samples of different fault classes, limits the performance of previous FSL-based diagnosis methods.
In this article, we propose a novel FSL method to solve fault diagnosis problems with limited data. First, we transform the vibration signals of rotating machinery into images, with each image representing a sample, and split them into support sets and query sets. In each FSL subtask, we extract the sample features using a residual network [18]. We then use graphs to abstract the relations between samples, where each sample is a node of a graph. In particular, we construct two graphs for all samples in a subtask: an instance graph and a distribution graph. In the instance graph, the sample feature is regarded as the node feature (instance feature), and the feature similarity of two samples is regarded as the weight of the edge representing their instance-level relation. In the distribution graph, the distribution feature (the similarity between a sample and the other samples) is regarded as the node feature, and the edge weight representing the distribution-level relation is calculated from the distribution features of the two nodes. On this basis, we design a dual graph neural network (DGNNet) to learn the instance-level and distribution-level relations in these graphs for fault diagnosis. Specifically, DGNNet uses an alternate update policy to propagate the label information of labeled samples to unlabeled samples, leveraging the relations between samples at different levels. Moreover, this learning strategy allows the support set to contain unlabeled samples, enabling DGNNet to solve semisupervised fault diagnosis problems.
The contributions of this article are summarized as follows.
1) We propose a novel FSL method named DGNNet that integrates instance-level and distribution-level relation learning for fault diagnosis. DGNNet learns the relations of query samples at different levels from limited data.
2) The alternate update policy between the instance graph and the distribution graph propagates the label information of scarce labeled samples to unlabeled samples within several updates, enabling DGNNet to address semisupervised fault diagnosis.
3) Extensive experiments are conducted to evaluate DGNNet on two benchmark datasets and a real-world dataset. Our results show that DGNNet outperforms baselines by 3%–10% in supervised fault diagnosis tasks and by 10%–12% in semisupervised cases.
The rest of this article is organized as follows. Section II
presents the preliminaries of this study. The proposed method
is detailed in Section III. In Section IV, the effectiveness of
DGNNet is evaluated with various case studies. Ablation studies
and discussion are presented in Section V. Finally, Section VI
concludes this article and presents future work.
II. PRELIMINARIES
A. Fault Diagnosis
DL-based fault diagnosis methods typically transform vibration signals into three-channel images. The segmentation length of the signals should be adjusted for each scenario so that a segment contains a complete period, which benefits classification. Each 84 × 84 image is built from 4096 signal data points using a sliding window of length 84; because the image has 84 × 84 = 7056 pixels, some data points are reused [2]. The three-channel image is generated as follows: the value of each data point is normalized; each pixel is filled with the corresponding data point; and the signal segments fill all rows of the image in sequence.
Fig. 1. Episodic paradigm of FSL (5-way 1-shot).
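The image construction above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' preprocessing code: min-max normalization to [0, 1], evenly spaced row offsets, and copying one grayscale image into all three channels are assumptions, since the text does not specify them, and `signal_to_image` is a hypothetical helper.

```python
def signal_to_image(segment, size=84):
    """Turn one vibration segment (e.g., 4096 points) into a 3 x size x size image.

    Assumptions (not given in the article): min-max normalization to [0, 1],
    row windows of length `size` whose start offsets are evenly spaced so the
    last window ends at the end of the segment, and identical channels.
    """
    n = len(segment)
    if size < 2 or n < size:
        raise ValueError("need size >= 2 and at least one full row of data")
    lo, hi = min(segment), max(segment)
    span = (hi - lo) or 1.0  # guard against a constant segment
    norm = [(x - lo) / span for x in segment]
    # With 4096 points and 84 rows of 84 pixels (7056 pixels), windows overlap,
    # so some data points appear in more than one row.
    last_start = n - size
    starts = [round(r * last_start / (size - 1)) for r in range(size)]
    gray = [norm[s:s + size] for s in starts]  # rows filled in signal order
    return [[row[:] for row in gray] for _ in range(3)]


image = signal_to_image([float(t % 97) for t in range(4096)])
print(len(image), len(image[0]), len(image[0][0]))  # 3 84 84
```

The evenly spaced offsets are only one way to reuse points across rows; a fixed stride with a different overlap would be an equally plausible reading of the text.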
DGNNet takes as input the embedded features that a residual network extracts from these images. Both labeled and unlabeled samples are used in the support set, which can therefore be denoted as
$$S = \bar{L} \cup \bar{U} = \{(x_1, y_1), \ldots, (x_{n_{\bar{l}}}, y_{n_{\bar{l}}})\} \cup \{x_{n_{\bar{l}}+1}, \ldots, x_{n_{\bar{l}}+n_{\bar{u}}}\} \tag{1}$$
where $\bar{L}$ is the labeled data and $\bar{U}$ is the unlabeled data; $n_{\bar{l}}$ and $n_{\bar{u}}$ are the numbers of labeled and unlabeled samples, respectively. The input is defined as
$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{n \times 3 \times D_I} \tag{2}$$
where $D_I = 84 \times 84$, $n = n_{\bar{l}} + n_{\bar{u}} + n_{\bar{q}}$ is the number of all samples, and $n_{\bar{q}}$ denotes the number of samples in $Q$. All query samples are classified by the FSL-based model, and the prediction for a query sample is
$$\hat{y}_i = \arg\max_{c} \{ p_\phi(y = c \mid x_i \in Q) \}_{c=1}^{N} \tag{3}$$
where $p_\phi(y = c \mid x_i \in Q)$ is the probability that the query sample $x_i$ is predicted as class $c$.
B. Few-Shot Learning (FSL)
FSL aims to learn a model that recognizes unseen samples after training with limited labeled examples [19]. Many researchers use the episodic paradigm from meta-learning to solve FSL problems [20]. In an episode (a subtask), a support set $S$ includes $N$ classes with $K$ samples per class, and a query set $Q$ contains $\tilde{T}$ samples. The goal of a subtask is to classify the query samples using the support samples; such a classification problem is called an $N$-way $K$-shot problem. Fig. 1 shows an example of the 5-way 1-shot FSL problem. We construct 15 000 training subtasks and 15 000 test subtasks. In each subtask, the support set contains five different classes with one image per class, and the query set contains one unlabeled image to be classified. The overall classification accuracy is calculated as the percentage of subtasks in which the unlabeled query sample is correctly classified.
Fig. 2. Overall framework of DGNNet. The unlabeled samples are only used in semisupervised FSL for fault diagnosis.
In this article, we discuss both the supervised and semisuper-
vised FSL for fault diagnosis. In the supervised cases, all support
samples are labeled, whereas the support set in the semisuper-
vised cases contains both labeled and unlabeled samples. Note
that query samples in both cases are unlabeled.
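To make the episodic setup concrete, the sketch below samples one $N$-way $K$-shot subtask, optionally adding unlabeled support samples for the semisupervised case, and scores accuracy as the percentage of subtasks whose query is classified correctly. The data layout (a dict from class label to samples), the helper names, and drawing each query from a randomly chosen episode class are assumptions for illustration.

```python
import random


def sample_episode(data, n_way=5, k_shot=1, n_unlabeled=0, n_query=1, rng=None):
    """Sample one N-way K-shot subtask from `data` (class label -> list of samples).

    Returns (support, unlabeled, query):
      support   - K labeled (sample, label) pairs per class
      unlabeled - n_unlabeled extra samples per class with labels hidden
      query     - (sample, label) pairs; labels are kept only for scoring
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(data), n_way)
    support, unlabeled, candidates = [], [], []
    for label in classes:
        # Draw disjoint samples so no query or unlabeled item repeats a support item.
        picked = rng.sample(data[label], k_shot + n_unlabeled + 1)
        support += [(x, label) for x in picked[:k_shot]]
        unlabeled += picked[k_shot:k_shot + n_unlabeled]
        candidates.append((picked[-1], label))
    query = rng.sample(candidates, n_query)  # assumes n_query <= n_way
    return support, unlabeled, query


def episode_accuracy(predictions, labels):
    """Percentage of subtasks whose query sample is classified correctly."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)
```

In the supervised case `n_unlabeled` stays at 0; the semisupervised experiments described later correspond to `n_unlabeled = k_shot`.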
C. Graph Neural Networks (GNNs)
GNNs are DL methods for solving tasks on graph-structured data [21]. Recently, GNNs have been widely applied in semisupervised learning and FSL [22], [23]. GNNs typically include two processes: the node update and the edge update. For an undirected graph $G = (V, E)$ with node set $V$ and edge set $E$, the adjacency matrix $A \in \mathbb{R}^{n \times n}$ is defined as
$$A_{ij} = \begin{cases} 1, & \text{if nodes } (v_i, v_j) \text{ are connected} \\ 0, & \text{if nodes } (v_i, v_j) \text{ are not connected.} \end{cases} \tag{4}$$
The updates of the node $v_i$ and the edge $e_k$ are formulated as
$$e'_k = \phi^e(v_i, e_k), \qquad \bar{e}'_k = \rho^{e \to v}(E') \tag{5}$$
$$v'_i = \phi^v(\bar{e}'_k, v_i), \qquad \bar{v}'_i = \rho^{v \to e}(V') \tag{6}$$
where $E' = \{e'_k\}_{k=1:N_e}$ and $V' = \{v'_i\}_{i=1:N_v}$ are the sets of updated edges and nodes (of cardinality $N_e$ and $N_v$), and $e_k$ and $v_i$ are the attributes of an edge and a node. $\phi^e$ is mapped across edges to calculate the per-edge update, and $\phi^v$ is mapped across nodes to calculate the per-node update. The $\rho$ functions reduce a set to a single element.
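A minimal sketch of one edge-then-node update in the spirit of (5) and (6), on a complete graph with scalar node attributes. The edge function, the reduction, and the node function are simple hand-written stand-ins (hypothetical), not learned networks; in a GNN they would be parameterized and trained.

```python
def gnn_step(nodes, phi_e, rho, phi_v):
    """One edge update followed by one node update on a complete undirected graph.

    phi_e(v_i, v_j) -> per-edge attribute,
    rho(list of (edge, neighbor)) -> single aggregated element,
    phi_v(v_i, aggregate) -> new node attribute.
    """
    n = len(nodes)
    edges = {(i, j): phi_e(nodes[i], nodes[j])
             for i in range(n) for j in range(n) if i != j}
    new_nodes = []
    for i in range(n):
        incident = [(edges[(i, j)], nodes[j]) for j in range(n) if j != i]
        new_nodes.append(phi_v(nodes[i], rho(incident)))
    return new_nodes, edges


def similarity(a, b):
    return 1.0 / (1.0 + abs(a - b))  # closer nodes get heavier edges


def weighted_mean(incident):
    total = sum(w for w, _ in incident)
    return sum(w * v for w, v in incident) / total


def half_step(v, message):
    return 0.5 * v + 0.5 * message  # keep half of the old value


new_nodes, edges = gnn_step([0.0, 0.0, 3.0], similarity, weighted_mean, half_step)
print(new_nodes)
```

Here the reduction aggregates the edges incident to each node, which plays the role of $\rho^{e \to v}$; the two nodes at 0.0 pull the outlier at 3.0 toward them.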
III. DUAL GRAPH NEURAL NETWORK (DGNNET)
In this section, we first introduce the proposed DGNNet framework and briefly describe the feature learning. We then elaborate on the instance-level and distribution-level relation learning, followed by the optimization of DGNNet.
A. Framework
The DGNNet framework is shown in Fig. 2. It contains four parts: the feature learning module (residual network), the instance-level relation learning module (instance graph), the distribution-level relation learning module (distribution graph), and the synergistic optimization. The residual network extracts feature vectors from the transformed images. The instance graph learns the instance features of all samples and the instance-level relations between samples. The distribution graph learns the distribution features and the distribution-level relations. The synergistic optimization integrates the losses of the instance and distribution similarities to optimize DGNNet. Finally, the instance graph at the last generation performs the fault diagnosis.
B. Feature Learning
The residual network comprises several residual blocks (Resblocks) and fully connected (FC) layers to extract embedded features from the transformed images. Each Resblock comprises four 2-D convolutional (Conv2D) layers, four batch normalization (BN) layers, one leaky ReLU activation function $\zeta_{LReLU}$, and one max-pooling operation. In each Resblock, BN layers are used to restrain internal covariate shift and smooth the loss surface [24]. BN is formulated as
$$f_{BN} = \gamma_i \frac{x_i - \mu(X)}{\sqrt{\sigma(X)^2 + \varepsilon}} + \beta_i \tag{7}$$
where $x_i$ represents the $i$th sample and $X$ represents all input samples. $\gamma_i$ and $\beta_i$ are learnable parameters initialized to 1 and 0, respectively, and $\varepsilon = 1 \times 10^{-5}$ is a hyperparameter. The mini-batch mean $\mu(X)$ and standard deviation $\sigma(X)$ in (7) are
$$\mu(X) = \frac{1}{n_B} \sum_{i=1}^{n_B} x_i \tag{8}$$
$$\sigma(X) = \sqrt{\frac{1}{n_B} \sum_{i=1}^{n_B} \left(x_i - \mu(X)\right)^2} \tag{9}$$
where $n_B$ is the mini-batch size, set to 50 in this article.
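Equations (7)–(9) can be checked numerically with the following sketch, which normalizes one feature across a mini-batch in plain Python. It uses the biased (divide-by-$n_B$) estimate of (9) and treats $\gamma$ and $\beta$ as scalars at their initial values; it omits the running statistics that framework layers such as PyTorch's `nn.BatchNorm2d` keep for inference.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch of n_B scalar values, per Eqs. (7)-(9)."""
    n_b = len(batch)
    mu = sum(batch) / n_b                                          # Eq. (8)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in batch) / n_b)     # Eq. (9)
    return [gamma * (x - mu) / math.sqrt(sigma ** 2 + eps) + beta  # Eq. (7)
            for x in batch]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in out])
```

The output has zero mean and (up to the small effect of $\varepsilon$) unit variance, which is the property BN relies on.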
The feature vector of the sample $x_i$ is obtained through the residual network as
$$v_{f_i} = f_{resnet}(x_i) \tag{10}$$
where $v_{f_i} \in \mathbb{R}^m$ and $f_{resnet}$ is the residual network.
C. Instance-Level Relation Learning
First, we concatenate the feature vector of each sample and the one-hot encoding of its label into a merged vector (i.e., the instance feature). For an unlabeled sample, all elements of its one-hot encoding are zero. This merged vector serves as the initial instance feature of a sample:
$$v^{ins}_{0,i} = \left(v_{f_i} \,\Vert\, \mathrm{onehot}(y_i)\right) \tag{11}$$
where $v^{ins}_{0,i} \in \mathbb{R}^m$, $\Vert$ is the concatenation operator, and $\mathrm{onehot}(y_i)$ is the one-hot encoding of the label $y_i$ of a sample.
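A small sketch of the initial instance feature in (11): the feature vector followed by a one-hot label block, all zeros for unlabeled samples. Using integer class indices and `None` for unlabeled samples is an assumption; note that the concatenated vector has $m + N$ entries in this sketch.

```python
def initial_instance_feature(feature, label, n_way):
    """Eq. (11): concatenate a feature vector with the one-hot encoding of its label.

    label: class index in [0, n_way), or None for an unlabeled sample,
    whose one-hot block is all zeros.
    """
    one_hot = [0.0] * n_way
    if label is not None:
        one_hot[label] = 1.0
    return list(feature) + one_hot


print(initial_instance_feature([0.5, -1.0], 2, 5))     # labeled, class 2
print(initial_instance_feature([0.5, -1.0], None, 5))  # unlabeled
```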
The instance similarity between any two samples (i.e., the instance-level relation) is calculated from their instance features as
$$e^{ins}_{0,ij} = f_{G^{ins}_0}\left(\left(v^{ins}_{0,i} - v^{ins}_{0,j}\right)^2\right) \tag{12}$$
where the square is taken element-wise, $e^{ins}_{0,ij} \in \mathbb{R}$, and $f_{G^{ins}}: \mathbb{R}^m \to \mathbb{R}$ is an encoding network that converts the instance similarity into a scalar. As shown in Fig. 3(a), the encoding network contains two Conv layers, two BN layers, two LReLU activation layers, and a sigmoid layer.
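The sketch below illustrates (12) and, through an optional previous edge weight, the multiplicative update used in (14). It replaces the article's Conv/BN/LReLU encoding network with a single hypothetical linear layer followed by a sigmoid, so `weights` and `bias` are hand-set placeholders rather than trained parameters. Negative weights make the score shrink as the element-wise squared difference grows.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def edge_similarity(v_i, v_j, weights, bias, prev_edge=1.0):
    """Score two node features with a stand-in encoder f(.) = sigmoid(w . d + b).

    d is the element-wise squared difference (v_i - v_j)^2, as in Eq. (12);
    passing prev_edge = e_{l-1,ij} gives the multiplicative form of Eq. (14).
    """
    sq_diff = [(a - b) ** 2 for a, b in zip(v_i, v_j)]
    score = sigmoid(sum(w * d for w, d in zip(weights, sq_diff)) + bias)
    return score * prev_edge


w, b = [-1.0, -1.0], 2.0
near = edge_similarity([1.0, 1.0], [1.0, 1.2], w, b)
far = edge_similarity([1.0, 1.0], [3.0, 0.0], w, b)
print(near > far)  # True
```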
Fig. 3. Network details of the instance graph and distribution graph. (a) Encoding network. (b) $\mathrm{MLP}_{d2i}$. (c) $\mathrm{MLP}_{i2d}$.
Then, we construct a fully connected instance graph in which every node connects to all other nodes. Each sample is regarded as a node with an instance feature, and the instance similarity between two samples is regarded as the weight of the edge between them. This instance graph is defined as $G^{ins}_l = (V^{ins}_l, E^{ins}_l)$, where $V^{ins}_l = \{v^{ins}_{l,i}\}$ and $E^{ins}_l = \{e^{ins}_{l,ij}\}$. Here, $v^{ins}_{l,i} \in \mathbb{R}^m$ is the instance feature of node $i$ at the $l$th generation, and $e^{ins}_{l,ij} \in \mathbb{R}$ is the instance similarity of edge $(i, j)$ at the $l$th generation. The node and edge features are updated, respectively, by the following formulas:
$$v^{ins}_{l,i} = f_{MLP_{d2i}}\left(v^{ins}_{l-1,i}, \sum_{j=1}^{T} e^{dis}_{l,ij} \cdot v^{ins}_{l-1,j}\right) \tag{13}$$
$$e^{ins}_{l,ij} = f_{G^{ins}_l}\left(\left(v^{ins}_{l-1,i} - v^{ins}_{l-1,j}\right)^2\right) \cdot e^{ins}_{l-1,ij} \tag{14}$$
where $e^{dis}_{l,ij}$ is the distribution similarity of samples $i$ and $j$ in the distribution graph (see the next section), and $f_{MLP_{d2i}}: (\mathbb{R}^m, \mathbb{R}^m) \to \mathbb{R}^m$ is a multilayer perceptron (MLP) that maps the distribution similarity and the instance feature to a new instance feature of node $i$. As shown in Fig. 3(b), $\mathrm{MLP}_{d2i}$ contains two Conv layers, two BN layers, and two LReLU activation layers. The instance similarity in the final instance graph is used for $N$-way $K$-shot fault diagnosis, and the probability that $x_i$ belongs to class $c$ is
$$P_\phi(y_i = c \mid x_i \in Q) = \zeta_s\left(\sum_{j=1}^{NK} e^{ins}_{l_f,ij} \cdot \mathrm{onehot}(y_j)\right) \tag{15}$$
where $e^{ins}_{l_f,ij}$ is the instance similarity in the instance graph at the final generation, and $\zeta_s$ is the softmax function.
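Equation (15), together with the argmax in the prediction rule (20), amounts to summing each query's final-generation edge weights per support class and applying a softmax. A minimal sketch, assuming integer class labels for the $NK$ support samples and plain Python lists for the edge weights:

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_probabilities(edge_weights, support_labels, n_way):
    """Eq. (15) for one query i: softmax over classes of sum_j e_ij * onehot(y_j).

    edge_weights[j] is the final-generation instance similarity e_{l_f,ij}
    between the query and support sample j; support_labels[j] is y_j.
    """
    scores = [0.0] * n_way
    for e_ij, y_j in zip(edge_weights, support_labels):
        scores[y_j] += e_ij  # e_ij * onehot(y_j) only adds to class y_j
    return softmax(scores)


probs = class_probabilities([0.9, 0.1, 0.2], [0, 1, 2], n_way=3)
prediction = max(range(len(probs)), key=probs.__getitem__)  # argmax over classes
print(prediction)  # 0
```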
D. Distribution-Level Relation Learning
For $l = 0$, the distribution feature and the distribution similarity (i.e., the distribution-level relation) are initialized as
$$v^{dis}_{0,i} = \begin{cases} \big\Vert_{j=1}^{NK} \delta(y_i, y_j), & \text{if labeled} \\ \left[\frac{1}{NK}, \ldots, \frac{1}{NK}\right], & \text{if unlabeled} \end{cases} \tag{16}$$
$$e^{dis}_{0,ij} = f_{G^{dis}_0}\left(\left(v^{dis}_{0,i} - v^{dis}_{0,j}\right)^2\right) \tag{17}$$
where $v^{dis}_{0,i} \in \mathbb{R}^{NK}$, $e^{dis}_{0,ij} \in \mathbb{R}$, and $f_{G^{dis}}: \mathbb{R}^{NK} \to \mathbb{R}$ is an encoding network that converts the distribution similarity into a scalar. $\delta$ is the Kronecker delta, which is 1 if its two arguments are equal and 0 otherwise. The encoding network is shown in Fig. 3(a).
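Equation (16) can be sketched as follows: a labeled sample's initial distribution feature marks which of the $NK$ support samples share its label, while an unlabeled sample starts from a uniform vector. Integer labels and `None` for unlabeled samples are assumptions of this sketch.

```python
def initial_distribution_feature(label, support_labels):
    """Eq. (16): Kronecker delta against each of the N*K support labels if labeled,
    or a uniform vector with entries 1/(N*K) if unlabeled (label is None)."""
    nk = len(support_labels)
    if label is None:
        return [1.0 / nk] * nk
    return [1.0 if label == y_j else 0.0 for y_j in support_labels]


support = [0, 0, 1, 1]  # a 2-way 2-shot support set
print(initial_distribution_feature(1, support))     # [0.0, 0.0, 1.0, 1.0]
print(initial_distribution_feature(None, support))  # [0.25, 0.25, 0.25, 0.25]
```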
Then, we build a fully connected distribution graph $G^{dis}_l = (V^{dis}_l, E^{dis}_l)$ to learn the distribution features and distribution-level relations, where $V^{dis}_l = \{v^{dis}_{l,i}\}$ and $E^{dis}_l = \{e^{dis}_{l,ij}\}$. Here, $v^{dis}_{l,i} \in \mathbb{R}^{NK}$ is the distribution feature of node $i$ at the $l$th generation, and $e^{dis}_{l,ij} \in \mathbb{R}$ is the distribution similarity of edge $(i, j)$ at the $l$th generation. The representation of node $i$ in the distribution graph is obtained as
$$v^{dis}_{l,i} = f_{MLP_{i2d}}\left(v^{dis}_{l-1,i}, \big\Vert_{j=1}^{NK} e^{ins}_{l,ij}\right) \tag{18}$$
where $f_{MLP_{i2d}}: (\mathbb{R}^{NK}, \mathbb{R}^{NK}) \to \mathbb{R}^{NK}$ is an MLP that maps the instance similarity and the distribution feature to a new distribution feature of node $i$. $\mathrm{MLP}_{i2d}$ is shown in Fig. 3(c). The weight of edge $(i, j)$ in the distribution graph at the $l$th generation is calculated as
$$e^{dis}_{l,ij} = f_{G^{dis}_l}\left(\left(v^{dis}_{l,i} - v^{dis}_{l,j}\right)^2\right) \cdot e^{dis}_{l-1,ij} \tag{19}$$
For $l \geq 1$, DGNNet learns the instance and distribution features with an alternate update strategy, as shown in Fig. 2. In particular, a complete update of the $l$th generation is
$$E^{ins}_l \xrightarrow{\mathrm{MLP}_{i2d}} V^{dis}_l \xrightarrow{G^{dis}_l} E^{dis}_l \xrightarrow{\mathrm{MLP}_{d2i}} V^{ins}_l \xrightarrow{G^{ins}_l} E^{ins}_{l+1}.$$
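The alternate update order can be sketched as a loop. This is an illustrative skeleton, not the authors' implementation: the learned networks are passed in as caller-supplied callables (hypothetical stand-ins for $f_{G^{ins}_l}$, $f_{MLP_{i2d}}$, $f_{G^{dis}_l}$, and $f_{MLP_{d2i}}$), the first `n_support` nodes are assumed to be the $NK$ support samples, and the sum in (13) is taken over all nodes of the subtask.

```python
def alternate_updates(v_ins, v_dis, e_ins, e_dis, n_support, generations,
                      enc_ins, mlp_i2d, enc_dis, mlp_d2i):
    """Run the E^ins -> V^dis -> E^dis -> V^ins update chain for several generations.

    v_ins, v_dis: lists of node feature vectors; e_ins, e_dis: n x n edge matrices
    holding generation-0 values. Returns final features and per-generation edges.
    """
    n = len(v_ins)
    history = []
    for _ in range(generations):
        # Eq. (14): instance edges from the previous instance features.
        e_ins = [[enc_ins(v_ins[i], v_ins[j]) * e_ins[i][j] for j in range(n)]
                 for i in range(n)]
        # Eq. (18): distribution features from similarities to the N*K support nodes.
        v_dis = [mlp_i2d(v_dis[i], e_ins[i][:n_support]) for i in range(n)]
        # Eq. (19): distribution edges from the new distribution features.
        e_dis = [[enc_dis(v_dis[i], v_dis[j]) * e_dis[i][j] for j in range(n)]
                 for i in range(n)]
        # Eq. (13): aggregate previous instance features weighted by e_dis.
        dim = len(v_ins[0])
        aggregated = [[sum(e_dis[i][j] * v_ins[j][k] for j in range(n))
                       for k in range(dim)] for i in range(n)]
        v_ins = [mlp_d2i(v_ins[i], aggregated[i]) for i in range(n)]
        history.append((e_ins, e_dis))
    return v_ins, v_dis, history


# Trivial stand-ins: constant encoders and pass-through MLPs, two nodes, one support node.
ones = [[1.0, 1.0], [1.0, 1.0]]
v_ins, v_dis, hist = alternate_updates(
    [[1.0], [3.0]], [[1.0], [0.5]], ones, ones, n_support=1, generations=1,
    enc_ins=lambda a, b: 1.0, mlp_i2d=lambda old, sims: list(sims),
    enc_dis=lambda a, b: 1.0, mlp_d2i=lambda old, agg: agg)
print(v_ins)  # [[4.0], [4.0]]
```

Keeping the edge matrices of every generation in `history` mirrors the fact that the loss in (23) is accumulated over all generations.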
E. Synergistic Optimization
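The class prediction for a query sample $x_i \in Q$ in $N$-way $K$-shot fault classification is reformulated as
$$\hat{y}_i = \arg\max_c \left\{ \zeta_s\left(\sum_{j=1}^{NK} e^{ins}_{l_f,ij} \cdot \mathrm{onehot}(y_j)\right) \right\}_{c=1}^{N} \tag{20}$$
The loss of the GNN for instance-level relation learning at the $l$th generation is defined as
$$\mathcal{L}^{ins}_l = \mathcal{L}_{ce}\left(\zeta_s\left(\sum_{j=1}^{NK} e^{ins}_{l,ij} \cdot \mathrm{onehot}(y_j)\right), y_i\right) \tag{21}$$
where $\mathcal{L}_{ce}$ is the cross-entropy loss function and $\zeta_s$ is the softmax activation function. The loss for distribution-level relation learning at the $l$th generation is defined as
$$\mathcal{L}^{dis}_l = \mathcal{L}_{ce}\left(\zeta_s\left(\sum_{j=1}^{NK} e^{dis}_{l,ij} \cdot \mathrm{onehot}(y_j)\right), y_i\right) \tag{22}$$
The loss function of DGNNet then combines the instance-level and distribution-level losses:
$$\mathcal{L} = \sum_{l=1}^{l_f} \left(\lambda_{ins} \mathcal{L}^{ins}_l + \xi_{dis} \mathcal{L}^{dis}_l\right) \tag{23}$$
where $l_f$ denotes the number of generations, and $\lambda_{ins}$ and $\xi_{dis}$ are weight parameters set to 1.0 and 0.1, respectively, in this article.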
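Equation (23) is a weighted sum of per-generation cross-entropy terms. The sketch below evaluates it for one query sample, assuming the softmax outputs of (21) and (22) are already available for each generation; the default weights follow the values stated for this article ($\lambda_{ins} = 1.0$, $\xi_{dis} = 0.1$).

```python
import math


def cross_entropy(probs, target):
    """L_ce for one sample: negative log-probability of the true class."""
    return -math.log(probs[target])


def dgnnet_loss(ins_probs, dis_probs, target, lambda_ins=1.0, xi_dis=0.1):
    """Eq. (23): sum over generations of lambda_ins * L^ins_l + xi_dis * L^dis_l.

    ins_probs[l] and dis_probs[l] are the class-probability vectors of
    Eqs. (21) and (22) at one generation, for a query with class `target`.
    """
    return sum(lambda_ins * cross_entropy(p_ins, target)
               + xi_dis * cross_entropy(p_dis, target)
               for p_ins, p_dis in zip(ins_probs, dis_probs))


# Two generations of a 2-way task whose true class is 0.
loss = dgnnet_loss([[0.5, 0.5], [0.8, 0.2]], [[0.5, 0.5], [0.6, 0.4]], target=0)
print(round(loss, 4))
```

The small weight on the distribution-level term lets it act as an auxiliary signal while the instance-level term, which drives the final prediction, dominates.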
We use Ranger21 [25], a recently proposed optimizer, to opti-
mize the proposed DGNNet. Ranger21 integrates AdamW [26],
Lookahead [27], and gradient centralization [28]. It can reduce
the variance of the training loss and improve the convergence
performance.
Fig. 4. Results of supervised 5-way case on the CWRU dataset.
WDCNNindicates that WDCNN is trained in FSL.
IV. CASE STUDIES
To verify the capability of DGNNet for FSL-based fault diagnosis, we conduct case studies on several real bearing datasets. In this section, DGNNet is compared with four related methods: wide deep CNN (WDCNN) [29], FSL-based fault diagnosis (FLFD) [16], GNN [18], and the self-supervised joint learning (SSJL) method [30]. GNN serves as a baseline whose components are identical to those of DGNNet except that it lacks the distribution graph. We implement and evaluate these methods on two benchmark datasets from Case Western Reserve University (CWRU) and Machinery Failure Prevention Technology (MFPT). Furthermore, we carry out a real-world industrial case study to verify the generalization performance of DGNNet. All models are trained on a 3090 Ti GPU, and all case studies report the average result of 20 experiments.
A. Same-Load Fault Diagnosis
1) Description of CWRU Dataset: The CWRU dataset has been widely used to verify intelligent fault diagnosis methods. The vibration data are collected by the accelerometer at the drive end with a 12 kHz sampling frequency and cover four bearing health conditions: normal state (NS), inner race failure (IF), outer race failure (OF), and ball element failure (BF). Faults of each type were seeded on the drive-end bearings with fault diameters of 0.18, 0.36, 0.53, and 0.71 mm. Thus, the bearing vibration data without motor loads can be divided into 12 fault categories. The resulting meta dataset contains 3840 samples, with 100 training samples, 200 testing samples, and 20 validation samples per class.
2) Supervised Fault Diagnosis: We verify the performance of DGNNet through comparison experiments in which 6, 10, 20, 30, 60, and 100 samples per class are randomly selected. Running the 5-way 1-shot experiment for 20 epochs took almost 6 h, and running the 5-way 5-shot experiment took 12 h.
Fig. 4 shows the accuracy of all methods in supervised fault diagnosis tasks. DGNNet maintains high performance even with six samples per class and outperforms the other methods in both 1- and 5-shot experiments. As Table I shows, when six samples are provided for each class, the fully supervised case yields a 1-shot accuracy of 99.55% and a 5-shot accuracy of 99.81%. In the same 5-way 1-shot scenario, this accuracy is almost 11% higher than that of GNN and 20% higher than those of FLFD and WDCNN, suggesting that the CNN-based methods learn only part of the fault features. Notably, GNN shows greater identification ability than WDCNN and FLFD, which indicates that GNN outperforms the CNN-based methods by learning pairwise relations between any two samples. DGNNet identifies all faults precisely with 100.00% accuracy in the 5-shot case, whereas GNN and FLFD do not. The 5-shot training paradigm generally performs better than the 1-shot paradigm, possibly because more support samples help the model find the global convergence direction.
TABLE I
SUPERVISED FAULT DIAGNOSIS RESULTS ON CWRU DATASET
TABLE II
SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON CWRU DATASET
Fig. 5. t-SNE visualization for semisupervised 5-shot classification. (a) WDCNN. (b) SSJL. (c) GNN. (d) DGNNet.
3) Semisupervised Fault Diagnosis: To further evaluate the effectiveness of DGNNet for semisupervised fault diagnosis, we train DGNNet on support sets with different numbers of labeled and unlabeled samples. Training DGNNet for 25 epochs took 7 h for the 5-way 1-shot classification task and 13 h for the 5-way 5-shot task.
The results of the semisupervised experiments are shown in Table II. All semisupervised methods are trained for 25 epochs. The number of unlabeled samples equals the number of labeled samples in each support set, i.e., $n_{\bar{u}} \in \{1, 5\}$ for each class; the experiments that follow support this choice. DGNNet outperforms the baselines in both 1- and 5-shot cases. Fig. 5 shows the 2-D t-SNE visualization of the extracted high-dimensional features for each method. Remarkably, DGNNet clusters samples of the same class together and keeps samples of different classes as separate as possible.
Fig. 6. Effect of different numbers of unlabeled samples on 1-shot (left) and 5-shot (right) classification.
Comparing the results of the supervised (see Table I) and semisupervised (see Table II) experiments shows that DGNNet classifies better when it uses unlabeled data. Although this improvement is small, it is worth noting that models are hard to train to convergence when very few labeled samples are given in the supervised scenario. DGNNet still achieves high classification accuracy when combining only three labeled samples per class with unlabeled samples.
As the comparison experiments demonstrate, using unlabeled samples in training improves fault diagnosis performance. As shown in Fig. 6, the accuracy of DGNNet improves when a proper number of unlabeled samples is given. However, the performance of DGNNet drops dramatically when the number of unlabeled samples exceeds that of labeled samples. This suggests that the best classification occurs when the number of unlabeled samples approximates that of labeled samples in each subtask, i.e., $n_{\bar{u}} = K$. In this article, we use the consistent setting $n_{\bar{u}} \in \{1, 5\}$ for all FSL experiments.
B. Multiload Fault Diagnosis
1) Description of MFPT Dataset: The CWRU vibration signals are all collected at the same motor speed, 1797 rpm, so classification models can perform well on them in FSL-based fault diagnosis. To evaluate the validity of DGNNet in multiload scenarios, we conduct fault classification experiments on the MFPT dataset, which contains three sets of multiload bearing signals. The normal (N) baseline set is recorded at a load of 270 lbs with a sampling rate of 97 656 samples per second (SPS) for 6 s. The inner race (IR) and outer race (OR) fault signals are obtained from the bearing test rig at a sampling rate of 48 828 SPS for 3 s under six load conditions: 50, 100, 150, 200, 250, and 300 lbs. We construct the meta dataset for MFPT using the signal preprocessing method described in Section II-A; each load condition contains 100 training samples, 200 testing samples, and 20 validation samples. Running the 3-way 1-shot experiment for 20 epochs took 4 h, and running the 3-way 5-shot experiment took 8 h.
2) Supervised Fault Diagnosis: In the MFPT dataset, each fault class comprises the same number of samples under each of the six loads. That is, {1, 2, 3} samples are selected from each load condition, so each class contains {6, 12, 18} samples. We conduct experiments with these different numbers of training samples, keeping the number of samples from each load condition exactly the same.
TABLE III
SUPERVISED FAULT DIAGNOSIS ON MFPT DATASET
Fig. 7. t-SNE visualization for supervised 5-shot classification. (a) WDCNN. (b) GNN. (c) DGNNet.
TABLE IV
SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON MFPT DATASET
As shown in Table III, when there are 12 or 18 training samples in each class, DGNNet identifies all faults accurately. The accuracy of DGNNet reaches 100%, which is 13.38% higher than that of GNN and 25.64% higher than that of WDCNN. To compare how the methods handle multiload samples, the extracted fault features are visualized by t-SNE in Fig. 7. WDCNN can identify the vibration signals in the normal state but can hardly separate the fault samples under multiple loads. DGNNet learns better relations from limited samples and accurately identifies all bearing faults under multiload conditions: it keeps fault features as close as possible within each class and as separate as possible between classes.
3) Semisupervised Fault Diagnosis: To further evaluate the semisupervised capability of DGNNet, we carry out a series of multiload experiments for 20 epochs. The number of unlabeled samples per class equals that of labeled samples, i.e., $K$. As shown in Table IV, DGNNet outperforms the baseline methods under the same experimental setting. DGNNet classifies all faults accurately even with one labeled sample per load condition, that is, with six labeled samples and one unlabeled sample for each class in the 1-shot experiment. Note that GNN improves its accuracy by almost 5% when given an appropriate number of unlabeled samples, whereas DGNNet, which propagates labels through the distribution graph, makes better use of unlabeled data than GNN. Thus, DGNNet appears to have greater potential for tackling semisupervised FSL problems in multiload fault diagnosis.
Fig. 8. Illustration of the OCB dataset.
TABLE V
DESCRIPTION OF OCB DATASET
TABLE VI
SUPERVISED FAULT DIAGNOSIS RESULTS ON OCB DATASET
C. Industrial Scenario Fault Diagnosis
1) Description of OCB Dataset: Most existing methods perform well on vibration data collected from test rigs but are rarely applied to real-world data. In this section, the comparison methods are evaluated on a real-world oxygen compressor bearing (OCB) dataset from a smart factory. As shown in Fig. 8, the bearing data are provided by the largest copper smelter in the world, measured by an accelerometer on the oxygen compressor, and cover three bearing health conditions: normal condition (NC), inner race fault (IRF), and outer race fault (ORF). The accelerometer records the bearing signals every five seconds and stores them in a database, from which we took six days of historical data to construct the dataset. The description of the OCB dataset is shown in Table V.
2) Supervised Fault Diagnosis: To analyze the performance of all methods in the real-world scenario, we carry out a 3-way fault diagnosis experiment on OCB. Training DGNNet on OCB for 20 epochs took almost 4 hours in the 1-shot experiment and 8 hours in the 5-shot experiment. Table VI shows the performance of all methods in the supervised scenario. DGNNet achieves desirable fault diagnosis performance in the real-world scenario. With only six samples available, DGNNet obtains an accuracy of 99.48% in the 1-shot experiment, which is 2.89% higher than that of GNN and 35.56% higher than that of WDCNN. DGNNet reaches an accuracy of 100.00% with only six samples per class. In the 1-shot experiment, DGNNet effectively distinguishes the different faults, whereas GNN and WDCNN identify normal samples well but easily confuse the faults.
TABLE VII
SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON OCB DATASET
TABLE VIII
DESCRIPTION FOR THREE CASES IN ABLATION STUDIES
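Overall accuracy can hide the failure mode described above, in which a model recognizes normal samples but confuses the fault classes; per-class recall exposes it directly. The helper below is a hypothetical sketch, and the labels in the usage lines are made up for illustration; they are not results from Table VI.

```python
from collections import Counter


def per_class_recall(y_true, y_pred, classes):
    """Hypothetical helper: fraction of each class's samples predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in classes if totals[c]}


# Illustrative labels only: normal samples are recognized, faults are swapped.
y_true = ["NC", "NC", "IRF", "IRF", "ORF", "ORF"]
y_pred = ["NC", "NC", "ORF", "IRF", "IRF", "ORF"]
print(per_class_recall(y_true, y_pred, ["NC", "IRF", "ORF"]))
# -> {'NC': 1.0, 'IRF': 0.5, 'ORF': 0.5}
```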
3) Semisupervised Fault Diagnosis: We further verify the semisupervised performance of DGNNet in the real-world scenario. The semisupervised experiments are carried out with 20 epochs. DGNNet further improves its diagnostic ability when a few unlabeled samples are provided. As shown in Table VII, the baseline methods obtain excellent diagnostic results only when 20 samples are used for training. In the 1-shot experiments, DGNNet obtains an accuracy of 100.00%, much higher than that of the baseline methods. DGNNet shows superior diagnostic performance in all semisupervised experiments. This suggests that DGNNet has great potential in industrial fault diagnosis applications, where unlabeled data from rotating machinery are easier to collect than labeled data.
V. ABLATION STUDIES AND DISCUSSIONS
In this section, we use three ablation cases to explore the effect of the size of N and of the generation number of DGNNet on the diagnostic results. The three cases are: (a) the same-load case on CWRU; (b) the multiload case on MFPT; and (c) the industrial case on OCB. Their detailed description is presented in Table VIII. Finally, the convergence of DGNNet is analyzed on CWRU; the other cases behave similarly.
A. N-Way Fault Diagnosis
The effect of the number of ways is further explored for N-way fault diagnosis in various scenarios. As shown in Fig. 9, DGNNet obtains an accuracy of 99.83% in 2-way 1-shot fault diagnosis on the CWRU dataset, which is 0.33% higher than in 5-way classification. In the 2-way industrial scenario (case c), DGNNet achieves even better diagnostic performance. Note that the experiments in Section IV used more difficult settings, which allow a better evaluation of the semisupervised learning effectiveness of DGNNet.
Fig. 9. Influence of N in N-way 1-shot classification (25 epochs).
Fig. 10. Generation numbers in DGNNet on three cases (25 epochs).
Fig. 11. Evolution of the test accuracy under different optimizers.
B. Generation Numbers
For datasets of various scales and complexities, the number of alternate updating generations affects both the test accuracy and the convergence speed of DGNNet. As shown in Fig. 10, the test accuracy improves greatly as the generation number increases from 0 to 2, and it fluctuates within a small range for generation numbers between 2 and 8. A larger generation number leads to a longer convergence time. In this article, we set the generation number to 4 as a tradeoff between test accuracy and convergence speed.
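To make the role of the generation number concrete, the sketch below alternates between an instance graph and a distribution graph for a fixed number of generations on precomputed embeddings. It is a parameter-free simplification under stated assumptions, not DGNNet itself: the learnable update layers are replaced by fixed 0.5/0.5 averaging, edges are Gaussian similarities with hand-picked temperatures `temp_p` and `temp_d`, and query labels are read from instance-graph edges to the labeled nodes. All names are hypothetical; `generations` plays the role of the generation number (set to 4 in this article).

```python
import numpy as np


def edge_weights(nodes: np.ndarray, temp: float = 1.0) -> np.ndarray:
    """Row-normalized Gaussian similarities on a complete graph over `nodes`."""
    sq = np.sum((nodes[:, None, :] - nodes[None, :, :]) ** 2, axis=-1)
    w = np.exp(-sq / temp)
    return w / w.sum(axis=1, keepdims=True)  # self-weight exp(0)=1 keeps rows > 0


def dual_graph_predict(x, support_labels, n_way, generations=4,
                       temp_p=1.0, temp_d=0.05, eps=1e-12):
    """Simplified alternate update (hypothetical, not the article's layers).

    x: (n, d) embeddings; the first len(support_labels) rows are labeled.
    Returns the predicted class of every remaining (query) row.
    """
    s = len(support_labels)
    onehot = np.eye(n_way)[support_labels]          # (s, n_way)
    v_p = np.asarray(x, dtype=float).copy()         # instance-graph node features
    # Distribution nodes: each row is a distribution over the s labeled samples.
    v_d = np.full((len(v_p), s), 1.0 / s)           # queries start uniform
    same = onehot @ onehot.T                        # 1 where support labels match
    v_d[:s] = same / same.sum(axis=1, keepdims=True)
    for _ in range(generations):
        e_p = edge_weights(v_p, temp_p)             # instance-level relations
        to_support = e_p[:, :s] / (e_p[:, :s].sum(axis=1, keepdims=True) + eps)
        v_d = 0.5 * v_d + 0.5 * to_support          # instance -> distribution
        e_d = edge_weights(v_d, temp_d)             # distribution-level relations
        v_p = 0.5 * v_p + 0.5 * e_d @ v_p           # distribution -> instance
    e_p = edge_weights(v_p, temp_p)
    scores = e_p[s:, :s] @ onehot                   # label mass from labeled nodes
    return scores.argmax(axis=1)
```

With `generations=0` the prediction reduces to a similarity-weighted vote over the labeled samples; each extra generation passes label information through the distribution graph once more, which is also why the cost grows with the generation number.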
C. Ranger21 Optimizer
A recently proposed optimizer, Ranger21, is utilized in our study. Fig. 11 shows the evolution of the test accuracy under different optimizers over 60 epochs. The learning rate of Adam and AdamW is set to 3e-3. Ranger21, in contrast, first undergoes a linear warm-up during the first two epochs, after which the learning rate is set to 3e-3. The linear warm-down starts at epoch 40, and the learning rate then decays to 3e-5. Ranger21 and AdamW perform better than Adam at the early stage, and after 20 epochs Ranger21 consistently outperforms AdamW. Notably, Ranger21 accelerates model learning and obtains a high accuracy without compromising generalization.
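The Ranger21 learning-rate curve described above can be written as a function of the (possibly fractional) epoch. This sketches the described curve only, not the Ranger21 implementation, which schedules per iteration and combines several other techniques. The warm-up starting value of 0 and the end of the warm-down at epoch 60 are assumptions; the function name `lr_at` and its parameters are hypothetical.

```python
def lr_at(t, total=60.0, warmup=2.0, warmdown_start=40.0,
          peak=3e-3, floor=3e-5):
    """Hypothetical schedule: linear warm-up, flat phase, linear warm-down.

    t is the (possibly fractional) epoch; defaults follow the values in the text.
    """
    if t < warmup:
        return peak * t / warmup                     # linear warm-up from 0 (assumed)
    if t < warmdown_start:
        return peak                                  # flat phase at the peak rate
    frac = min((t - warmdown_start) / (total - warmdown_start), 1.0)
    return peak + (floor - peak) * frac              # linear decay to the floor
```

To drive a PyTorch optimizer with this curve, one could step `torch.optim.lr_scheduler.LambdaLR` once per iteration with `lambda step: lr_at(step / steps_per_epoch) / 3e-3`, where `steps_per_epoch` is a hypothetical value for the data loader; LambdaLR multiplies the base learning rate by the returned factor.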
VI. CONCLUSION AND FUTURE WORK
In this article, we proposed an FSL method for fault diagnosis, namely DGNNet. DGNNet uses an instance graph and a distribution graph to learn the pairwise relations between samples and the high-order relations among all samples, respectively. The learned multilevel relations help DGNNet classify unseen samples. In particular, the distribution graph propagates label information from a few labeled samples to unlabeled samples, enabling DGNNet to address semisupervised problems. Extensive experiments were conducted to evaluate the performance of DGNNet in supervised and semisupervised fault diagnosis. The results show that DGNNet achieves excellent fault classification effectiveness and good generalization in various scenarios. In the future, we will apply DGNNet to other data scarcity problems and to labeling new samples in real-world applications. In addition, we will investigate the diagnosis of unseen fault types using FSL.
REFERENCES
[1] H. Shao, M. Xia, G. Han, Y. Zhang, and J. Wan, “Intelligent fault diagnosis
of rotor-bearing system under varying working conditions with modified
transfer convolutional neural network and thermal images, IEEE Trans.
Ind. Inform., vol. 17, no. 5, pp. 3488–3496, May 2021.
[2] G. Xu, M. Liu, Z. Jiang, W. Shen, and C. Huang, “Online fault diagnosis
method based on transfer convolutional neural networks, IEEE Trans.
Instrum. Meas., vol. 69, no. 2, pp. 509–520, Feb. 2020.
[3] G. Liu, W. Shen, L. Gao, and A. Kusiak, “Predictive modeling with an
adaptive unsupervised broad transfer algorithm, IEEE Trans. Instrum.
Meas., vol. 70, 2021, Art. no. 3520212.
[4] S. Xing, Y. Lei, S. Wang, and F. Jia, “Distribution-invariant deep belief
network for intelligent fault diagnosis of machines under new working
conditions,” IEEE Trans. Ind. Electron., vol. 68, no. 3, pp. 2617–2625,
Mar. 2021.
[5] C. Sun, M. Ma, Z. Zhao, S. Tian, R. Yan, and X. Chen, “Deep transfer
learning based on sparse autoencoder for remaining useful life prediction
of tool in manufacturing,” IEEE Trans. Ind. Inform., vol. 15, no. 4,
pp. 2416–2425, Apr. 2019.
[6] W. Song, W. Shen, L. Gao, and X. Li, “An early fault detection method
of rotating machines based on unsupervised sequence segmentation con-
volutional neural network, IEEE Trans. Instrum. Meas., vol. 71, 2022,
Art. no. 3504712.
[7] G. W. Xu, M. Liu, Z. F. Jiang, D. Soffker, and W. M. Shen, “Bearing fault
diagnosis method based on deep convolutional neural network and random
forest ensemble learning,”Sensors, vol. 19, no. 5, Mar. 2019, Art. no. 1088.
[8] C.Zhao, G. K. Liu, and W. M. Shen, “A dual-view alignment-based domain
adaptation network for fault diagnosis,”Meas. Sci. Technol., vol.32, no. 11,
Nov. 2021, Art. no. 115102.
[9] D. Han, X. Guo, and P. Shi, “An intelligent fault diagnosis method of
variable condition gearbox based on improved DBN combined with WPEE
and MPE,” IEEE Access, vol. 8, pp. 131299–131309, 2020.
[10] Z. Ren, W. Zhang, and Z. Zhang, “A deep nonnegative matrix factorization
approach via autoencoder for nonlinear fault detection,” IEEE Trans. Ind.
Inform., vol. 16, no. 8, pp. 5042–5052, Aug. 2020.
[11] S. Kiranyaz, A. Gastli, L. Ben-Brahim, N. Al-Emadi, and M. Gabbouj,
“Real-time fault detection and identification for MMC using 1-D con-
volutional neural networks, IEEE Trans. Ind. Electron., vol. 66, no. 11,
pp. 8760–8771, Nov. 2019.
[12] C.Zhao, G. K. Liu, W. M. Shen, and L. Gao, “A multi-representation-based
domain adaptation network for fault diagnosis, Measurement, vol. 182,
Sep. 2021, Art. no. 109650.
[13] T. C. Zhang et al., “Intelligent fault diagnosis of machines with small &
imbalanced data: A state-of-the-art review and possible extensions, ISA
Trans, vol. 119, pp. 152–171, Jan. 2021.
[14] L. Feng and C. Zhao, “Fault description based attribute transfer for zero-
sample industrial fault diagnosis,” IEEE Trans. Ind. Inform., vol. 17, no. 3,
pp. 1852–1862, Mar. 2021.
[15] Y. Hu, R. Liu, X. Li, D. Chen, and Q. Hu, “Task-sequencing meta learning
for intelligent few-shot fault diagnosis with limited data, IEEE Trans. Ind.
Inform., vol. 18, no. 6, pp. 3894–3904, Jun. 2022.
[16] A. Zhang, S. Li, Y. Cui, W. Yang, R. Dong, and J. Hu, “Limited data
rolling bearing fault diagnosis with few-shot learning, IEEE Access,
vol. 7, pp. 110895–110904, 2019.
[17] Y. Feng, J. Chen, T. Zhang, S. He, E. Xu, and Z. Zhou, “Semi-supervised
meta-learning networks with squeeze-and-excitation attention for few-shot
fault diagnosis,” ISA Trans., vol. 120, pp. 383–401, Jan. 2022.
[18] V. G. Satorras and J. Bruna, “Few-shot learning with graph neural net-
works,” Proc. Int. Conf. Learn. Representations, 2018, pp. 1–13.
[19] Y. Xiao, Y. Jin, and K. Hao, “Adaptive prototypical networks with la-
bel words and joint representation learning for few-shot relation clas-
sification,” IEEE Trans. Neural Netw. Learn. Syst., pp. 1–12, 2021,
doi: 10.1109/TNNLS.2021.3105377.
[20] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Meta-learning
in neural networks: A survey, IEEE Trans. Pattern Anal. Mach. Intell.,
vol. 44, no. 9, pp. 5149–5169, Sep. 2022.
[21] R. Zhou, X. Chang, L. Shi, Y.-D. Shen, Y. Yang, and F. Nie, “Person
reidentification via multi-feature fusion with adaptive graph learning,
IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 5, pp. 1592–1601,
May 2020.
[22] M. Luo, X. Chang, L. Nie, Y. Yang, A. G. Hauptmann, and Q. Zheng, “An
adaptive semisupervised feature analysis for video semantic recognition,
IEEE Trans. Cybern., vol. 48, no. 2, pp. 648–660, Feb. 2018.
[23] Z. Li, F. Nie, X. Chang, Y. Yang, C. Zhang, and N. Sebe, “Dynamic
affinity graph construction for spectral clustering using multiple features,
IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 12, pp. 6323–6332,
Dec. 2018.
[24] S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry, “How does batch
normalization help optimization?,” Adv. Neural Inform. Process. Syst.,
vol. 31, pp. 2488–2498, 2018.
[25] L. Wright and N. Demeure, “Ranger21: A synergistic deep learning
optimizer, 2021, arXiv:2106.13731.
[26] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in
Proc. Int. Conf. Learn. Representations, 2019, pp. 1–18.
[27] M. R. Zhang, J. Lucas, G. Hinton, and J. Ba, “Lookahead optimizer: k
steps forward, 1 step back,” Adv. Neural Inform. Process. Syst., vol. 32,
pp. 9593–9604, 2019.
[28] W. Fuhl and E. Kasneci, “Weight and gradient centralization in deep neural
networks,” in Artificial Neural Networks and Machine Learning (Lecture
Notes in Computer Science), vol. 12894. Cham, Switzerland: Springer,
2021, pp. 227–239.
[29] W. Zhang, G. L. Peng, C. H. Li, Y. H. Chen, and Z. J. Zhang, “A
new deep learning model for fault diagnosis with good anti-noise and
domain adaptation ability on raw vibration signals,”Sensors, vol. 17, no. 2,
Feb. 2017, Art. no. 425.
[30] W. W. Zhang, D. J. Chen, and Y. Kong, “Self-supervised joint learning
fault diagnosis method based on three-channel vibration images,” Sensors,
vol. 21, no. 14, Jul. 2021, Art. no. 4774.
Han Wang received the B.E. degree in automation in 2019 from Tongji University, Shanghai, China, where he is currently working toward the Ph.D. degree in control science and engineering with the College of Electronics and Information Engineering.
His research interests include fault diagnosis, few-shot learning, and graph deep learning.
Jingwei Wang received the B.E. degree in control science and engineering from Shandong University, Jinan, China, in 2016. He is currently working toward the Ph.D. degree in control science and engineering with the College of Electronics and Information Engineering, Tongji University, Shanghai, China, in 2022.
His research interests include data mining, network science, and machine learning.
Yukai Zhao received the B.E. degree in automation from Nanjing Institute of Technology, Nanjing, China, in 2019. He is currently working toward the Ph.D. degree in control science and engineering with the College of Electronics and Information Engineering, Tongji University, Shanghai, China.
His research interests include action recognition, graph data mining, and graph deep learning.
Qing Liu received the B.E. degree in automation from Changshu Institute of Technology, Changshu, China, in 2014, and the M.E. degree in control science and engineering in 2017 from Tongji University, Shanghai, China, where he is currently working toward the Ph.D. degree in control science and engineering with the College of Electronics and Information Engineering.
His research interests include machine learning, deep learning, and their applications in intelligent manufacturing.
Min Liu received the B.E. degree in mechanical engineering from the China University of Geosciences, Wuhan, China, in 1993, and the M.E. degree in mechanics and the Ph.D. degree in mechanical engineering and automation from Zhejiang University, Hangzhou, China, in 1996 and 1999, respectively.
He is currently a Professor with the College of Electronics and Information Engineering, Tongji University, Shanghai, China. He has worked on computer science and system engineering, collaborative MRO, and intelligent manufacturing for about 14 years. He has authored or coauthored more than 100 papers in scientific journals and international conferences in related areas. His research interests include deep learning, fault diagnosis and prediction, and intelligent maintenance.
Weiming Shen (Fellow, IEEE) received the B.E. and M.S. degrees in mechanical engineering from Northern Jiaotong University, Beijing, China, in 1983 and 1986, respectively, and the Ph.D. degree in system control from the University of Technology of Compiegne, Compiegne, France, in 1996.
He is currently a Professor with the Huazhong University of Science and Technology (HUST), Wuhan, China, and an Adjunct Professor with the University of Western Ontario, London, ON, Canada. Before joining HUST in 2019, he was a Principal Research Officer at the National Research Council Canada. His work has been cited more than 16 000 times with an h-index of 61. He authored or coauthored several books and more than 560 articles in scientific journals and international conferences in related areas. His research interests include agent-based collaboration technologies and applications, collaborative intelligent manufacturing, the Internet of Things, and Big Data analytics.
... GNNs are specialized for graph-structured data, adept at representing intricate relationships among nodes and edges [41]. These networks utilize information propagation to derive node representations by incorporating both the node features and its surrounding context. ...
Article
Full-text available
Human-Cyber-Physical Systems (HCPS), as an emerging paradigm centered around humans, provide a promising direction for the advancement of various domains, such as intelligent manufacturing and aerospace. In contrast to Cyber-Physical Systems (CPS), the development of HCPS emphasizes the expansion of human capabilities. Humans no longer solely function as operators or agents working in collaboration with computers and machines but extend their roles to include system design and innovation management. This paper proposes a Multisensory Interaction Framework for HCPS (MS-HCPS) that leverages human senses to facilitate system creation and management. Additionally, the introduced Multisensory Graph Convolutional Network (MS-GCN) model calculates recommendation values for multiple senses, elucidating their relevance to system development. Furthermore, the effectiveness of the proposed framework and model is validated through three practical engineering scenarios. This study explores the research on multisensory interaction in HCPS from a human sensory perspective, aiming to facilitate the progress and development of HCPS across various domains.
... GNNs are specialized for graph-structured data, adept at representing intricate relationships among nodes and edges [41]. These networks utilize information propagation to derive node representations by incorporating both the node features and its surrounding context. ...
Article
Full-text available
Human-Cyber-Physical Systems (HCPS), as an emerging paradigm centered around humans, provide a promising direction for the advancement of various domains, such as intelligent manufacturing and aerospace. In contrast to Cyber-Physical Systems (CPS), the development of HCPS emphasizes the expansion of human capabilities. Humans no longer solely function as operators or agents working in collaboration with computers and machines but extend their roles to include system design and innovation management. This paper proposes a Multisensory Interaction Framework for HCPS (MS-HCPS) that leverages human senses to facilitate system creation and management. Additionally, the introduced Multisensory Graph Convolutional Network (MS-GCN) model calculates recommendation values for multiple senses, elucidating their relevance to system development. Furthermore, the effectiveness of the proposed framework and model is validated through three practical engineering scenarios. This study explores the research on multisensory interaction in HCPS from a human sensory perspective, aiming to facilitate the progress and development of HCPS across various domains.
... Lee et al. [47] proposed generating adversarial networks to build high-accuracy models in the case of data imbalance. Recently, Wang et al. [48] used the concepts of dual graph neural networks and transfer learning to establish a fault detection model suitable for intelligent manufacturing systems. ...
Article
Full-text available
In the era of Industry 3.0, product fault detection systems became important auxiliary systems for factories. These systems efficiently monitor product quality, and as such, substantial amounts of capital were invested in their development. However, with the arrival of Industry 4.0, high-volume low-mix production modes are gradually being replaced by low-volume high-mix production modes, reducing the applicability of existing systems. The extent of investment has prompted factories to seek upgrades to tailor existing systems to suit new production modes. In this paper, we propose an approach to upgrading based on the concept of transfer learning. The key elements are (1) using a framework with a basic model and an add-on model rather than fine-tuning parameters and (2) designing a radial basis function deep neural network (RBF-DNN) to extract important features to construct the basic and add-on models. The effectiveness of the proposed approach is verified using real-world data from a spring factory.
Article
Full-text available
Few-sample modes are easy to appear when a new working condition is triggered in industrial processes especially during the early stages of the new working mode. However, monitoring the early behavior of a new mode is important because engineers and operators are less knowledgeable with such a new mode. Considering the few-sample challenge in this problem, a new multisource transfer learning framework is proposed that leverages historical data under various operating conditions to enrich process monitoring over new mode data. In contrast to existing transfer learning-related work, a new unsupervised domain adaptation framework is designed. The historical modes as the source provide precious knowledge and reference to the new mode so that the features of the new mode are robust to noise and insufficient samples. Mathematically, the historical features play the role of a regularizer for the feature learning in the target domain. A geometrical illustration is given and an iterative optimization algorithm is developed with the convergence analysis. Except for the features guided by historical modes, individual features of the new mode are also extracted from the residual part to form a complete monitoring framework. Finally, the effectiveness of the proposed method is validated through a numerical experiment and a real industrial hydrocracking process.
Article
The deep learning algorithms have become the general trend in transformer fault diagnos. The diagnostic accuracy is contingent upon the quantity of fault labeled samples. However, obtaining sufficient fault labeled samples remains a challenge and the collected samples in practice are unlabeled. Therefore, a semi-supervised fault diagnosis method is proposed for the transformer based on discriminative feature enhancement and adaptive weight adjustment. Firstly, the pseudo labels of unlabeled samples were generated through graph propagation and the quality of pseudo labels is crucial for fault diagnosis. Secondly, the discriminative feature enhancement was used to improve the quality of pseudo labels by optimizing diagnostic boundaries. Then, the weight of pseudo labels involved in training was adaptively adjusted using the truncated normal distribution function, based on the deviation between the confidence of pseudo labels and the mean of the normal distribution function. Finally, the proposed method was verified on the collected dataset. The experiment results demonstrated that the proposed method can guarantee the high quality and quantity of pseudo labels involved in training. The proposed method achieves a diagnostic accuracy of 94.1 % for transformer faults.
Article
Faults in underground cables are hard to be detected directly from external features such as sound and light due to the harsh operating environment. Moreover, the rising penetration of distributed generations further complicates the characteristics of cable faults. Facing the challenges, this article first aims to reveal cable fault features via analyzing the generation mechanism of the cable grounding wire current (GWC). The impact of photovoltaic on GWC is then discussed, and the essential differences between faulty and healthy cables are also studied. Finally, considering both the structure of distribution network and an optimal deployment of measurement points, a novel fault cable segment identification method based on the amplitude and polarity of the starting terminal GWC steady-state variation is proposed. Simulations based on a real low-resistance grounded distribution network are performed in diverse fault scenarios and show a near 100% accuracy for all cases. In addition, the proposed method demonstrated a time complexity of O (896), showing a significant reduction in calculation time compared to common methods. These findings suggest that the proposed method is highly reliable and efficient for identifying faulty cable segments in active distribution networks.
Article
Full-text available
The accuracy of bearing fault diagnosis is of great significance for the reliable operation of rotating machinery. In recent years, increasing attention has been paid to intelligent fault diagnosis techniques based on deep learning. However, most of these methods are based on supervised learning with a large amount of labeled data, which is a challenge for industrial applications. To reduce the dependence on labeled data, a self-supervised joint learning (SSJL) fault diagnosis method based on three-channel vibration images is proposed. The method combines self-supervised learning with supervised learning, makes full use of unlabeled data to learn fault features, and further improves the feature recognition rate by transforming the data into three-channel vibration images. The validity of the method was verified using two typical data sets from a motor bearing. Experimental results show that this method has higher diagnostic accuracy for small quantities of labeled data and is superior to the existing methods.
Article
Full-text available
Domain adaptation is a major area of interest in intelligent equipment maintenance and fault diagnosis in recent years. Traditional machine/deep-learning-based fault diagnosis methods assume that the source and target domains share the same distribution, which may fail and lead to catastrophic damages. Many domain adaptation-based fault diagnosis methods have been proposed to address the domain shift problem. However, most of them only align global domain distributions and ignore class relationships between domains, which leads to a decline in diagnostic performance. To overcome this deficiency, a dual-view alignment-based domain adaptation network (DVADAN) for fault diagnosis is proposed in this paper. Specifically, the proposed dual-view alignment, consisting of a global (marginal) alignment constructed with maximum mean discrepancy and a local (conditional) alignment calculating the class-centers by Wasserstein distance, is developed to reduce domain distribution discrepancy. Extensive experiments on two test rigs validated the effectiveness of the proposed DVADAN and showed its superiority over state-of-art fault diagnosis methods.
Article
Full-text available
The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to conventional approaches to AI where a given task is solved from scratch using a fixed learning algorithm, meta-learning aims to improve the learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many of the conventional challenges of deep learning, including data and computation bottlenecks, as well as the fundamental issue of generalization. In this survey we describe the contemporary meta-learning landscape. We first discuss definitions of meta-learning and position it with respect to related fields, such as transfer learning, multi-task learning, and hyperparameter optimization. We then propose a new taxonomy that provides a more comprehensive breakdown of the space of meta-learning methods today. We survey promising applications and successes of meta-learning including few-shot learning, reinforcement learning and architecture search. Finally, we discuss outstanding challenges and promising areas for future research.
Article
Early fault detection (EFD) is vital for mechanical systems to reduce downtime and increase stability. The main challenge of EFD for rotating machines is to extract discriminative features from noisy signals to identify early faults. However, the lack of labels for the whole lifecycle data hinders the application of some powerful supervised deep learning methods in EFD. Besides, many EFD methods have to set a criterion manually, such as a threshold, to judge whether an early fault has occurred. To address these challenges, this paper proposes a novel early fault detection method based on Unsupervised Sequence Segmentation Convolutional Neural Network (USSCNN). At first, frequency domain features are extracted from raw signals and converted to 2D grey images. Then historical lifecycle data are labelled by USSCNN, so that a CNN classifier can be trained with these labelled data. The deep features of the historical data learned by the CNN classifier are utilized to train the Health Index (HI) Assessment Model. The proposed method is tested on three bearing datasets. The results shown that the proposed method can detect incipient faults earlier than the comparing methods with lower false alarms. And the HIs learned by the Health Index Assessment Model shown that the proposed method can extract discriminative features for EFD. More importantly, the proposed method can detect an early fault by the well-trained classifier which avoids manual criterion-making. Results of comparison demonstrated the effectiveness and the robustness of the proposed method.
Article
2018 Curran Associates Inc.All rights reserved. Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called “internal covariate shift”. In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.
Article
Recently, deep learning-based intelligent fault diagnosis methods have been developed rapidly, which rely on massive data to train the diagnosis model. However, it is usually difficult to collect sufficient failure data in practical industrial production, thus limits the application of intelligent diagnosis methods. To address the few-shot fault diagnosis problem, a task-sequencing meta learning (TSML) method is proposed in this paper. Firstly, meta learning model is trained over a series of learning tasks to obtain knowledge about how to diagnosis. Thus, the learned knowledge can help adapt and generalize with a few examples when dealing with new tasks that have never been encountered. Then, considering the difference and connection between different failures and diagnosis tasks, a task-sequencing algorithm is proposed to sort meta training tasks from easy to difficult, which followed the way human acquire knowledge. After evaluating the difficulty of each task, the proposed method learns simple tasks firstly and generalizes the learned knowledge to complex tasks. Better knowledge adaptability is obtained by gradually increasing the task difficulty. Finally, utilizing gradient-based meta learning, the initialization parameters are trained by a small number of gradient steps. The effectiveness of the proposed method is validated by a practice rolling bearing dataset and a power system dataset. The experiment results illustrate that the proposed method can identify new categories within only a few samples. In addition, it also shows advantages in fault diagnosis when the categories are fine-grained according to the working conditions.
Chapter
Batch normalization is currently the most widely used variant of internal normalization for deep neural networks. Additional work has shown that the normalization of weights and additional conditioning as well as the normalization of gradients further improve the generalization. In this work, we combine several of these methods and thereby increase the generalization of the networks. The advantage of the newer methods compared to the batch normalization is not only increased generalization, but also that these methods only have to be applied during training and, therefore, do not influence the running time during use. https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/?p=%2FWeightAndGradientCentralization&mode=list.
Article
The relation classification (RC) task is one of the fundamental tasks of information extraction. It aims to detect the relation between entity pairs in unstructured natural-language text and to generate structured data in the form of entity-relation triples. Although distant supervision methods can effectively alleviate the lack of training data in supervised learning, they also introduce noise into the data and still cannot fundamentally solve the long-tail distribution problem of the training instances. To enable a neural network to learn new knowledge from few instances, as humans do, this work focuses on few-shot relation classification (FSRC), where a classifier must generalize to new classes that have not been seen in the training set, given only a small number of samples for each class. To make full use of the existing information and obtain a better feature representation for each instance, we propose to encode each class prototype adaptively from two aspects. First, based on prototypical networks, we propose an adaptive mixture mechanism that adds label words to the representation of the class prototype; to the best of our knowledge, this is the first attempt to integrate label information into the features of the support samples of each class to obtain more interactive class prototypes. Second, to measure the distances between samples of each category more reasonably, we introduce a loss function for joint representation learning (JRL) that encodes each support instance adaptively. Extensive experiments on FewRel under different few-shot (FS) settings show that the proposed adaptive prototypical networks with label words and JRL not only achieve significant improvements in accuracy but also increase the generalization ability of FSRC.
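The core prototypical-network step referred to above can be sketched in a few lines: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype by squared Euclidean distance. The optional blend with a label-word embedding uses a fixed weight `alpha`, which is a hypothetical simplification; the paper's adaptive mixture mechanism is not reproduced here, and the embeddings are made-up 2-D vectors.

```python
from typing import Dict, List, Optional

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def prototypes(support: Dict[str, List[Vector]],
               label_emb: Optional[Dict[str, Vector]] = None,
               alpha: float = 0.0) -> Dict[str, Vector]:
    """Class prototype = mean support embedding, optionally blended with a
    label-word embedding using a fixed weight alpha (hypothetical stand-in
    for an adaptive mixture)."""
    protos = {}
    for cls, vecs in support.items():
        p = mean(vecs)
        if label_emb is not None and alpha > 0.0:
            p = [(1.0 - alpha) * a + alpha * b for a, b in zip(p, label_emb[cls])]
        protos[cls] = p
    return protos


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, protos: Dict[str, Vector]) -> str:
    """Assign the query to the class with the nearest prototype."""
    return min(protos, key=lambda c: sq_dist(query, protos[c]))


support = {"A": [[0.0, 0.0], [0.0, 2.0]], "B": [[4.0, 4.0], [6.0, 4.0]]}
label_emb = {"A": [2.0, 1.0], "B": [5.0, 4.0]}
protos = prototypes(support)
print(classify([1.0, 1.0], protos))
```

In a real system the vectors would come from a learned encoder; the distance-to-prototype rule is what lets unseen classes be handled from only a few support samples.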
Article
Deep-learning algorithms have produced promising results; however, domain adaptation remains a challenge. In addition, their excessive training time and computing-resource requirements need to be addressed. Deep-learning algorithms face a domain adaptation issue when the data distribution of a target domain differs from that of the source domain. The emerging concept of broad learning shows potential for addressing both the domain adaptation and training time issues. An adaptive unsupervised broad transfer learning (AUBTL) algorithm is proposed to tackle cross-domain problems. The proposed algorithm uses a sparse auto-encoder and random orthogonal mapping to extract and augment the feature space. It then initializes the weights of a classifier by solving a ridge regression problem. A logit ranking strategy is applied to develop a transfer estimator that evaluates and samples data in the target domain for adaptive transfer. Based on the sampled data, AUBTL optimizes the hyper-parameter space. The performance of the AUBTL algorithm is validated on three benchmark datasets covering 20 transfer tasks. The computational results demonstrate the efficiency and accuracy of the proposed algorithm compared with the other deep-learning algorithms considered in this research.
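Initializing classifier weights "by solving a ridge regression problem" typically means the closed form $W = (H^\top H + \lambda I)^{-1} H^\top Y$, where $H$ holds the extracted features and $Y$ the targets. The stdlib-only sketch below solves the equivalent linear system by Gauss-Jordan elimination. It assumes this standard closed form and a hand-picked `lam`; it does not reproduce AUBTL's sparse auto-encoder, random orthogonal mapping, or hyper-parameter search, and the helper names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(r) for r in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for x by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ridge_weights(h: Matrix, y: Matrix, lam: float = 1e-2) -> Matrix:
    """Closed-form ridge solution W = (H^T H + lam * I)^-1 H^T Y.
    lam > 0 keeps the system well conditioned even if H is rank deficient."""
    ht = transpose(h)
    gram = matmul(ht, h)
    for i in range(len(gram)):
        gram[i][i] += lam
    return solve(gram, matmul(ht, y))


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 samples, 2 features (made up)
Y = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # one-hot targets for 2 classes
W = ridge_weights(H, Y, lam=0.1)
```

A one-shot linear solve like this, instead of iterative backpropagation, is the main source of the training-time savings usually attributed to broad learning systems.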
Article
Deep learning-based domain adaptation algorithms with various representations have recently been developed to address the domain shift problem in mechanical fault diagnosis. However, few studies have focused on the potential improvements offered by multiple representations. Thus, this paper proposes a multi-representation-based domain adaptation network. Three complementary time-frequency representations are first proposed to serve as input-based multiple representations for the subsequent parallel models. Then, parallel models with improved inception modules are trained to obtain multiple feature-based domain-invariant representations. Finally, ensemble learning through majority voting is used to obtain the final results. Comprehensive experimental results on two test rigs reveal that the proposed algorithm outperforms state-of-the-art single-representation-based domain adaptation algorithms in cross-domain fault diagnosis. Furthermore, visualization results demonstrate that the proposed algorithm extracts transferable features and takes advantage of ensemble learning to achieve high-precision diagnosis.
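The final majority-voting step can be sketched as follows: each parallel model outputs one label per sample, and the fused label is the one predicted by the most models. The tie-breaking rule (the earliest model in the list whose label reaches the top count wins) and the example fault labels are assumptions for illustration; the abstract does not say how ties are resolved.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Fuse per-model labels; predictions[m][i] is model m's label for sample i."""
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    fused = []
    for i in range(n):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        # Tie-break: first model (in list order) whose label has the top count.
        fused.append(next(v for v in votes if counts[v] == best))
    return fused


preds = [
    ["inner", "outer", "normal"],  # model on representation 1
    ["inner", "ball", "normal"],   # model on representation 2
    ["outer", "ball", "ball"],     # model on representation 3
]
print(majority_vote(preds))
```

With an odd number of models and two competing labels a tie cannot occur, which is one practical reason to use three parallel models.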