Few-Shot Learning for Fault Diagnosis With a Dual Graph Neural Network

January 2022
IEEE Transactions on Industrial Informatics PP(99):1-9

DOI:10.1109/TII.2022.3205373

Authors:

Han Wang

Tongji University

Yukai Zhao

Tongji University

Mechanical fault diagnosis is crucial to ensure safe operations of equipment in intelligent manufacturing systems. Deep learning-based methods have been recently developed for fault diagnosis due to their advantages in feature representation. However, most of these methods fail to learn relations between samples and thus perform poorly without sufficient labeled data. In this paper, we propose a new few-shot learning method named Dual Graph Neural Network (DGNNet) with residual blocks to address fault diagnosis problems with limited data. Firstly, the residual module learns the features of samples from image data transformed from the original signals. Secondly, two complete graphs built on the sample features are used to extract the instance-level and distribution-level relations between samples. In particular, an alternate update policy between the instance and distribution graphs integrates the multilevel relations to propagate the label information of a few labeled samples to unlabeled samples. This technique leverages labeled and unlabeled samples to identify unseen faults, making DGNNet competent in fault diagnosis tasks with very few labeled samples. Extensive results on various datasets show that DGNNet achieves excellent performance in supervised fault diagnosis tasks and outperforms baselines by a great margin in semi-supervised cases.

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 19, NO. 2, FEBRUARY 2023 1559

Few-Shot Learning for Fault Diagnosis With a

Dual Graph Neural Network

Han Wang , Jingwei Wang , Yukai Zhao , Qing Liu , Min Liu , and Weiming Shen , Fellow, IEEE

Abstract—Mechanical fault diagnosis is crucial to ensure

the safe operations of equipment in intelligent manufactur-

ing systems. Deep learning-based methods have been re-

cently developed for fault diagnosis due to their advantages

in feature representation. However, most of these methods

fail to learn relations between samples and thus perform

poorly without sufﬁcient labeled data. In this article, we

propose a new few-shot learning method named dual graph

neural network (DGNNet) with residual blocks to address

fault diagnosis problems with limited data. First, the resid-

ual module learns the feature of samples with image data

transferred from original signals. Second, two complete

graphs built on the sample features are used to extract the

instance-level and distribution-level relations between sam-

ples. In particular, an alternate update policy between the

instance and distribution graphs integrates the multilevel

relations to propagate the label information of a few labeled

samples to unlabeled samples. This technique leverages

labeled and unlabeled samples to identify unseen faults,

making DGNNet competent in fault diagnosis tasks

with very few labeled samples. Extensive results on vari-

ous datasets show that DGNNet achieves excellent perfor-

mance in supervised fault diagnosis tasks and outperforms

baselines by a great margin in semisupervised cases.

Index Terms—Distribution learning, fault diagnosis, few-

shot learning (FSL), graph neural network (GNN), semisu-

pervised learning.

I. INTRODUCTION

THE increasing availability of big data on manufacturing

equipment offers unprecedented opportunities to explore

methods and tools for predictive maintenance of machinery.

Predictive maintenance aims to prevent mechanical failures

or to detect them before they occur to reduce losses. In the

last decades, researchers have devoted much attention to fault

Manuscript received 29 December 2021; revised 26 July 2022; ac-

cepted 4 September 2022. Date of publication 9 September 2022; date

of current version 13 December 2022. This work was supported by the

National Key R&D Program of China under Grant 2019YFB1704700 and

NSFC under Grant 62273261. Paper no. TII-21-5812. (Corresponding

author: Min Liu.)

Han Wang, Jingwei Wang, Yukai Zhao, Qing Liu, and Min Liu are with the College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China (e-mail: 469702227@qq.com; jwwang@tongji.edu.cn; zhaoyukaijake@tongji.edu.cn; 2010142@tongji.edu.cn; lmin@tongji.edu.cn).

Weiming Shen is with the State Key Laboratory of Digital Manufac-

turing Equipment and Technology, Huazhong University of Science and

Technology, Wuhan 430074, China (e-mail: wshen@ieee.org).

Color versions of one or more ﬁgures in this article are available at

https://doi.org/10.1109/TII.2022.3205373.

Digital Object Identiﬁer 10.1109/TII.2022.3205373

diagnosis, especially for rotating machinery [1]. Because rotat-

ing machinery, such as engines, turbines, and compressors, is

a critical component of most industrial facilities, many mathe-

matical models based on mechanical characteristics have been

developed to identify faults of rotating machinery. Yet, these methods rely heavily on prior knowledge and expert experience and are difficult to apply in real-world manufacturing environments with enormous amounts of highly noisy data. The requirements for real-time, high-performance diagnosis drive researchers toward data-driven fault diagnosis methods, such as deep learning (DL) [2], [3].

Recent studies have employed DL methods, such as deep

belief networks [4], deep autoencoder [5], and convolutional

neural networks (CNNs) [6], [7], for fault diagnosis of rotating

machinery [8]. For example, Han et al. [9] proposed an improved

deep belief network for gear fault detection and obtained a high

diagnostic accuracy. Ren et al. [10] designed a deep autoencoder

to achieve nonlinear mapping of input fault data automatically.

Kiranyaz et al. [11] presented an adaptive CNN for real-time

fault detection and obtained an excellent classiﬁcation perfor-

mance. However, these methods require sufﬁcient labeled sam-

ples to train DL models; otherwise, they cannot achieve high

fault diagnosis performance [12]. In real industrial scenarios,

rotating machines usually work in normal conditions and seldom

malfunction, leading to rare fault data [13]. Consequently, DL

methods fail to maintain their advantages in industrial fault

diagnosis with small data.

Few-shot learning (FSL) is an emerging paradigm to train a

DL model with limited labeled samples. Brieﬂy, an FSL task

is divided into many small subtasks and each subtask consists

of two sets: a support set (containing a few labeled samples)

and a query set (containing one or several unlabeled samples).

FSL-based fault diagnosis aims to obtain DL models which

can identify the fault type of unseen samples in the query set

using limited labeled samples from the support set [14]. A few

researchers have proposed FSL-based methods to solve fault

diagnosis with limited data [15]. For instance, Zhang et al.

[16] utilized the Siamese neural network to tackle fault data

scarcity problems. Feng et al. [17] proposed a semisupervised

meta-learning network with an attention mechanism to extract

distinct features from support samples to generate prototypes

for the classiﬁcation of query samples. However, these meth-

ods ignored the relations between support and query samples,

including pairwise relations between two samples (i.e., instance-

level relation) and high-order relations between all samples

(i.e., distribution-level relation). Ignoring these relations, which distinguish samples of different fault classes, limits the performance of previous FSL-based diagnosis methods.

In this article, we propose a novel FSL method to solve fault

diagnosis problems with limited data. First, we transform the

vibration signals of rotating machinery into images with each

image representing a sample and split them into support sets

and query sets. In each FSL subtask, we extract the sample

features using a residual network [18]. Here, we use graphs

to abstract the relations between samples where each sample

represents a node in a graph. In particular, we construct two

graphs for all samples in a subtask, i.e., an instance graph and

a distribution graph. In the instance graph, the sample feature is

regarded as the node feature (instance feature) and the feature

similarity of two samples is regarded as the edge weight that

represents the instance-level relation of the two samples. In

the distribution graph, the distribution feature (the similarity

between a sample and the other samples) is regarded as a node

feature, and the edge weight representing the distribution-level

relation is calculated by the distribution feature of two nodes.

Hence, we design a dual graph neural network (DGNNet) to

learn the instance-level and distribution-level relations in the

above graphs for fault diagnosis. Speciﬁcally, DGNNet uses an

alternate update policy to propagate the label information of

labeled samples to unlabeled samples leveraging the relations

between samples at different levels. Moreover, this learning

strategy allows the support set to contain unlabeled samples,

enabling our proposed DGNNet to solve semisupervised fault

diagnosis problems.

The contributions of this article are summarized as follows.

1) We propose a novel FSL method named DGNNet that

integrates instance-level and distribution-level relation

learning for fault diagnosis. DGNNet learns different

level relations of query samples with limited data.

2) The alternate update policy between the instance graph

and distribution graph propagates label information of

scarce labeled samples to unlabeled samples within several updates, enabling DGNNet to address semisupervised fault diagnosis.

3) Extensive experiments are implemented to evaluate

DGNNet in two benchmark datasets and a real-world

dataset. Our results show that DGNNet outperforms base-

lines by 3%–10% in supervised fault diagnosis tasks and

by 10%–12% in semisupervised cases.

The rest of this article is organized as follows. Section II

presents the preliminaries of this study. The proposed method

is detailed in Section III. In Section IV, the effectiveness of

DGNNet is evaluated with various case studies. Ablation studies

and discussion are presented in Section V. Finally, Section VI

concludes this article and presents future work.

II. PRELIMINARIES

A. Fault Diagnosis

DL-based fault diagnosis methods typically transform the

vibration signals into three-channel images. The segmentation

length of the signals should be adjusted for each scenario so that a segment contains a complete period, which benefits classification. Each 84 × 84 image is converted from 4096 signal data points with a sliding window of length 84, allowing some of the data points to be reused [2]. The three-channel image is generated as follows: the value of each data point is normalized; each pixel is filled with the corresponding data point; and the signal segments fill the rows of the image in sequence.

Fig. 1. Episodic paradigm of FSL (5-way 1-shot).
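The signal-to-image conversion described above can be illustrated with a minimal sketch. The function name `signal_to_image` is hypothetical, and two details are assumptions because the text does not specify them: min-max normalization is used, and the row start offsets are spread evenly over the 4096-point segment so that neighboring rows overlap. Only one channel is built here; how the three channels are filled is not stated in the text.

```python
import math

def signal_to_image(signal, size=84):
    """Turn a 1-D signal segment into a size x size grid of normalized values."""
    n = len(signal)
    if size < 2 or n < size:
        raise ValueError("need size >= 2 and at least `size` data points")
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0  # guard against a constant signal
    norm = [(s - lo) / span for s in signal]  # assumed min-max normalization
    # Assumption: window start offsets are evenly spaced over the segment,
    # so consecutive rows reuse data points whenever n < size * size.
    last_start = n - size
    starts = [round(r * last_start / (size - 1)) for r in range(size)]
    return [norm[s:s + size] for s in starts]

# Synthetic stand-in for a vibration segment of 4096 points.
segment = [math.sin(0.01 * i) for i in range(4096)]
image = signal_to_image(segment)
```

With 4096 points and 84 rows, the windows advance by roughly 48 points per row, so about half of each row is shared with the next one.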

The input of DGNNet is the embedded features extracted from

these images with a residual network. The labeled and unlabeled

samples are utilized as the support set. Thus, the support set can

be denoted as

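$$S = \bar{L} \cup \bar{U} = \{(x_1, y_1), \ldots, (x_{n_{\bar{l}}}, y_{n_{\bar{l}}})\} \cup \{x_{n_{\bar{l}}+1}, \ldots, x_{n_{\bar{l}}+n_{\bar{u}}}\} \tag{1}$$

where $\bar{L}$ is the labeled data and $\bar{U}$ is the unlabeled data. $n_{\bar{l}}$ and $n_{\bar{u}}$ represent the number of labeled and unlabeled samples, respectively. The input is defined as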

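$$X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{n \times 3 \times D_I} \tag{2}$$

where $D_I = 84 \times 84$, $n = n_{\bar{l}} + n_{\bar{u}} + n_{\bar{q}}$ is the number of all samples, and $n_{\bar{q}}$ denotes the number of samples in the query set $Q$. All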

query samples are classiﬁed by the FSL-based models, and the

prediction of a query sample can be formulated as

$$y_i = \operatorname{argmax} \left\{ p_\phi(y = c \mid x_i \in Q) \right\}_{c=1}^{N} \tag{3}$$

where $p_\phi(y = c \mid x_i \in Q)$ represents the probability that the query sample $x_i$ is predicted as class $c$.

B. Few-Shot Learning (FSL)

FSL aims to learn a model to recognize unseen samples

through training with limited labeled examples [19]. Many

researchers use the episodic paradigm from meta-learning to

solve FSL problems [20]. In an episode (a subtask), a support set $S$ includes $N$ classes with $K$ samples per class, and a query set $Q$ contains $\tilde{T}$ samples. The goal of a subtask is to classify the

query samples using the support samples. Such a classiﬁcation

problem is deﬁned as an N-way K-shot problem. Fig. 1 shows an

example of the 5-way 1-shot FSL problem. We construct 15 000

training subtasks and 15 000 test subtasks. In each subtask, the

support set contains ﬁve different classes with one image per

class, and the query set contains one unlabeled image to be

classified. The overall classification accuracy was calculated as the percentage of subtasks in which the unlabeled sample is correctly classified.

Fig. 2. Overall framework of DGNNet. The unlabeled samples are only used in semisupervised FSL for fault diagnosis.

In this article, we discuss both the supervised and semisuper-

vised FSL for fault diagnosis. In the supervised cases, all support

samples are labeled, whereas the support set in the semisuper-

vised cases contains both labeled and unlabeled samples. Note

that query samples in both cases are unlabeled.
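The episodic setup can be sketched as follows. This is only an illustration: `sample_episode` and the toy `data` dictionary are hypothetical names, and drawing the query sample from the same classes as the support set is an assumption in line with the N-way K-shot description. Setting `n_unlabeled=0` gives the supervised case, and a positive value gives the semisupervised case.

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, n_unlabeled=0,
                   n_query=1, rng=None):
    """Draw one N-way K-shot subtask: labeled support, unlabeled support, query."""
    rng = rng or random.Random()
    classes = rng.sample(sorted(data_by_class), n_way)
    labeled, unlabeled, query_pool = [], [], []
    for c in classes:
        # One extra item per class is reserved as a query candidate.
        items = rng.sample(data_by_class[c], k_shot + n_unlabeled + 1)
        labeled += [(x, c) for x in items[:k_shot]]
        unlabeled += items[k_shot:k_shot + n_unlabeled]  # labels withheld
        query_pool.append((items[-1], c))  # label kept only for scoring
    query = rng.sample(query_pool, n_query)
    return labeled, unlabeled, query

# Toy data: seven classes with ten sample identifiers each.
data = {c: [f"{c}_{i}" for i in range(10)] for c in "ABCDEFG"}
labeled, unlabeled, query = sample_episode(
    data, n_way=5, k_shot=1, n_unlabeled=1, rng=random.Random(0))
```

Accuracy over many such subtasks is then the fraction of episodes in which the query sample receives its withheld label.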

C. Graph Neural Networks (GNNs)

GNNs are DL methods for solving tasks on graph-structured

data [21]. Recently, GNNs have been widely applied in semisu-

pervised learning or FSL [22], [23]. GNNs typically include

two processes, i.e., the node update and edge update. For an

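undirected graph $G = (V, E)$ with the node set $V$ and edge set $E$, its adjacency matrix $A \in \mathbb{R}^{n \times n}$ is defined as

$$A_{ij} = \begin{cases} 1 & \text{if nodes } (v_i, v_j) \text{ are connected} \\ 0 & \text{if nodes } (v_i, v_j) \text{ are not connected.} \end{cases} \tag{4}$$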

The updates of the node $v_i$ and edge $e_k$ are formulated as

$$e_k = \eth^{e}(\bar{v}_i, e_k), \qquad \bar{e}_k = \varsigma^{e \to v}(E) \tag{5}$$

$$v_i = \eth^{v}(\bar{e}_k, v_i), \qquad \bar{v}_i = \varsigma^{v \to e}(V) \tag{6}$$

where $E = \{e_k\}_{k=1:N^e}$ and $V = \{v_i\}_{i=1:N^v}$ represent the sets of edges and nodes (of cardinality $N^e$ and $N^v$). $e_k$ and $v_i$ are the attributes of an edge and a node, respectively. $\eth^{e}$ is mapped across edges to calculate the per-edge update, and $\eth^{v}$ is mapped across nodes to calculate the per-node update. The $\varsigma$ functions reduce a set to a single element.

III. DUAL GRAPH NEURAL NETWORK (DGNNET)

In this section, we ﬁrst introduce the proposed DGNNet

framework, and then brieﬂy describe the feature learning. Then,

we elaborate on the instance-level and distribution-level relation

learning followed by the optimization of DGNNet.

A. Framework

The DGNNet framework is shown in Fig. 2. It contains

four parts, the feature learning module (residual network), the

instance-level relation learning module (instance graph), the

distribution-level relation learning module (distribution graph),

and the synergistic optimization. The residual network extracts

feature vectors from transformed images. The instance graph

learns the instance feature of all samples and the instance-level

relation between samples. The distribution graph is used to learn

the distribution feature and the distribution-level relation. The

synergistic optimization integrates the loss of the instance and

distribution similarities and optimizes DGNNet. In the end,

the last generation of the instance graph implements the fault

diagnosis.

B. Feature Learning

The residual network comprises several residual blocks (Res-

blocks), and fully connected (FC) layers, to extract embedded

features from transformed images. Each Resblock comprises

four 2-D convolutional (Conv2D) layers, four batch normal-

ization (BN) layers, one activation function ζLReLU, and one

max-pooling operation. In each Resblock, BN layers are used to

restrain internal covariate shift and smooth the loss surface [24].

The principle of BN can be formulated as

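$$f_{\mathrm{BN}} = \gamma_i \frac{x_i - \mu(X)}{\sqrt{\sigma(X)^2 + \varepsilon}} + \beta_i \tag{7}$$

where $x_i$ represents the $i$th sample, and $X$ represents all input samples. $\gamma_i$ and $\beta_i$ are learnable parameters, initialized to 1 and 0, respectively, and $\varepsilon = 1 \times 10^{-5}$ is a hyperparameter.

The mean $\mu(X)$ and standard deviation $\sigma(X)$ of a mini-batch in (7) are given by

$$\mu(X) = \frac{1}{n_B} \sum_{i=1}^{n_B} x_i \tag{8}$$

$$\sigma(X) = \sqrt{\frac{1}{n_B} \sum_{i=1}^{n_B} \left( x_i - \mu(X) \right)^2} \tag{9}$$

where $n_B$ is the size of a mini-batch, set to 50 in this article.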
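As a small numerical illustration of (7)-(9), the sketch below applies batch normalization to a mini-batch of scalar values. It is a simplification under stated assumptions: the function name `batch_norm` is hypothetical, a single feature is normalized with scalar `gamma` and `beta`, and the biased (divide by the batch size) variance is used, as is standard for batch normalization during training.

```python
import math

def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    n = len(xs)
    mu = sum(xs) / n                          # mini-batch mean
    var = sum((x - mu) ** 2 for x in xs) / n  # biased mini-batch variance
    return [gamma * (x - mu) / math.sqrt(var + eps) + beta for x in xs]

batch = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(batch)
```

With the initial values `gamma = 1` and `beta = 0`, the output has approximately zero mean and unit variance; `eps` only prevents division by zero for near-constant batches.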

The feature vector of the sample $x_i$ is obtained through the residual network and calculated as

$$v_{f_i} = f_{\mathrm{resnet}}(x_i) \tag{10}$$

where $v_{f_i} \in \mathbb{R}^m$, and $f_{\mathrm{resnet}}$ is a residual network.

C. Instance-Level Relation Learning

First, we concatenate the feature vector of each sample and

one-hot encoding of its label into a merged vector (i.e., instance

feature). Here, for an unlabeled sample, all elements in its one-

hot encoding are zero. We use this merged vector as the initial

instance feature of a sample, denoted as

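$$v^{\mathrm{ins}}_{0,i} = \operatorname{concat}\left( v_{f_i}, \operatorname{onehot}(y_i) \right) \tag{11}$$

where $v^{\mathrm{ins}}_{0,i} \in \mathbb{R}^m$, $\operatorname{concat}$ is the concatenation operator, and $\operatorname{onehot}(y_i)$ is a one-hot encoding of the label $y_i$ of a sample.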

The instance similarity between any two samples (i.e., instance-level relation) is calculated from the instance features and formulated as

$$e^{\mathrm{ins}}_{0,ij} = f_{G^{\mathrm{ins}}_0}\left( \left( v^{\mathrm{ins}}_{0,i} - v^{\mathrm{ins}}_{0,j} \right)^2 \right) \tag{12}$$

where $e^{\mathrm{ins}}_{0,ij} \in \mathbb{R}$, and $f_{G^{\mathrm{ins}}}: \mathbb{R}^m \to \mathbb{R}$ is an encoding network that converts the instance similarity into a scalar. As shown in Fig. 3(a), the encoding network contains two Conv layers, two BN layers, two LReLU activation layers, and a sigmoid layer.
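The construction of the initial instance feature and instance similarity can be sketched as follows. The helper names are hypothetical, and `toy_encoder` is only a fixed stand-in (an assumption) for the learned encoding network of Fig. 3(a); it shares the property of mapping the element-wise squared difference to a scalar, but it is not the network used in this article.

```python
import math

def one_hot(label, n_classes):
    """One-hot label vector; all zeros when the sample is unlabeled (None)."""
    vec = [0.0] * n_classes
    if label is not None:
        vec[label] = 1.0
    return vec

def instance_feature(feature, label, n_classes):
    """Concatenate the embedded feature with the one-hot label encoding."""
    return list(feature) + one_hot(label, n_classes)

def squared_difference(a, b):
    """Element-wise squared difference fed to the encoding network."""
    return [(x - y) ** 2 for x, y in zip(a, b)]

def toy_encoder(diff):
    """Stand-in for the learned encoder: 1.0 for identical inputs, smaller otherwise."""
    return math.exp(-sum(diff))

labeled = instance_feature([0.2, 0.4], 1, n_classes=3)
unlabeled = instance_feature([0.2, 0.4], None, n_classes=3)
same = toy_encoder(squared_difference(labeled, labeled))
mixed = toy_encoder(squared_difference(labeled, unlabeled))
```

In this toy case the two samples share the same embedded feature, so the similarity drops only because one of them carries label information and the other does not.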

Fig. 3. Network details of the instance graph and distribution graph. (a) Encoding network. (b) $\mathrm{MLP}_{d2i}$. (c) $\mathrm{MLP}_{i2d}$.

Then, we construct a fully connected instance graph where

any one node connects to the other nodes. Each sample is

regarded as a node with an instance feature, and the instance

similarity between two samples is regarded as the weight of

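the edge between them. This instance graph is defined as $G^{\mathrm{ins}}_l = (V^{\mathrm{ins}}_l, E^{\mathrm{ins}}_l)$, where $V^{\mathrm{ins}}_l = \{v^{\mathrm{ins}}_{l,i}\}$ and $E^{\mathrm{ins}}_l = \{e^{\mathrm{ins}}_{l,ij}\}$. $v^{\mathrm{ins}}_{l,i} \in \mathbb{R}^m$ is the instance feature of the node $i$ at the $l$th generation, and $e^{\mathrm{ins}}_{l,ij} \in \mathbb{R}$ is the instance similarity of the edge $(i, j)$ at the $l$th generation. The node and edge features are updated, respectively, by the following formulas:

$$v^{\mathrm{ins}}_{l,i} = f_{\mathrm{MLP}_{d2i}}\left( v^{\mathrm{ins}}_{l-1,i}, \sum_{j} e^{\mathrm{dis}}_{l,ij} \cdot v^{\mathrm{ins}}_{l-1,j} \right) \tag{13}$$

$$e^{\mathrm{ins}}_{l,ij} = f_{G^{\mathrm{ins}}_l}\left( \left( v^{\mathrm{ins}}_{l-1,i} - v^{\mathrm{ins}}_{l-1,j} \right)^2 \right) \cdot e^{\mathrm{ins}}_{l-1,ij} \tag{14}$$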

where $e^{\mathrm{dis}}_{l,ij}$ is the distribution similarity of samples $i$ and $j$ in the distribution graph (see details in the next section). $f_{\mathrm{MLP}_{d2i}}: (\mathbb{R}^m, \mathbb{R}^m) \to \mathbb{R}^m$ is a multilayer perceptron (MLP) mapping the distribution similarity and instance feature into a new instance feature of the node $i$. $\mathrm{MLP}_{d2i}$, as shown in Fig. 3(b), contains two Conv layers, two BN layers, and two LReLU activation layers. The instance similarity in the final instance graph contributes to the $N$-way $K$-shot fault diagnosis, and the probability distribution for $x_i$ being class $c$ is denoted as

$$P_\phi(y_i = c \mid x_i \in Q) = \zeta_s\left( \sum_{j} e^{\mathrm{ins}}_{l_f,ij} \cdot \operatorname{onehot}(y_j) \right) \tag{15}$$

where $e^{\mathrm{ins}}_{l_f,ij}$ represents the instance similarity in the instance graph at the final generation, $\operatorname{onehot}(y_j)$ is the one-hot encoding of the label $y_j$, and $\zeta_s$ is the softmax function.
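The prediction rule in (15) amounts to a similarity-weighted vote over the labeled support samples followed by a softmax. A minimal sketch is given below; the function names are hypothetical, and the edge weights are made-up numbers standing in for the learned instance similarities between one query sample and the support samples.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(edge_weights, support_labels, n_classes):
    """Sum the query-to-support similarities per class, then apply softmax."""
    scores = [0.0] * n_classes
    for weight, label in zip(edge_weights, support_labels):
        scores[label] += weight  # weight times the one-hot label vector
    probs = softmax(scores)
    return probs.index(max(probs)), probs

# 3-way 1-shot: the query is most similar to the support sample of class 0.
pred_1shot, probs_1shot = predict([0.9, 0.1, 0.2], [0, 1, 2], n_classes=3)
# 2-way 2-shot: similarities to both shots of a class are accumulated.
pred_2shot, probs_2shot = predict([0.2, 0.3, 0.6, 0.1], [0, 0, 1, 1], n_classes=2)
```

Because the softmax is monotonic, the predicted class is simply the class with the largest accumulated similarity.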

D. Distribution-Level Relation Learning

For $l = 0$, the distribution feature and distribution similarity (i.e., distribution-level relation) are initialized as

$$v^{\mathrm{dis}}_{0,i} = \begin{cases} \operatorname{concat}_{j=1}^{NK}\, \delta(y_i, y_j) & \text{if labeled} \\ \left[ \frac{1}{NK}, \ldots, \frac{1}{NK} \right] & \text{if unlabeled} \end{cases} \tag{16}$$

$$e^{\mathrm{dis}}_{0,ij} = f_{G^{\mathrm{dis}}_0}\left( \left( v^{\mathrm{dis}}_{0,i} - v^{\mathrm{dis}}_{0,j} \right)^2 \right) \tag{17}$$

where $v^{\mathrm{dis}}_{0,i} \in \mathbb{R}^{NK}$, $e^{\mathrm{dis}}_{0,ij} \in \mathbb{R}$, and $f_{G^{\mathrm{dis}}}: \mathbb{R}^{NK} \to \mathbb{R}$ is an encoding network that converts the distribution similarity into a scalar. $\operatorname{concat}$ denotes concatenation, and $\delta$ is the Kronecker delta function, which is 1 if its arguments are equal and 0 otherwise. The encoding network is shown in Fig. 3(a).
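The initialization in (16) can be illustrated directly. In this sketch the function name is hypothetical and `support_labels` is assumed to list the labels of the $NK$ labeled support samples in a fixed order.

```python
def init_distribution_feature(label, support_labels):
    """Initial distribution feature of one sample over the NK labeled support samples."""
    nk = len(support_labels)
    if label is None:
        # Unlabeled sample: uniform entries 1/(NK), carrying no class preference.
        return [1.0 / nk] * nk
    # Labeled sample: Kronecker delta against every labeled support label.
    return [1.0 if label == y else 0.0 for y in support_labels]

support_labels = [0, 0, 1, 1]  # 2-way 2-shot, so NK = 4
feature_labeled = init_distribution_feature(1, support_labels)
feature_unlabeled = init_distribution_feature(None, support_labels)
```

A labeled sample therefore starts with a feature that marks exactly the support samples of its own class, whereas an unlabeled or query sample starts from a uniform vector.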

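Then, we build a fully connected distribution graph $G^{\mathrm{dis}}_l = (V^{\mathrm{dis}}_l, E^{\mathrm{dis}}_l)$ to learn the distribution features and distribution-level relations, where $V^{\mathrm{dis}}_l = \{v^{\mathrm{dis}}_{l,i}\}$ and $E^{\mathrm{dis}}_l = \{e^{\mathrm{dis}}_{l,ij}\}$. $v^{\mathrm{dis}}_{l,i} \in \mathbb{R}^{NK}$ is the distribution feature of the node $i$ at the $l$th generation, and $e^{\mathrm{dis}}_{l,ij} \in \mathbb{R}$ is the distribution similarity of the edge $(i, j)$ at the $l$th generation. The representation of node $i$ in the distribution graph is obtained as

$$v^{\mathrm{dis}}_{l,i} = f_{\mathrm{MLP}_{i2d}}\left( v^{\mathrm{dis}}_{l-1,i}, \operatorname{concat}_{j=1}^{NK}\, e^{\mathrm{ins}}_{l,ij} \right) \tag{18}$$

where $f_{\mathrm{MLP}_{i2d}}: (\mathbb{R}^{NK}, \mathbb{R}^{NK}) \to \mathbb{R}^{NK}$ is an MLP that maps the instance similarity and distribution feature into a new distribution feature of node $i$, and $\operatorname{concat}$ denotes concatenation. $\mathrm{MLP}_{i2d}$ is shown in Fig. 3(c). The weight of the edge $(i, j)$ in the distribution graph is calculated at the $l$th generation as

$$e^{\mathrm{dis}}_{l,ij} = f_{G^{\mathrm{dis}}_l}\left( \left( v^{\mathrm{dis}}_{l,i} - v^{\mathrm{dis}}_{l,j} \right)^2 \right) \cdot e^{\mathrm{dis}}_{l-1,ij}. \tag{19}$$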

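Except for $l = 0$, the instance and distribution features are learned by DGNNet with an alternate update strategy, as shown in Fig. 2. In particular, a complete update of the $l$th generation is

$$E^{\mathrm{ins}} \xrightarrow{\mathrm{MLP}_{i2d}} V^{\mathrm{dis}} \xrightarrow{G^{\mathrm{dis}}} E^{\mathrm{dis}} \xrightarrow{\mathrm{MLP}_{d2i}} V^{\mathrm{ins}} \xrightarrow{G^{\mathrm{ins}}} E^{\mathrm{ins}}_{l+1}.$$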
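The order of operations in one generation of the alternate update can be sketched as below. This is a structural illustration only: `similarity` and `mix` are fixed toy functions standing in (as an assumption) for the learned encoding networks and MLPs, and the function and variable names are hypothetical. What the sketch is meant to show is the sequence instance edges, distribution nodes, distribution edges, instance nodes, instance edges.

```python
import math

def similarity(a, b):
    """Toy stand-in for a learned encoding network: 1.0 for identical vectors."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))

def mix(old, new):
    """Toy stand-in for a learned MLP: average two equal-length vectors."""
    return [(o + n) / 2 for o, n in zip(old, new)]

def one_generation(v_ins, e_ins, v_dis, e_dis, n_support):
    """One alternate update: E_ins -> V_dis -> E_dis -> V_ins -> E_ins."""
    n, dim = len(v_ins), len(v_ins[0])
    # 1) Instance similarities to the labeled support nodes update V_dis.
    v_dis = [mix(v_dis[i], e_ins[i][:n_support]) for i in range(n)]
    # 2) New distribution features update E_dis, scaled by the previous edges.
    e_dis = [[similarity(v_dis[i], v_dis[j]) * e_dis[i][j] for j in range(n)]
             for i in range(n)]
    # 3) Distribution similarities aggregate instance features into V_ins.
    agg = [[sum(e_dis[i][j] * v_ins[j][d] for j in range(n)) for d in range(dim)]
           for i in range(n)]
    v_ins = [mix(v_ins[i], agg[i]) for i in range(n)]
    # 4) New instance features update E_ins, scaled by the previous edges.
    e_ins = [[similarity(v_ins[i], v_ins[j]) * e_ins[i][j] for j in range(n)]
             for i in range(n)]
    return v_ins, e_ins, v_dis, e_dis

# Two labeled support nodes followed by one query node.
v_ins = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]]
v_dis = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ones = [[1.0] * 3 for _ in range(3)]
v_ins, e_ins, v_dis, e_dis = one_generation(v_ins, ones, v_dis, ones, n_support=2)
```

Repeating `one_generation` several times corresponds to stacking generations, with label information from the support nodes reaching the query node through both graphs.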

E. Synergistic Optimization

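The class prediction of the concerned sample $x_i \in Q$ in $N$-way $K$-shot fault classification is reformulated as

$$y_i = \operatorname{argmax} \left\{ \zeta_s\left( \sum_{j} e^{\mathrm{ins}}_{l_f,ij} \cdot \operatorname{onehot}(y_j) \right) \right\}_{c=1}^{N}. \tag{20}$$

The loss of the GNN for instance-level relation learning at the $l$th generation is defined as

$$L^{\mathrm{ins}}_l = L_{ce}\left( \zeta_s\left( \sum_{j} e^{\mathrm{ins}}_{l_f,ij} \cdot \operatorname{onehot}(y_j) \right), y_i \right) \tag{21}$$

where $L_{ce}$ stands for the cross-entropy loss function, $\operatorname{onehot}(y_j)$ is the one-hot encoding of the label $y_j$, and $\zeta_s$ is the softmax activation function. The loss for distribution-level relation learning at the $l$th generation is defined as

$$L^{\mathrm{dis}}_l = L_{ce}\left( \zeta_s\left( \sum_{j} e^{\mathrm{dis}}_{l,ij} \cdot \operatorname{onehot}(y_j) \right), y_i \right). \tag{22}$$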

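Then, the loss function of the proposed DGNNet consists of the instance-level loss and distribution-level loss, defined as

$$L = \sum_{l=1}^{l_f} \left( \lambda_{\mathrm{ins}} L^{\mathrm{ins}}_l + \xi_{\mathrm{dis}} L^{\mathrm{dis}}_l \right) \tag{23}$$

where $l_f$ denotes the number of generations, and $\lambda_{\mathrm{ins}}$ and $\xi_{\mathrm{dis}}$ are the weight parameters, set to 1.0 and 0.1, respectively, in this article.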
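The weighted sum of per-generation losses in (23) can be sketched for a single query sample as follows. The function names are hypothetical, and the class probabilities are made-up inputs standing in for the softmax outputs of the instance and distribution graphs at each generation.

```python
import math

def cross_entropy(probs, target):
    """Cross-entropy of a probability vector against the true class index."""
    return -math.log(probs[target])

def total_loss(ins_probs, dis_probs, target, lam_ins=1.0, xi_dis=0.1):
    """Sum the weighted instance-level and distribution-level losses over generations."""
    return sum(
        lam_ins * cross_entropy(p_ins, target)
        + xi_dis * cross_entropy(p_dis, target)
        for p_ins, p_dis in zip(ins_probs, dis_probs)
    )

# Two generations with uninformative 2-class predictions at both levels.
uniform = [0.5, 0.5]
loss = total_loss([uniform, uniform], [uniform, uniform], target=0)
```

With the weights 1.0 and 0.1 from the text, the instance-level term dominates, and the distribution-level term acts as an auxiliary signal.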

We use Ranger21 [25], a recently proposed optimizer, to opti-

mize the proposed DGNNet. Ranger21 integrates AdamW [26],

Lookahead [27], and gradient centralization [28]. It can reduce

the variance of the training loss and improve the convergence

performance.

Fig. 4. Results of the supervised 5-way case on the CWRU dataset. WDCNN∗ indicates that WDCNN is trained in FSL.

IV. CASE STUDIES

To verify the capability of DGNNet for FSL-based fault diag-

nosis, we conduct case studies on various real bearing datasets.

In this section, the comparative experiments contain four re-

lated methods: wide deep CNN (WDCNN) [29], FSL-based

fault diagnosis (FLFD) [16], GNN [18], and self-supervised

joint learning (SSJL) method [30]. GNN is used as a baseline

model comparison, and the components are consistent with our

DGNNet except for the absence of the distribution graph. We

implement and evaluate them with two benchmark datasets from

the Case Western Reserve University (CWRU) and the Machin-

ery Failure Prevention Technology (MFPT). Furthermore, we

carry out a real-world industrial case to verify the generalization

performance of DGNNet. We train all models on a 3090Ti GPU

and report the average result of 20 experiments in all case studies.

A. Same-Load Fault Diagnosis

1) Description of CWRU Dataset: The CWRU dataset has

been widely used to verify intelligent fault diagnosis methods.

The vibration data are collected by the accelerometer at the drive end with a 12 kHz sampling frequency and cover four bearing health conditions: normal state (NS), inner race failure (IF), outer race failure (OF), and ball element failure (BF). Each fault type is introduced into the drive-end bearings with fault diameters of 0.18, 0.36, 0.53, and 0.71 mm. Thus,

the bearing vibration data without motor loads can be divided

into 12 fault categories. Finally, the meta dataset is constructed of

3840 samples, which contains 100 training samples, 200 testing

samples, and 20 validation samples per class.

2) Supervised Fault Diagnosis: We verify the performance

of DGNNet with comparison experiments by randomly selecting

6, 10, 20, 30, 60, and 100 samples per class. It took us almost 6

h to run the 5-way 1-shot experiment and 12 h to run the 5-way

5-shot experiment in 20 epochs.

Fig. 4 shows the accuracy of all methods in supervised fault

diagnosis tasks. It is clear that DGNNet maintains high perfor-

mance even with six samples per class and outperforms other

methods either in 1- or 5-shot experiments. As can be seen

from Table I, when six samples are provided for each class,

the fully supervised case results in a 1-shot accuracy of 99.55%

and a 5-shot accuracy of 99.81%. Under the same 5-way 1-shot

scenario, the accuracy is almost 11% higher than that of GNN

and 20% higher than that of FLFD and WDCNN, demonstrating

that the CNN-based method only learns partial fault features.

Notably, GNN shows greater identiﬁcation ability than WDCNN

and FLFD, which proves that GNN outperforms the CNN-based

methods through learning pairwise relations between any two

TABLE I

SUPERVISED FAULT DIAGNOSIS RESULTS ON CWRU DATASET

TABLE II

SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON CWRU DATASET

Fig. 5. T-SNE visualization for semisupervised 5-shot classiﬁcation.

(a) WDCNN. (b) SSJL. (c) GNN. (d) DGNNet.

samples. DGNNet can identify all faults precisely with 100.00%

accuracy under the 5-shot case, but GNN and FLFD fail to do

this. The 5-shot training paradigm generally performs better than the 1-shot one, which suggests that more support samples help models find the global convergence direction.

3) Semisupervised Fault Diagnosis: To further evaluate the

effectiveness of DGNNet for semisupervised fault diagnosis,

we train DGNNet on support sets with different labeled and

unlabeled samples. We trained DGNNet in 25 epochs for 7

and 13 h to perform 5-way 1- and 5-shot classiﬁcation tasks,

respectively.

The results of the semisupervised experiments are shown in

Table II. All semisupervised methods are trained in 25 epochs.

The number of unlabeled samples is consistent with the number

of labeled samples in each support set, i.e., n¯u∈{1,5}for

each class, and this selection method has proved its reliability

in subsequent experiments. It can be seen that DGNNet outper-

forms baselines in both 1- and 5-shot cases. Fig. 5 shows the 2-D

t-SNE visualization of extracted high-dimensional features for

each method. Remarkably, DGNNet clusters samples of the same classes together and keeps samples of different categories as separate as possible.

Fig. 6. Effectiveness of various unlabeled sample numbers on 1-shot (left) and 5-shot (right) classification.

By comparing the results of the supervised (see Table I) and

semisupervised (see Table II) experiments, it can be seen that

DGNNet achieves better classiﬁcation using unlabeled data.

Although this improvement is small, it is worth noting that

models struggle to converge when very few labeled samples

are given in the supervised scenario. DGNNet still achieves

high classiﬁcation accuracy when combining only three labeled

samples per class with unlabeled samples.

As demonstrated in the comparison experiments, unlabeled

samples are utilized in training to improve the fault diagnosis

performance. As shown in Fig. 6, the accuracy of DGNNet

improves when a proper number of unlabeled samples is given.

However, the performance of DGNNet drops dramatically when

the number of unlabeled samples exceeds that of labeled sam-

ples. This indicates that the best classiﬁcation occurs when

the number of unlabeled samples approximates that of labeled

samples in each subtask, i.e., n¯u=K. In this article, we use

the consistent setting n¯u∈{1,5}for all FSL experiments.

B. Multiload Fault Diagnosis

1) Description of MFPT Dataset: The vibration signals from

the CWRU are all collected at the same motor speed, 1797 rpm.

Thus, the classification models could perform well for FSL-based

fault diagnosis. To evaluate the validity of DGNNet on multiload

scenarios, we implement the fault classiﬁcation experiments on

MFPT dataset, which contains three sets of multiload bearing

signals. The normal (N) baseline set is collected at a load of 270 lbs with a sampling rate of 97 656 samples per second (SPS) for 6 seconds. Meanwhile, the fault signals of inner race (IR) and outer race (OR) are obtained from the bearing test rig at a sampling rate of 48 828 SPS for 3 seconds under six load conditions: 50, 100, 150, 200, 250, and 300 lbs.

We constructed the meta dataset for MFPT utilizing the signal

preprocessing method mentioned in Section II-A, and each load

condition contains 100 training samples, 200 testing samples,

and 20 samples for veriﬁcation. It took us 4 hours to run the

3-way 1-shot experiment and 8 hours to run the 3-way 5-shot

experiment in 20 epochs.

2) Supervised Fault Diagnosis: In the MFPT dataset, each

fault class comprises the same number of samples under six

different loads. That is, {1,2,3}samples are selected from

different load data, and each class contains {6,12,18}samples.

TABLE III

SUPERVISED FAULT DIAGNOSIS ON MFPT DATASET

Fig. 7. t-SNE visualization for supervised 5-shot classiﬁcation.

(a) WDCNN. (b) GNN. (c) DGNNet.

TABLE IV

SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON MFPT DATASET

Here, we conduct experiments with various numbers of training

samples, where the number of samples with different motor

speeds remains exactly the same.

As shown in Table III, when there are 12 or 18 training samples

in each class for few-shot classiﬁcation, DGNNet can identify

all faults accurately. The accuracy of DGNNet can reach 100%,

which is 13.38% higher than that of GNN and 25.64% higher

than that of WDCNN. In order to evaluate the performance of

various comparison methods with multiload samples, the ex-

tracted fault features are visualized by t-SNE in Fig. 7. WDCNN

can identify the vibration signals in the normal state but can

hardly separate the fault samples under multiple loads. DGNNet

can learn better relations between samples from limited samples,

and accurately identiﬁes all faults of bearings under multiload

conditions. DGNNet makes fault features as close as possible in

each class and as separate as possible between different classes.

3) Semisupervised Fault Diagnosis: To further evaluate the

semisupervised capability of DGNNet, we carry out a series of

multiload experiments with 20 epochs. The number of unlabeled

samples is the same as that of labeled samples for each class,

i.e., $K$. As shown in Table IV, under the same experimental

setting, DGNNet outperforms baseline methods. DGNNet can

classify all faults accurately even with one labeled sample per

load condition, that is, there are six labeled samples and one

unlabeled sample for each class in the 1-shot experiment. Note

that GNN uses an appropriate number of unlabeled samples to

improve the accuracy by almost 5%, and DGNNet constructs the

distribution graph to propagate labels and makes better use of

unlabeled data than GNN. Thus, it can be inferred that DGNNet

Fig. 8. Illustration of the OCB dataset.

TABLE V

DESCRIPTION OF OCB DATASET

TABLE VI

SUPERVISED FAULT DIAGNOSIS RESULTS ON OCB DATASET

has greater potential in tackling semisupervised FSL problems

for multiload fault diagnosis.
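The idea of propagating labels from a few labeled samples to unlabeled ones over a complete graph can be illustrated with a generic label propagation scheme. The sketch below is a stand-in chosen for clarity and is not DGNNet's learned distribution graph: the Gaussian edge weights, the clamping of labeled nodes, the function name `propagate_labels`, and the toy features are all assumptions.

```python
import math

def propagate_labels(features, labels, n_classes, steps=20, sigma=1.0):
    """Spread labels over a complete graph with Gaussian edge weights.

    labels[i] is a class index for labeled samples and None for
    unlabeled ones. Returns the predicted class index of every sample.
    """
    n = len(features)
    # Edge weight between every pair of distinct samples (complete graph).
    w = [[0.0 if i == j else
          math.exp(-math.dist(features[i], features[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]

    def init(i):
        # One-hot scores for labeled samples, uniform scores otherwise.
        if labels[i] is None:
            return [1.0 / n_classes] * n_classes
        return [1.0 if c == labels[i] else 0.0 for c in range(n_classes)]

    scores = [init(i) for i in range(n)]
    for _ in range(steps):
        new_scores = []
        for i in range(n):
            if labels[i] is not None:
                new_scores.append(scores[i])  # labeled samples stay clamped
                continue
            total = sum(w[i])
            # Unlabeled samples take the weighted mean of neighbor scores.
            new_scores.append([sum(w[i][j] * scores[j][c] for j in range(n)) / total
                               for c in range(n_classes)])
        scores = new_scores
    return [max(range(n_classes), key=lambda c: s[c]) for s in scores]

# Toy 1-D features (made up): one labeled and one unlabeled sample per class.
feats = [(0.0,), (0.2,), (5.0,), (5.2,)]
labs = [0, None, 1, None]
pred = propagate_labels(feats, labs, n_classes=2)
print(pred)  # [0, 0, 1, 1]
```

In this toy case each unlabeled sample inherits the label of its nearby labeled neighbor, which is the basic mechanism that lets unlabeled data contribute in the semisupervised setting.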

C. Industrial Scenario Fault Diagnosis

1) Description of OCB Dataset: Most existing methods perform well on vibration data collected from test rigs but are rarely applied to real-world data. In this section, the comparison methods are evaluated on a real-world oxygen compressor bearing (OCB) dataset from a smart factory. As shown in Fig. 8, the bearing data are provided by the largest copper smelter in the world and measured by an accelerometer on the oxygen compressor. They cover three bearing health conditions: normal condition (NC), inner race fault (IRF), and outer race fault (ORF). The accelerometer collects the bearing signals every five seconds and stores them in a database, from which we took six days of historical data to construct the dataset. The OCB dataset is described in Table V.

2) Supervised Fault Diagnosis: To analyze the performance of all methods in the real-world scenario, we carry out a 3-way fault diagnosis experiment on OCB as an example. Training DGNNet on OCB for 20 epochs took almost 4 hours for the 1-shot experiment and 8 hours for the 5-shot experiment. Table VI shows the performance of all methods in the supervised scenario. DGNNet achieves desirable performance for fault diagnosis in a real-world scenario. With only six samples available, DGNNet obtains an accuracy of 99.48% in the 1-shot experiment, which is 2.89% higher than that of GNN and 35.56% higher than that of WDCNN. DGNNet reaches an accuracy of 100.00% with only six samples per class. In the 1-shot experiment, DGNNet effectively distinguishes the different faults, whereas GNN and WDCNN identify normal samples well but easily confuse the faults.

TABLE VII

SEMISUPERVISED FAULT DIAGNOSIS RESULTS ON OCB DATASET

TABLE VIII

DESCRIPTION FOR THREE CASES IN ABLATION STUDIES

3) Semisupervised Fault Diagnosis: We further verify the performance of DGNNet in the real-world scenario. The semisupervised experiments are carried out in 20 epochs. DGNNet further improves its diagnostic ability when a few unlabeled samples are provided. As shown in Table VII, the baseline methods obtain excellent diagnostic results only when 20 samples are used for training. In the 1-shot experiments, DGNNet obtains an accuracy of 100.00%, which is much higher than that of the baseline methods. DGNNet shows superior diagnostic performance in all semisupervised experiments. It can be seen that DGNNet has great potential in industrial applications of fault diagnosis, where unlabeled data from rotating machinery are easier to collect than labeled data.

V. ABLATION STUDIES AND DISCUSSIONS

In this section, we use three cases of ablation studies to explore the effect of the size of N and of the generation number of DGNNet on the diagnostic results. The three cases are: (a) the same-load case on CWRU; (b) the multiload case on MFPT; (c) the industrial case on OCB. The detailed description is presented in Table VIII. Finally, the convergence of DGNNet on CWRU is analyzed; the behavior is similar in the other cases.

A. N-Way Fault Diagnosis

The effect of the number of ways is further explored for N-way fault diagnosis in various scenarios. As shown in Fig. 9, DGNNet obtains an accuracy of 99.83% in 2-way 1-shot fault diagnosis on the CWRU dataset, which is 0.33% higher than in 5-way classification. In the 2-way industrial scenario (case c), DGNNet shows a more attractive diagnostic performance. Note that we carried out the more difficult experiments in Section IV, which allows us to better evaluate the semisupervised learning effectiveness of DGNNet.

Fig. 9. Influence of N in N-way 1-shot classification (25 epochs).

Fig. 10. Generation numbers in DGNNet on three cases (25 epochs).

Fig. 11. Evolution of the test accuracy under different optimizers.
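To make the role of N concrete, the following sketch samples one N-way K-shot episode from a labeled pool. It is an illustrative helper under stated assumptions, not the authors' data pipeline: the function name `sample_episode`, the `n_query` parameter, and the placeholder class names are hypothetical.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query=1, seed=0):
    """Sample one N-way K-shot episode.

    dataset maps a class name to its list of samples (hypothetical layout).
    Returns (support, query) lists of (sample, episode_label) pairs.
    """
    rng = random.Random(seed)
    # Pick which N classes take part in this episode.
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picked = rng.sample(dataset[cls], k_shot + n_query)
        support += [(x, episode_label) for x in picked[:k_shot]]
        query += [(x, episode_label) for x in picked[k_shot:]]
    return support, query

# Toy pool with five placeholder classes of ten samples each (assumed).
toy = {f"class{c}": [f"class{c}_sig{i}" for i in range(10)] for c in range(5)}
support, query = sample_episode(toy, n_way=2, k_shot=1)
print(len(support), len(query))  # 2 2
```

With fewer ways, fewer classes compete in each episode (chance level is 1/N), which is consistent with the higher 2-way accuracy reported above.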

B. Generation Numbers

For datasets of various scales and complexities, the number of alternate updating generations determines the test accuracy and convergence speed of DGNNet. As shown in Fig. 10, the test accuracy improves greatly as the generation number increases from 0 to 2, and it fluctuates within a small range when the generation number is between 2 and 8. The larger the generation number, the longer the convergence time. In this article, we set the generation number to 4 as a tradeoff between test accuracy and convergence speed.
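The notion of a generation, one round of alternating between the instance graph and the distribution graph, can be sketched schematically. The update rules below are hand-written and non-learned, chosen only to show the control flow; DGNNet's actual updates are learned networks, and the names `similarity` and `alternate_updates` as well as the Gaussian similarity are assumptions.

```python
import math

def similarity(a, b):
    """Gaussian similarity between two vectors (an illustrative choice)."""
    return math.exp(-math.dist(a, b) ** 2)

def alternate_updates(features, generations=4):
    """Alternate between instance-level and distribution-level updates."""
    n = len(features)
    nodes = [list(f) for f in features]
    for _ in range(generations):
        # Instance graph: edge (i, j) is the similarity of two samples.
        inst = [[similarity(nodes[i], nodes[j]) for j in range(n)]
                for i in range(n)]
        # Distribution graph: node i is the row of similarities from
        # sample i to all samples; edge (i, j) compares those rows.
        dist_edge = [[similarity(inst[i], inst[j]) for j in range(n)]
                     for i in range(n)]
        # Feedback: each instance feature becomes a weighted mean of all
        # features, weighted by the distribution-level edges.
        new_nodes = []
        for i in range(n):
            total = sum(dist_edge[i])
            new_nodes.append([
                sum(dist_edge[i][j] * nodes[j][d] for j in range(n)) / total
                for d in range(len(nodes[i]))
            ])
        nodes = new_nodes
    return nodes

# Toy 1-D features (made up): two loose clusters.
feats = [(0.0,), (0.4,), (3.0,), (3.4,)]
out = alternate_updates(feats, generations=4)
```

In this toy setting the samples within each cluster are pulled together over the generations while the two clusters remain apart. Each extra generation adds another full pass over both graphs, which mirrors the accuracy versus convergence-time tradeoff described above.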

C. Ranger21 Optimizer

A recently proposed optimizer, Ranger21, is utilized in our study. Fig. 11 shows the evolution of the test accuracy under different optimizers over 60 epochs. The default learning rate of Adam and AdamW is 3e−3. For Ranger21, the learning rate undergoes a linear warm-up during the first two epochs and is then set to 3e−3. The linear warm-down starts at epoch 40, and the learning rate decays to 3e−5. It can be seen that Ranger21 and AdamW perform better than Adam at the early stage. After 20 epochs, Ranger21 consistently outperforms AdamW. Notably, Ranger21 accelerates model learning and obtains a high accuracy without compromising generalization.
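The learning rate schedule described for Ranger21 can be written as a piecewise-linear function of the epoch. The sketch below only reproduces the schedule as stated in the text and does not call the Ranger21 library; the function name `ranger21_style_lr`, the warm-up starting value of 0, and the assumption that the warm-down ends at epoch 60 are illustrative choices.

```python
def ranger21_style_lr(epoch, base_lr=3e-3, final_lr=3e-5,
                      warmup_epochs=2, warmdown_start=40, total_epochs=60):
    """Piecewise-linear learning rate at a (possibly fractional) epoch."""
    if epoch < warmup_epochs:
        # Linear warm-up from 0 (assumed starting value) to base_lr.
        return base_lr * epoch / warmup_epochs
    if epoch < warmdown_start:
        # Constant phase between warm-up and warm-down.
        return base_lr
    # Linear warm-down from base_lr to final_lr over the remaining epochs.
    progress = min(1.0, (epoch - warmdown_start) / (total_epochs - warmdown_start))
    return base_lr + (final_lr - base_lr) * progress

# Learning rate at a few representative epochs.
schedule = {e: ranger21_style_lr(e) for e in (0, 1, 2, 20, 40, 50, 60)}
```

Under these assumptions the rate is constant at 3e−3 from epoch 2 to epoch 40 and then falls linearly to 3e−5 at epoch 60.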

VI. CONCLUSION AND FUTURE WORK

In this article, we proposed an FSL method for fault diagnosis, namely DGNNet. DGNNet uses an instance graph and a distribution graph to learn the pairwise relations between two samples and the high-order relations among all samples, respectively. The learned multilevel relations help DGNNet classify unseen samples. In particular, the distribution graph propagates label information from a few labeled samples to unlabeled samples, enabling DGNNet to address semisupervised problems. Extensive experiments were conducted to evaluate the performance of DGNNet for supervised and semisupervised fault diagnosis. The results show that DGNNet achieves excellent effectiveness in fault classification and respectable generalization performance in various scenarios. In the future, we will try to apply DGNNet to other data scarcity problems and to labeling new samples in real-world applications. In addition, we will investigate the diagnosis of unseen fault types using FSL.

REFERENCES

[1] H. Shao, M. Xia, G. Han, Y. Zhang, and J. Wan, “Intelligent fault diagnosis of rotor-bearing system under varying working conditions with modified transfer convolutional neural network and thermal images,” IEEE Trans. Ind. Inform., vol. 17, no. 5, pp. 3488–3496, May 2021.

[2] G. Xu, M. Liu, Z. Jiang, W. Shen, and C. Huang, “Online fault diagnosis method based on transfer convolutional neural networks,” IEEE Trans. Instrum. Meas., vol. 69, no. 2, pp. 509–520, Feb. 2020.

[3] G. Liu, W. Shen, L. Gao, and A. Kusiak, “Predictive modeling with an adaptive unsupervised broad transfer algorithm,” IEEE Trans. Instrum. Meas., vol. 70, 2021, Art. no. 3520212.

[4] S. Xing, Y. Lei, S. Wang, and F. Jia, “Distribution-invariant deep belief network for intelligent fault diagnosis of machines under new working conditions,” IEEE Trans. Ind. Electron., vol. 68, no. 3, pp. 2617–2625, Mar. 2021.

[5] C. Sun, M. Ma, Z. Zhao, S. Tian, R. Yan, and X. Chen, “Deep transfer learning based on sparse autoencoder for remaining useful life prediction of tool in manufacturing,” IEEE Trans. Ind. Inform., vol. 15, no. 4, pp. 2416–2425, Apr. 2019.

[6] W. Song, W. Shen, L. Gao, and X. Li, “An early fault detection method of rotating machines based on unsupervised sequence segmentation convolutional neural network,” IEEE Trans. Instrum. Meas., vol. 71, 2022, Art. no. 3504712.

[7] G. W. Xu, M. Liu, Z. F. Jiang, D. Soffker, and W. M. Shen, “Bearing fault diagnosis method based on deep convolutional neural network and random forest ensemble learning,” Sensors, vol. 19, no. 5, Mar. 2019, Art. no. 1088.

[8] C. Zhao, G. K. Liu, and W. M. Shen, “A dual-view alignment-based domain adaptation network for fault diagnosis,” Meas. Sci. Technol., vol. 32, no. 11, Nov. 2021, Art. no. 115102.

[9] D. Han, X. Guo, and P. Shi, “An intelligent fault diagnosis method of variable condition gearbox based on improved DBN combined with WPEE and MPE,” IEEE Access, vol. 8, pp. 131299–131309, 2020.

[10] Z. Ren, W. Zhang, and Z. Zhang, “A deep nonnegative matrix factorization approach via autoencoder for nonlinear fault detection,” IEEE Trans. Ind. Inform., vol. 16, no. 8, pp. 5042–5052, Aug. 2020.

[11] S. Kiranyaz, A. Gastli, L. Ben-Brahim, N. Al-Emadi, and M. Gabbouj, “Real-time fault detection and identification for MMC using 1-D convolutional neural networks,” IEEE Trans. Ind. Electron., vol. 66, no. 11, pp. 8760–8771, Nov. 2019.

[12] C. Zhao, G. K. Liu, W. M. Shen, and L. Gao, “A multi-representation-based domain adaptation network for fault diagnosis,” Measurement, vol. 182, Sep. 2021, Art. no. 109650.

[13] T. C. Zhang et al., “Intelligent fault diagnosis of machines with small & imbalanced data: A state-of-the-art review and possible extensions,” ISA Trans., vol. 119, pp. 152–171, Jan. 2021.

[14] L. Feng and C. Zhao, “Fault description based attribute transfer for zero-sample industrial fault diagnosis,” IEEE Trans. Ind. Inform., vol. 17, no. 3, pp. 1852–1862, Mar. 2021.

[15] Y. Hu, R. Liu, X. Li, D. Chen, and Q. Hu, “Task-sequencing meta learning for intelligent few-shot fault diagnosis with limited data,” IEEE Trans. Ind. Inform., vol. 18, no. 6, pp. 3894–3904, Jun. 2022.

[16] A. Zhang, S. Li, Y. Cui, W. Yang, R. Dong, and J. Hu, “Limited data rolling bearing fault diagnosis with few-shot learning,” IEEE Access, vol. 7, pp. 110895–110904, 2019.

[17] Y. Feng, J. Chen, T. Zhang, S. He, E. Xu, and Z. Zhou, “Semi-supervised meta-learning networks with squeeze-and-excitation attention for few-shot fault diagnosis,” ISA Trans., vol. 120, pp. 383–401, Jan. 2022.

[18] V. G. Satorras and J. Bruna, “Few-shot learning with graph neural networks,” in Proc. Int. Conf. Learn. Representations, 2018, pp. 1–13.

[19] Y. Xiao, Y. Jin, and K. Hao, “Adaptive prototypical networks with label words and joint representation learning for few-shot relation classification,” IEEE Trans. Neural Netw. Learn. Syst., pp. 1–12, 2021, doi: 10.1109/TNNLS.2021.3105377.

[20] T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Meta-learning in neural networks: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5149–5169, Sep. 2022.

[21] R. Zhou, X. Chang, L. Shi, Y.-D. Shen, Y. Yang, and F. Nie, “Person reidentification via multi-feature fusion with adaptive graph learning,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 5, pp. 1592–1601, May 2020.

[22] M. Luo, X. Chang, L. Nie, Y. Yang, A. G. Hauptmann, and Q. Zheng, “An adaptive semisupervised feature analysis for video semantic recognition,” IEEE Trans. Cybern., vol. 48, no. 2, pp. 648–660, Feb. 2018.

[23] Z. Li, F. Nie, X. Chang, Y. Yang, C. Zhang, and N. Sebe, “Dynamic affinity graph construction for spectral clustering using multiple features,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 12, pp. 6323–6332, Dec. 2018.

[24] S. Santurkar, D. Tsipras, A. Ilyas, and A. Madry, “How does batch normalization help optimization?,” Adv. Neural Inform. Process. Syst., vol. 31, pp. 2488–2498, 2018.

[25] L. Wright and N. Demeure, “Ranger21: A synergistic deep learning optimizer,” 2021, arXiv:2106.13731.

[26] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” in Proc. Int. Conf. Learn. Representations, 2019, pp. 1–18.

[27] M. R. Zhang, J. Lucas, G. Hinton, and J. Ba, “Lookahead optimizer: k steps forward, 1 step back,” Adv. Neural Inform. Process. Syst., vol. 32, pp. 9593–9604, 2019.

[28] W. Fuhl and E. Kasneci, “Weight and gradient centralization in deep neural networks,” in Artificial Neural Networks and Machine Learning (Lecture Notes in Computer Science), vol. 12894. Cham, Switzerland: Springer, 2021, pp. 227–239.

[29] W. Zhang, G. L. Peng, C. H. Li, Y. H. Chen, and Z. J. Zhang, “A new deep learning model for fault diagnosis with good anti-noise and domain adaptation ability on raw vibration signals,” Sensors, vol. 17, no. 2, Feb. 2017, Art. no. 425.

[30] W. W. Zhang, D. J. Chen, and Y. Kong, “Self-supervised joint learning fault diagnosis method based on three-channel vibration images,” Sensors, vol. 21, no. 14, Jul. 2021, Art. no. 4774.

Han Wang received the B.E. degree in automation in 2019 from Tongji University, Shanghai, China, where he is currently working toward the Ph.D. degree in control science and engineering with the College of Electronics and Information Engineering.

His research interests include fault diagnosis, few-shot learning, and graph deep learning.

Jingwei Wang received the B.E. degree in control science and engineering from Shandong University, Jinan, China, in 2016. He is currently working toward the Ph.D. degree in control science and engineering with the College of Electronics and Information Engineering, Tongji University, Shanghai, China, in 2022.

His research interests include data mining, network science, and machine learning.

Yukai Zhao received the B.E. degree in automation from Nanjing Institute of Technology, Nanjing, China, in 2019. He is currently working toward the Ph.D. degree in control science and engineering with the College of Electronics and Information Engineering, Tongji University, Shanghai, China.

His research interests include action recognition, graph data mining, and graph deep learning.

Qing Liu received the B.E. degree in automation from Changshu Institute of Technology, Changshu, China, in 2014, and the M.E. degree in control science and engineering in 2017 from Tongji University, Shanghai, China, where he is currently working toward the Ph.D. degree in control science and engineering with the College of Electronics and Information Engineering.

His research interests include machine learning, deep learning, and their applications in intelligent manufacturing.

Min Liu received the B.E. degree in mechanical engineering from the China University of Geosciences, Wuhan, China, in 1993, and the M.E. degree in mechanics and the Ph.D. degree in mechanical engineering and automation from Zhejiang University, Hangzhou, China, in 1996 and 1999, respectively.

He is currently a Professor with the College of Electronics and Information Engineering, Tongji University, Shanghai, China. He has been working on computer science and system engineering and collaborative MRO and intelligent manufacturing for about 14 years. He authored or coauthored more than 100 papers in scientific journals and international conferences in related areas. His research interests include deep learning, fault diagnosis and prediction, and intelligent maintenance.

Weiming Shen (Fellow, IEEE) received the B.E. and M.S. degrees in mechanical engineering from Northern Jiaotong University, Beijing, China, in 1983 and 1986, respectively, and the Ph.D. degree in system control from the University of Technology of Compiegne, Compiegne, France, in 1996.

He is currently a Professor with the Huazhong University of Science and Technology (HUST), Wuhan, China, and an Adjunct Professor with the University of Western Ontario, London, ON, Canada. Before joining HUST in 2019, he was a Principal Research Officer at the National Research Council Canada. His work has been cited more than 16 000 times with an h-index of 61. He authored or coauthored several books and more than 560 articles in scientific journals and international conferences in related areas. His research interests include agent-based collaboration technologies and applications, collaborative intelligent manufacturing, the Internet of Things, and Big Data analytics.

Authorized licensed use limited to: TONGJI UNIVERSITY. Downloaded on January 04,2023 at 00:33:48 UTC from IEEE Xplore. Restrictions apply.

A multisensory Interaction Framework for Human-Cyber-Physical System based on Graph Convolutional Networks

Article

Full-text available

Aug 2024
ADV ENG INFORM

Human-Cyber-Physical Systems (HCPS), as an emerging paradigm centered around humans, provide a promising direction for the advancement of various domains, such as intelligent manufacturing and aerospace. In contrast to Cyber-Physical Systems (CPS), the development of HCPS emphasizes the expansion of human capabilities. Humans no longer solely function as operators or agents working in collaboration with computers and machines but extend their roles to include system design and innovation management. This paper proposes a Multisensory Interaction Framework for HCPS (MS-HCPS) that leverages human senses to facilitate system creation and management. Additionally, the introduced Multisensory Graph Convolutional Network (MS-GCN) model calculates recommendation values for multiple senses, elucidating their relevance to system development. Furthermore, the effectiveness of the proposed framework and model is validated through three practical engineering scenarios. This study explores the research on multisensory interaction in HCPS from a human sensory perspective, aiming to facilitate the progress and development of HCPS across various domains.

A multisensory Interaction Framework for Human-Cyber-Physical System based on Graph Convolutional Networks

Article

Full-text available

Aug 2024
ADV ENG INFORM

Using Transfer Learning and Radial Basis Function Deep Neural Network Feature Extraction to Upgrade Existing Product Fault Detection Systems for Industry 4.0: A Case Study of a Spring Factory

Article

Full-text available

Mar 2024

In the era of Industry 3.0, product fault detection systems became important auxiliary systems for factories. These systems efficiently monitor product quality, and as such, substantial amounts of capital were invested in their development. However, with the arrival of Industry 4.0, high-volume low-mix production modes are gradually being replaced by low-volume high-mix production modes, reducing the applicability of existing systems. The extent of investment has prompted factories to seek upgrades to tailor existing systems to suit new production modes. In this paper, we propose an approach to upgrading based on the concept of transfer learning. The key elements are (1) using a framework with a basic model and an add-on model rather than fine-tuning parameters and (2) designing a radial basis function deep neural network (RBF-DNN) to extract important features to construct the basic and add-on models. The effectiveness of the proposed approach is verified using real-world data from a spring factory.

Privacy-preserving intelligent fault diagnostics for wind turbine clusters using federated stacked capsule autoencoder

Article

May 2024
EXPERT SYST APPL

Historical Information-Aided Monitoring of Few-Sample Modes in Industrial Processes With Orthogonal Transferred Projection

Article

Full-text available

May 2024

Few-sample modes are easy to appear when a new working condition is triggered in industrial processes especially during the early stages of the new working mode. However, monitoring the early behavior of a new mode is important because engineers and operators are less knowledgeable with such a new mode. Considering the few-sample challenge in this problem, a new multisource transfer learning framework is proposed that leverages historical data under various operating conditions to enrich process monitoring over new mode data. In contrast to existing transfer learning-related work, a new unsupervised domain adaptation framework is designed. The historical modes as the source provide precious knowledge and reference to the new mode so that the features of the new mode are robust to noise and insufficient samples. Mathematically, the historical features play the role of a regularizer for the feature learning in the target domain. A geometrical illustration is given and an iterative optimization algorithm is developed with the convergence analysis. Except for the features guided by historical modes, individual features of the new mode are also extracted from the residual part to form a complete monitoring framework. Finally, the effectiveness of the proposed method is validated through a numerical experiment and a real industrial hydrocracking process.

A Semi-Supervised Fault Diagnosis Method for Transformers Based on Discriminative Feature Enhancement and Adaptive Weight Adjustment

Article

Jan 2024

The deep learning algorithms have become the general trend in transformer fault diagnos. The diagnostic accuracy is contingent upon the quantity of fault labeled samples. However, obtaining sufficient fault labeled samples remains a challenge and the collected samples in practice are unlabeled. Therefore, a semi-supervised fault diagnosis method is proposed for the transformer based on discriminative feature enhancement and adaptive weight adjustment. Firstly, the pseudo labels of unlabeled samples were generated through graph propagation and the quality of pseudo labels is crucial for fault diagnosis. Secondly, the discriminative feature enhancement was used to improve the quality of pseudo labels by optimizing diagnostic boundaries. Then, the weight of pseudo labels involved in training was adaptively adjusted using the truncated normal distribution function, based on the deviation between the confidence of pseudo labels and the mean of the normal distribution function. Finally, the proposed method was verified on the collected dataset. The experiment results demonstrated that the proposed method can guarantee the high quality and quantity of pseudo labels involved in training. The proposed method achieves a diagnostic accuracy of 94.1 % for transformer faults.

A framework for process states structural interpretation of zero-defect manufacturing

Article

Apr 2024
ADV ENG INFORM

A pruned-optimized weighted graph convolutional network for axial flow pump fault diagnosis with hydrophone signals

Article

Apr 2024
ADV ENG INFORM

Causality-Embedded Reconstruction Network for High-resolution Fault Identification in Chemical Process

Article

Mar 2024
PROCESS SAF ENVIRON

Faulty Cable Segment Identification of Low-Resistance Grounded Active Distributions via Grounding Wire Current-Based Approach

Article

May 2024

Faults in underground cables are hard to be detected directly from external features such as sound and light due to the harsh operating environment. Moreover, the rising penetration of distributed generations further complicates the characteristics of cable faults. Facing the challenges, this article first aims to reveal cable fault features via analyzing the generation mechanism of the cable grounding wire current (GWC). The impact of photovoltaic on GWC is then discussed, and the essential differences between faulty and healthy cables are also studied. Finally, considering both the structure of distribution network and an optimal deployment of measurement points, a novel fault cable segment identification method based on the amplitude and polarity of the starting terminal GWC steady-state variation is proposed. Simulations based on a real low-resistance grounded distribution network are performed in diverse fault scenarios and show a near 100% accuracy for all cases. In addition, the proposed method demonstrated a time complexity of O (896), showing a significant reduction in calculation time compared to common methods. These findings suggest that the proposed method is highly reliable and efficient for identifying faulty cable segments in active distribution networks.

Self-Supervised Joint Learning Fault Diagnosis Method Based on Three-Channel Vibration Images

Article

Full-text available

Jul 2021
SENSORS-BASEL

The accuracy of bearing fault diagnosis is of great significance for the reliable operation of rotating machinery. In recent years, increasing attention has been paid to intelligent fault diagnosis techniques based on deep learning. However, most of these methods are based on supervised learning with a large amount of labeled data, which is a challenge for industrial applications. To reduce the dependence on labeled data, a self-supervised joint learning (SSJL) fault diagnosis method based on three-channel vibration images is proposed. The method combines self-supervised learning with supervised learning, makes full use of unlabeled data to learn fault features, and further improves the feature recognition rate by transforming the data into three-channel vibration images. The validity of the method was verified using two typical data sets from a motor bearing. Experimental results show that this method has higher diagnostic accuracy for small quantities of labeled data and is superior to the existing methods.

A dual-view alignment-based domain adaptation network for fault diagnosis

Article

Full-text available

Jul 2021
MEAS SCI TECHNOL

Domain adaptation is a major area of interest in intelligent equipment maintenance and fault diagnosis in recent years. Traditional machine/deep-learning-based fault diagnosis methods assume that the source and target domains share the same distribution, which may fail and lead to catastrophic damages. Many domain adaptation-based fault diagnosis methods have been proposed to address the domain shift problem. However, most of them only align global domain distributions and ignore class relationships between domains, which leads to a decline in diagnostic performance. To overcome this deficiency, a dual-view alignment-based domain adaptation network (DVADAN) for fault diagnosis is proposed in this paper. Specifically, the proposed dual-view alignment, consisting of a global (marginal) alignment constructed with maximum mean discrepancy and a local (conditional) alignment calculating the class-centers by Wasserstein distance, is developed to reduce domain distribution discrepancy. Extensive experiments on two test rigs validated the effectiveness of the proposed DVADAN and showed its superiority over state-of-art fault diagnosis methods.

Meta-Learning in Neural Networks: A Survey

Article

Full-text available

May 2021

The field of meta-learning, or learning-to-learn, has seen a dramatic rise in interest in recent years. Contrary to conventional approaches to AI where a given task is solved from scratch using a fixed learning algorithm, meta-learning aims to improve the learning algorithm itself, given the experience of multiple learning episodes. This paradigm provides an opportunity to tackle many of the conventional challenges of deep learning, including data and computation bottlenecks, as well as the fundamental issue of generalization. In this survey we describe the contemporary meta-learning landscape. We first discuss definitions of meta-learning and position it with respect to related fields, such as transfer learning, multi-task learning, and hyperparameter optimization. We then propose a new taxonomy that provides a more comprehensive breakdown of the space of meta-learning methods today. We survey promising applications and successes of meta-learning including few-shot learning, reinforcement learning and architecture search. Finally, we discuss outstanding challenges and promising areas for future research.

An Early Fault Detection Method of Rotating Machines Based on Unsupervised Sequence Segmentation Convolutional Neural Network

Article

Dec 2021

Early fault detection (EFD) is vital for mechanical systems to reduce downtime and increase stability. The main challenge of EFD for rotating machines is to extract discriminative features from noisy signals to identify early faults. However, the lack of labels for the whole lifecycle data hinders the application of some powerful supervised deep learning methods in EFD. Besides, many EFD methods have to set a criterion manually, such as a threshold, to judge whether an early fault has occurred. To address these challenges, this paper proposes a novel early fault detection method based on Unsupervised Sequence Segmentation Convolutional Neural Network (USSCNN). At first, frequency domain features are extracted from raw signals and converted to 2D grey images. Then historical lifecycle data are labelled by USSCNN, so that a CNN classifier can be trained with these labelled data. The deep features of the historical data learned by the CNN classifier are utilized to train the Health Index (HI) Assessment Model. The proposed method is tested on three bearing datasets. The results shown that the proposed method can detect incipient faults earlier than the comparing methods with lower false alarms. And the HIs learned by the Health Index Assessment Model shown that the proposed method can extract discriminative features for EFD. More importantly, the proposed method can detect an early fault by the well-trained classifier which avoids manual criterion-making. Results of comparison demonstrated the effectiveness and the robustness of the proposed method.

How does batch normalization help optimization?

Article

Jan 2018

2018 Curran Associates Inc.All rights reserved. Batch Normalization (BatchNorm) is a widely adopted technique that enables faster and more stable training of deep neural networks (DNNs). Despite its pervasiveness, the exact reasons for BatchNorm's effectiveness are still poorly understood. The popular belief is that this effectiveness stems from controlling the change of the layers' input distributions during training to reduce the so-called “internal covariate shift”. In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.

Task-Sequencing Meta Learning for Intelligent Few-Shot Fault Diagnosis With Limited Data

Article

Sep 2021

Recently, deep learning-based intelligent fault diagnosis methods have been developed rapidly, which rely on massive data to train the diagnosis model. However, it is usually difficult to collect sufficient failure data in practical industrial production, thus limits the application of intelligent diagnosis methods. To address the few-shot fault diagnosis problem, a task-sequencing meta learning (TSML) method is proposed in this paper. Firstly, meta learning model is trained over a series of learning tasks to obtain knowledge about how to diagnosis. Thus, the learned knowledge can help adapt and generalize with a few examples when dealing with new tasks that have never been encountered. Then, considering the difference and connection between different failures and diagnosis tasks, a task-sequencing algorithm is proposed to sort meta training tasks from easy to difficult, which followed the way human acquire knowledge. After evaluating the difficulty of each task, the proposed method learns simple tasks firstly and generalizes the learned knowledge to complex tasks. Better knowledge adaptability is obtained by gradually increasing the task difficulty. Finally, utilizing gradient-based meta learning, the initialization parameters are trained by a small number of gradient steps. The effectiveness of the proposed method is validated by a practice rolling bearing dataset and a power system dataset. The experiment results illustrate that the proposed method can identify new categories within only a few samples. In addition, it also shows advantages in fault diagnosis when the categories are fine-grained according to the working conditions.

Weight and Gradient Centralization in Deep Neural Networks

Chapter

Sep 2021

Batch normalization is currently the most widely used variant of internal normalization for deep neural networks. Additional work has shown that the normalization of weights and additional conditioning as well as the normalization of gradients further improve the generalization. In this work, we combine several of these methods and thereby increase the generalization of the networks. The advantage of the newer methods compared to the batch normalization is not only increased generalization, but also that these methods only have to be applied during training and, therefore, do not influence the running time during use. https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/?p=%2FWeightAndGradientCentralization&mode=list.

Adaptive Prototypical Networks With Label Words and Joint Representation Learning for Few-Shot Relation Classification

Article

Sep 2021

The relation classification (RC) task is one of the fundamental tasks of information extraction, aiming to detect the relation information between entity pairs in unstructured natural language text and generate structured data in the form of entity-relation triples. Although distant supervision methods can effectively alleviate the lack of training data in supervised learning, they also introduce noise into the data and still cannot fundamentally solve the long-tail distribution problem of the training instances. In order to enable the neural network to learn new knowledge from a few instances, as humans do, this work focuses on few-shot relation classification (FSRC), where a classifier should generalize to new classes that have not been seen in the training set, given only a number of samples for each class. To make full use of the existing information and get a better feature representation for each instance, we propose to encode each class prototype in an adaptive way from two aspects. First, based on the prototypical networks, we propose an adaptive mixture mechanism to add label words to the representation of the class prototype, which, to the best of our knowledge, is the first attempt to integrate the label information into features of the support samples of each class so as to get more interactive class prototypes. Second, to more reasonably measure the distances between samples of each category, we introduce a loss function for joint representation learning (JRL) to encode each support instance in an adaptive manner. Extensive experiments have been conducted on FewRel under different few-shot (FS) settings, and the results show that the proposed adaptive prototypical networks with label words and JRL have not only achieved significant improvements in accuracy but also increased the generalization ability of FSRC.

Predictive Modeling With an Adaptive Unsupervised Broad Transfer Algorithm

Article

Jun 2021

Deep-learning algorithms have produced promising results; however, domain adaptation remains a challenge. In addition, excessive training time and computing resource requirements need to be addressed. Deep-learning algorithms face a domain adaptation issue when the data distribution of a target domain differs from that of the source domain. The emerging concept of broad learning shows potential in addressing the domain adaptation and training time issues. An adaptive unsupervised broad transfer learning (AUBTL) algorithm is proposed to tackle the cross-domain problems. The proposed algorithm utilizes a sparse auto-encoder and random orthogonal mapping to extract and augment the feature space. Then, it initializes the weights of a classifier by solving a ridge regression problem. The logit ranking strategy is applied to develop a transfer estimator to evaluate and sample data in the target domain for an adaptive transfer. Based on the sampled data, AUBTL optimizes the hyper-parameter space. Performance of the AUBTL algorithm is validated with three benchmark datasets including 20 transfer tasks. The computational results demonstrated the efficiency and accuracy of the proposed algorithm over other deep learning algorithms considered in this research.

A Multi-Representation-Based Domain Adaptation Network for Fault Diagnosis

Article

Jun 2021
MEASUREMENT

Deep learning-based domain adaptation algorithms with various representations have been recently developed to address the domain shift problem in mechanical fault diagnosis. However, few studies have focused on potential improvements through multiple representations. Thus, a multi-representation-based domain adaptation network is proposed in this paper. Three complementary time-frequency representations are first proposed to serve as input-based multiple representations for the subsequent parallel models. Then, parallel models with improved inception modules are trained to obtain feature-based multiple domain-invariant representations. Finally, ensemble learning through majority voting is used to obtain the final results. Comprehensive experimental results on two test rigs reveal that the proposed algorithm outperforms state-of-the-art single-representation-based domain adaptation algorithms in terms of cross-domain fault diagnosis. Furthermore, visualization results demonstrate that the proposed algorithm extracts transferable features and takes advantage of ensemble learning to achieve high-precision diagnosis.
