Multi-task Learning Based on Lightweight 1DCNN for Fault Diagnosis of Wheelset Bearings

Zhiliang Liu, Member, IEEE, Huan Wang, Junjie Liu, Yong Qin, Member, IEEE, Dandan Peng
Abstract—In recent years, deep learning has been proven to be a promising bearing fault diagnosis technology. However, most existing methods are based on single-task learning: the fault diagnosis task is treated as an independent task, and the rich correlation information contained in different tasks is ignored. Therefore, this paper explores the possibility of using speed identification and load identification as two auxiliary tasks to improve the performance of the fault diagnosis task, and proposes a multi-task one-dimensional convolutional neural network (MT-1DCNN). Specifically, the MT-1DCNN uses a trunk network to learn the shared features required by every task, and then processes the different tasks through multiple task-specific branches. In this way, the MT-1DCNN can exploit features learned by related tasks to improve the performance of the fault diagnosis task. Experimental results on a wheelset bearing dataset show that multi-task learning can make full use of the feature information captured by the speed identification and load identification tasks to improve the fault diagnosis performance of the network, and that the MT-1DCNN outperforms five excellent networks in accuracy.
Index Terms—Multi-task learning, Convolutional neural
network, Bearing fault diagnosis, Vibration analysis.
I. INTRODUCTION
Wheelset bearings are core components of the high-speed train (HST) bogie, and their mechanical performance greatly affects the safety and reliability of HST operation. Therefore, automatic health monitoring of wheelset bearings is of great significance [1]. Because the HST operates for long periods under time-varying conditions such as speed, load, and operating environment, vibration signals from wheelset bearings are easily contaminated by interference, which poses a big challenge for accurate fault diagnosis based on vibration analysis.
The main work of fault diagnosis research is to extract useful
information from vibration signals, and then use classification
methods to obtain robust diagnosis results. Scholars have
proposed various signal processing methods to extract
This work was supported by the National Natural Science Foundation of
China (61833002). (Corresponding authors: Huan Wang and Yong Qin).
Zhiliang Liu, Huan Wang, and Junjie Liu are with the School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
Yong Qin is with the State Key Laboratory of Rail Traffic Control and Safety,
Beijing Jiaotong University, Beijing, 100044, China.
Dandan Peng is with the Department of Mechanical Engineering, KU Leuven,
Leuven, Belgium BE-3001.
representative features, for example, variational mode decomposition [2, 3], empirical wavelet transform [4, 5], and local mean decomposition [6, 7]. In addition, support vector machines (SVM) [8, 9], k-nearest neighbors [10, 11], and multi-layer perceptrons [12] are often used as classifiers to predict fault types. For instance, Zheng et al. [13] proposed a fault diagnosis
method based on multi-scale fuzzy entropy and SVM for rolling
bearing. Choi et al. [14] reduced diagnostic errors by fusing
multiple classifier decisions. However, these methods rely
heavily on the domain knowledge of professionals, and they
cannot comprehensively extract the complex dynamic features
of the signals. Robustness and accuracy of these methods need
to be further improved.
As an efficient feature extraction and pattern recognition
method, deep learning attracts more and more attention from
researchers [15]. In particular, convolutional neural network
(CNN) has achieved significant success in fault diagnosis of
rotating machinery due to its unique feature learning
mechanism through convolution operation [16-27]. For
example, Liu et al. [21] proposed a residual CNN with a multi-
scale kernel for motor fault diagnosis in non-stationary
conditions. Zhang et al. [28] proposed a deep CNN with wide
first-layer kernels, which can better learn the long-time
information of vibration signals. These methods are based on
one-dimensional CNN (1DCNN) [17-21, 29], which uses the
1DCNN to automatically learn useful information of vibration
signals and diagnose the health condition of machinery. In
addition, Wen et al. [30] transformed vibration signals into two-dimensional (2D) images, and then used a two-dimensional CNN (2DCNN) to learn the useful features. The 2DCNN-based
methods usually require converting 1D signal into 2D matrix
(e.g. time-frequency spectra) [30, 31]. Compared with 2DCNN,
1DCNN can learn the features directly, and the structure is
relatively simple, so it is more suitable for bearing fault
diagnosis.
The above deep learning networks are all based on single-
task learning. Their network parameter optimization is
constrained by fault diagnosis task (FDT), and thus the features
learned by the network are only applicable to the diagnosis of
mechanical health condition. This approach seems reasonable, but it has implicit shortcomings. Many real-world problems cannot be decomposed into independent subtasks; even when a problem can be decomposed, its subtasks are related to each other and are connected by shared factors or shared features [32]. Therefore, if a real problem is treated as multiple independent single tasks, the rich associated information among these tasks is ignored.
Multi-task learning (MTL) [32-35] is a machine learning
method aimed at solving multiple tasks at the same time. It can
use the useful information learned by related tasks to improve
performance of the network. Caruana [32] summarized that MTL is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. Simply put, MTL learns the features shared by multiple tasks and allows features that are specific to one task to be used by the other tasks, which can effectively improve the results of both the main task and the auxiliary tasks. This advantage is not possessed by single-task learning.
There has been some research on MTL in the field of fault diagnosis. For example, Guo et al. [36] introduced MTL to the
field of fault diagnosis and proposed a multi-task neural
network for processing fault mode task and fault location task
simultaneously. Liu et al. [37] proposed a MTL method that
simultaneously predicts the fault category and remaining useful
life. Cao et al. [38] used MTL to diagnose health status of
planetary gearboxes. These methods proved the effectiveness of MTL in the FDT, but they lack in-depth analysis and interpretation of the feature learning mechanism of MTL. In addition, these
works ignore the correlation between the working condition and
the health condition.
Vibration response of rotating machinery is related not only
to the health condition, but also to the working condition, such
as speed and load. Fig. 1 shows the vibration responses of the
wheelset bearing with two different fault categories at different
speeds and loads. The fault category, speed and load of rotating
mechanical system have a great influence on the vibration
responses. If we make the network learn these related tasks
together, and make them share the learned features, the network
can have a more comprehensive understanding of the vibration
signals, and the learned features also have better generalization performance.
Fig. 1. Vibration signals with different fault categories (F3, F8) at different speeds and vertical loads: (a) speed 60 km/h, vertical load 56 kN; (b) speed 120 km/h, vertical load 56 kN; (c) speed 60 km/h, vertical load 272 kN; (d) speed 120 km/h, vertical load 272 kN. (Note: F3 and F8 are described in detail in Section IV.)
Therefore, this paper introduces the MTL principle into the
bearing fault diagnosis, and proposes a multi-task one-
dimensional convolutional neural network (MT-1DCNN). The
MT-1DCNN aims to enhance the performance of the network
by using the two auxiliary tasks: speed identification and load
identification. Specifically, the MT-1DCNN processes three
tasks at the same time, and first learns the shared features
among multiple tasks through the trunk network. Subsequently,
the MT-1DCNN uses multiple task-specific branches to process
these tasks. The input of these branches is the shared features
learned by trunk network. Every task-specific branch can take
advantage of the shared features learned by multiple tasks. In
this way, the features within the shared representation that are specific to one task
can be used by other tasks, so that the network can fully
understand the characteristics of the signals and improve the
accuracy of each task. In addition, each task has an independent
loss function. The overall loss of the MT-1DCNN is a weighted sum of the loss functions of these tasks. Powered by MTL, the MT-1DCNN can process
three tasks simultaneously in a lightweight network structure,
and good results can be obtained.
Contributions of this paper are summarized as follows:
1) We introduce the MTL principle to wheelset bearing fault
diagnosis. Effectiveness of the MTL principle has been
demonstrated with implementations based on two deep learning
architectures.
2) We propose a lightweight CNN-based network that uses
vibration signals to simultaneously deal with three related tasks:
fault diagnosis task, speed identification task, and load
identification task.
3) We conduct a set of performance comparison with the
wheelset bearing dataset. In addition, we interpret feature
learning mechanism of MTL by using visualization technique.
The paper is organized as follows. Section II defines MTL. Section III describes the MT-1DCNN in detail.
Section IV verifies the effectiveness and superiority of the MT-
1DCNN with the wheelset bearing dataset. Section V discusses
four aspects of the MT-1DCNN. Section VI summarizes the
whole paper.
II. MULTI-TASK LEARNING CONCEPT
Given $m$ learning tasks $\{\mathcal{T}_i\}_{i=1}^{m}$, where all the tasks or a subset of them are related, MTL aims to improve the learning of a model for task $\mathcal{T}_i$ by using the knowledge contained in all or some of the $m$ tasks [33].
Based on this definition, we focus on supervised learning in MTL, since most FDTs fall in this setting. In supervised learning, a task $\mathcal{T}_i$ is usually accompanied by a training dataset $D_i$ consisting of $n_i$ training samples, i.e., $D_i = \{g_j^i, l_j^i\}_{j=1}^{n_i}$, where $g_j^i \in \mathbb{R}^{d_i}$ is the $j$-th training instance of $\mathcal{T}_i$ and $l_j^i$ is its corresponding label. Here we consider a special MTL setting in which the training data $D_i$ is the same for every task. In this setting, the network learns, from the same dataset, shared features that can be used to process multiple tasks. These shared features are exploited by all the tasks, which improves the generalization performance of the network. Thus, sharing what is learned by different tasks while the tasks are trained in parallel is the central idea of MTL.
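As a concrete illustration of this setting, the sketch below (ours, not taken from the paper) lays out one shared set of vibration segments with three separate label arrays, one per task; the array names and placeholder contents are purely illustrative.

```python
# A minimal sketch of the shared-data MTL setting, assuming NumPy:
# every task uses the same training instances X, and only the labels differ.
import numpy as np

n, T = 1000, 2048                         # illustrative sample count; T matches the paper
X = np.zeros((n, T, 1), dtype="float32")  # shared vibration segments g_j (placeholder values)
y_fault = np.zeros(n, dtype="int64")      # fault diagnosis labels l_j (11 classes)
y_speed = np.zeros(n, dtype="int64")      # speed identification labels (4 classes)
y_load = np.zeros(n, dtype="int64")       # load identification labels (4 classes)
# A single forward pass of a multi-task network produces predictions for all
# three label sets, so the shared features are constrained by every task at once.
```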
III. MT-1DCNN BASED FAULT DIAGNOSIS METHOD
This study is devoted to exploring the application of MTL to improving wheelset bearing fault diagnosis. The HST working condition (such as speed and load) is closely related to the vibration response of the wheelset bearing. A fault diagnosis method is expected to integrate all this comprehensive information about wheelset bearings to improve diagnosis results. Therefore,
this paper proposes MT-1DCNN, which can learn fault
information and working condition information of vibration
signals at the same time. Focusing on the FDT, this method
introduces speed identification task (SIT) and load
identification task (LIT) as two auxiliary tasks to obtain more
generalized shared features through multi-task collaborative
learning. The overall structure of the MT-1DCNN is shown in
Fig. 2. The MT-1DCNN mainly consists of three parts: trunk
network, task-specific branches, and multi-loss function.
Fig. 2. The overall architecture of the MT-1DCNN.
A. Trunk Network
In the MT-1DCNN, the trunk network takes the 1D vibration
signal as input, and then uses multiple convolutional layers to
learn the rich features contained in the raw signal. Different
from other CNNs, the trunk network can learn not only the
fault-related features, but also the features specific to fault-
related tasks. That is, the trunk network can learn shared
features of multiple related tasks, which contain all the feature
information needed to process these tasks. Inspired by Wei et
al. [28], we build a lightweight and excellent trunk network,
whose structure is described in Fig. 2 and TABLE I. The trunk
network consists of five convolution modules, each of which
consists of a convolution layer and a ReLU activation function.
The size of the input 1D signal is 2048×1. To capture the long-term features of the signal, the convolution kernel size of the first and second layers of the trunk network is set to 12×1. To reduce the network parameters, we gradually reduce the size of the convolution kernel. In addition, to reduce the complexity of the trunk network, we set the number of channels to 16 in the first convolution layer and then gradually increase it to 32. In
the network, the stride size of each convolutional layer is set to
2 to achieve the down-sampling, which can avoid the
information loss caused by using the max-pooling. It can be
seen that our trunk network can capture the long-term features
and short-term features of the input signals with small model
complexity, as well as effectively learn the shared features of
multiple tasks.
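The trunk configuration above can be sketched in a few lines of Keras (the paper states a Keras implementation; the TensorFlow bindings and the function name below are our assumptions, not the authors' code):

```python
# A sketch of the trunk network per TABLE I: five Conv1D+ReLU modules,
# stride 2, 'same' padding, kernel sizes shrinking from 12 to 6 and channel
# counts growing from 16 to 32. Assumes tensorflow.keras.
import tensorflow as tf
from tensorflow.keras import layers

def build_trunk(input_length=2048):
    inputs = layers.Input(shape=(input_length, 1), name="vibration_signal")
    x = inputs
    for kernel, channels in [(12, 16), (12, 16), (9, 24), (9, 24), (6, 32)]:
        # stride 2 performs the down-sampling instead of max-pooling
        x = layers.Conv1D(channels, kernel, strides=2, padding="same",
                          activation="relu")(x)
    return tf.keras.Model(inputs, x, name="trunk")
```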
B. Task-specific Branches
This study aims to use the MTL principle to simultaneously
process the FDT, the SIT, and the LIT, so that they can share
features with each other and promote the performance of the
FDT. Among them, the FDT is to diagnose the health condition
of the bearing. The SIT and the LIT are to perceive speed and
load of the rotating mechanical system, respectively. To this
end, we design three task-specific branches, which are fault
diagnosis branch (FDB), speed identification branch (SIB) and
load identification branch (LIB). These three branches share the
learning features of the trunk network. Therefore, the
introduction of the SIB and the LIB enables the trunk network
to learn the speed and load information implicitly contained in the vibration responses. The FDB can make full use of the rich
information of the shared features to accurately distinguish
different fault categories.
TABLE I
NETWORK CONFIGURATION OF THE MT-1DCNN ARCHITECTURE

Trunk network:
Layer  Type  Kernel Size  Channels  Stride  Padding
1      Conv  12×1         16        2       Yes
2      Conv  12×1         16        2       Yes
3      Conv  9×1          24        2       Yes
4      Conv  9×1          24        2       Yes
5      Conv  6×1          32        2       Yes

Task-specific branches (the fault diagnosis, speed identification, and load identification branches share the same structure):
Layer  Type  Kernel Size  Channels  Stride  Padding
1      Conv  6×1          32        2       Yes
2      Conv  3×1          64        2       Yes
3      Global Average Pooling
4      Softmax
As shown in Fig. 2 and TABLE I, suppose that the shared features learned by the trunk network are $M = f_t(X;\theta_t)$, $X \in \mathbb{R}^{T}$, where $X$ is the input signal of the network, $T = 2048$ is the length of the signal, $f_t$ represents the function learned by the trunk network, and $\theta_t$ is the parameter set of $f_t$. The FDB, the SIB, and the LIB take $M$ as input, and then use two convolution modules to learn the features ($Y_f$, $Y_s$, and $Y_l$) that are used for task-specific processing. This process can be expressed as Eq. (1):

$$Y_f,\, Y_s,\, Y_l = f_f(M;\theta_f),\; f_s(M;\theta_s),\; f_l(M;\theta_l), \tag{1}$$

where $f_f$, $f_s$, and $f_l$ are the feature extraction functions learned by the FDB, the SIB, and the LIB, respectively, and $\theta_f$, $\theta_s$, and $\theta_l$ are the corresponding parameters.
Then, a global average pooling layer (GAP) [39] is used to compress the global information of each channel of $Y_f$, $Y_s$, and $Y_l$ into a channel descriptor, so as to obtain the feature vectors $z^f$, $z^s$, and $z^l$. The $j$-th element of $z^f$ is calculated by Eq. (2):

$$z_j^f = \mathrm{GAP}(Y_j^f) = \frac{1}{W}\sum_{u=1}^{W} Y_j^f(u), \quad z^f \in \mathbb{R}^{C}, \tag{2}$$

where $Y_j^f$ denotes the $j$-th channel of $Y_f$, and $W$ and $C$ are the length and the number of channels of $Y_f$, respectively.
The GAP can compress the input features into a vector, which greatly reduces the network parameters. Therefore, it can effectively avoid the overfitting problem caused by fully connected layers. In addition, the GAP is more native to the convolution structure because it enforces correspondences between feature channels and categories [39].
Wheelset bearings have many fault types, so the FDT is a multi-class classification problem. In this study, we also treat the SIT and the LIT as multi-class classification problems. The
softmax activation function is used for the three tasks. The softmax function maps the input feature vector to the range from 0 to 1 and makes the sum of all elements of the vector equal to 1, so it is generally used as the classifier to estimate the probability distribution over the different classes. We assume that $k_f$, $k_s$, and $k_l$ are the numbers of health conditions, speed conditions, and load conditions, respectively. In this paper, $k_f = 11$ and $k_s = k_l = 4$. The softmax function is expressed as Eq. (3):

$$Q(\hat{z}_j) = \frac{\exp(\hat{z}_j)}{\sum_{i=1}^{k}\exp(\hat{z}_i)}, \quad j = 1, 2, \ldots, k, \tag{3}$$

where $\hat{z}_j$ is the $j$-th element of $\hat{z}$, and $\hat{z}$ is the input vector of the softmax activation function. $Q(\hat{z}_j)$ is the estimated probability that $\hat{z}$ belongs to the $j$-th class.
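A task-specific branch of TABLE I can be sketched as follows (tensorflow.keras assumed). Note that the final dense softmax projection to the class count is our assumption, since the table lists only "GAP + Softmax" after the two convolution modules:

```python
# A sketch of one task-specific branch: two Conv1D modules (6x1/32 and 3x1/64,
# stride 2), global average pooling (Eq. (2)), and a softmax classifier (Eq. (3)).
from tensorflow.keras import layers

def add_branch(shared_features, num_classes, name):
    x = layers.Conv1D(32, 6, strides=2, padding="same",
                      activation="relu")(shared_features)
    x = layers.Conv1D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)      # channel descriptors z
    return layers.Dense(num_classes, activation="softmax", name=name)(x)
```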
C. Multi-loss Function
When designing a deep neural network, the choice of the loss
function is always an important aspect. Mean squared error and
mean absolute error often lead to poor performance when used
with gradient-based optimization. Some output units that
saturate produce very small gradients when combined with
these cost functions [40]. Recently, the cross-entropy loss
function gets its popularity and has been widely used in
classification tasks because of its better performance.
The MT-1DCNN needs to process three different
classification tasks simultaneously, so three cross-entropy loss
functions are set. They are Lf for the FDT, Ls for the SIT, and Ll
for the LIT. These three loss functions are independent of each
other. The cross-entropy loss function is mainly used to
evaluate the error of the estimated softmax output probability
distribution and the target class probability distribution.
Suppose $p^f$, $p^s$, and $p^l$ are the target distributions of the FDT, the SIT, and the LIT, respectively, and $q^f$, $q^s$, and $q^l$ are the estimated distributions of the three tasks. $L_f$, $L_s$, and $L_l$ are expressed as Eq. (4):

$$L_f = -\sum_{j=1}^{k_f} p_j^f \log(q_j^f); \quad L_s = -\sum_{j=1}^{k_s} p_j^s \log(q_j^s); \quad L_l = -\sum_{j=1}^{k_l} p_j^l \log(q_j^l). \tag{4}$$
In order to make the three tasks co-trained in the MT-
1DCNN and use the features learned by the SIT and the LIT to
improve the fault diagnosis performance of the network, adding
Lf, Ls and Ll directly to obtain the final loss function of the
network is not optimal. This is because the contribution of
different auxiliary tasks to the FDT is inconsistent, and if the
auxiliary tasks are given too much weight, the features learned
by the network may be more biased to solve the auxiliary tasks.
Therefore, we introduce two hyperparameters (i.e. λs and λl) to
control the weights of the SIT and the LIT, respectively. The
total loss of the MT-1DCNN is expressed as Eq. (5):

$$Loss = L_f + \lambda_s L_s + \lambda_l L_l. \tag{5}$$
The accuracy of each task is different, which could result in an imbalance problem in the loss function. To address this problem,
we introduce two weight coefficients in the loss function to
balance accuracies of the three task branches. Selection of the
two hyperparameters will be discussed in Section IV.C. Then,
we use the stochastic gradient descent method to minimize the
loss function in Eq. (5). During training, the network
simultaneously processes the FDT, the SIT, and the LIT. In
other words, the network updates the parameters θt, θf, θs, and
θl at the same time. As training progresses, the trunk network learns more generalized shared features, and the task-specific
branches can take advantage of the features learned from other
tasks to achieve better performance.
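Building on the trunk and branch sketches above, the three branches and the weighted multi-loss of Eq. (5) could be wired together as shown below (our sketch, not the authors' code). The SGD optimizer and learning rate follow Section IV.B, the loss weights correspond to the λs and λl selected in Section IV.C, and one-hot encoded labels are assumed:

```python
# A sketch of assembling and compiling the MT-1DCNN with a weighted multi-loss.
import tensorflow as tf

trunk = build_trunk()                              # from the trunk sketch above
shared = trunk.output                              # shared features M
outputs = [
    add_branch(shared, 11, "fault"),               # k_f = 11 health conditions
    add_branch(shared, 4, "speed"),                # k_s = 4 speed conditions
    add_branch(shared, 4, "load"),                 # k_l = 4 load conditions
]
mt_1dcnn = tf.keras.Model(trunk.input, outputs, name="MT_1DCNN")
mt_1dcnn.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
    loss={"fault": "categorical_crossentropy",     # L_f
          "speed": "categorical_crossentropy",     # L_s
          "load": "categorical_crossentropy"},     # L_l
    loss_weights={"fault": 1.0, "speed": 0.6, "load": 0.4},   # Eq. (5)
    metrics=["accuracy"],
)
```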
IV. EXPERIMENT RESULTS AND ANALYSIS
In this section, the effectiveness and superiority of MTL and the proposed MT-1DCNN are verified on the wheelset bearing
dataset.
A. Test Rig for Fault Experiments
As shown in Fig. 3, the wheelset bearing test rig is
established to evaluate the effectiveness of the MT-1DCNN.
The test rig is designed to simulate a real train operating
environment. It is mainly composed of a drive motor, a belt
transmission system, and a control system. In addition, we also
set up a vertical loading device, a lateral loading device, and
two fan motors to simulate wind resistance and 2D loads during
the real train operation. An axle and its two supporting bearings
are assembled to the test rig. The experimental data are
collected by acceleration sensors installed in the axle boxes, and
the sampling frequency is 5120 Hz.
Fig. 3. The wheelset bearing test rig.
On this test rig, we tested bearings with naturally generated faults collected from real operation lines, covering a total of 11 health conditions. TABLE II shows the health condition information of the experimental bearings, which are marked as F1, F2, ..., F11. To simulate the working environment of wheelset bearings on a real train as closely as possible, four operation speeds (60, 90, 120, and 150 km/h) and four vertical loads (56, 146, 236, and 272 kN) were set for each health condition. Therefore, the SIT has four categories, which are
respectively labeled as S1, S2, S3, and S4; the LIT has four
categories, which are respectively labeled as L1, L2, L3, and
L4.
TABLE II
ELEVEN HEALTH CONDITIONS OF WHEELSET BEARINGS.
B. Experiment Setup
The data are randomly divided into a training set, a validation set, and a test set according to the ratio of 3:1:2, and then a sliding segmentation method is used for data augmentation. Sliding segmentation is an efficient vibration signal data augmentation method, which has been used in [16, 28]. In our experiment, the length of each sample is 2048, and the step size of the sliding segmentation is set to 256. Finally, 45,652 training samples, 13,332 validation samples, and 19,700 test samples are obtained. Four repeated experiments are carried out for each method.
The MT-1DCNN is implemented with the Keras library in Python 3.5. Network training and testing are performed on a workstation with the Ubuntu 16.04 operating system, an Intel Core i9-9900K CPU, and a GTX 2080 GPU with 8 GB of video memory. Each sample is normalized by subtracting its mean and dividing by its variance, which accelerates the convergence of the network. During training, the learning rate is 0.0001 and the batch size is 96.
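The sliding segmentation and per-sample normalization described above could be implemented as in the sketch below (our code; the paper divides by the variance, while dividing by the standard deviation, as is more common, is assumed here):

```python
# Sliding-window segmentation (window 2048, step 256) and per-sample scaling.
import numpy as np

def sliding_segments(signal, window=2048, step=256):
    """Cut a long 1-D vibration record into overlapping training samples."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

def standardize(samples, eps=1e-8):
    """Subtract each sample's mean and divide by its standard deviation."""
    mean = samples.mean(axis=1, keepdims=True)
    std = samples.std(axis=1, keepdims=True)
    return (samples - mean) / (std + eps)
```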
Accuracy, recall and precision are used as evaluation metrics
to comprehensively measure the performance of classification
methods. They are defined in Eqs. (6)-(8):

$$Accuracy = \frac{TP + TN}{TP + FN + FP + TN} \times 100\%, \tag{6}$$

$$Recall = \frac{TP}{TP + FN} \times 100\%, \tag{7}$$

$$Precision = \frac{TP}{TP + FP} \times 100\%, \tag{8}$$

where $FP$, $TP$, $FN$, and $TN$ refer to the numbers of false positive, true positive, false negative, and true negative samples, respectively.
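For completeness, Eqs. (6)-(8) can be evaluated per class from a confusion matrix as in the following sketch (ours, not from the paper):

```python
# Per-class accuracy, recall, and precision from a confusion matrix cm,
# where cm[i, j] counts samples of true class i predicted as class j.
import numpy as np

def per_class_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp       # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp       # belong to the class, but missed
    tn = cm.sum() - tp - fp - fn
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    recall = tp / (tp + fn) * 100
    precision = tp / (tp + fp) * 100
    return accuracy, recall, precision
```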
To better simulate the strong noise interference of rolling
bearings in real operation, we add white Gaussian noise into the
raw signals. The definition of the SNR is given in Eq. (9):

$$SNR_{dB} = 10\log_{10}\!\left(\frac{P_{signal}}{P_{noise}}\right), \tag{9}$$

where $P_{signal}$ and $P_{noise}$ are the power of the signal and the noise, respectively.
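A noisy copy of a signal at a target SNR defined by Eq. (9) can be generated as in the sketch below (an assumption of ours about the implementation):

```python
# Add white Gaussian noise so that 10*log10(P_signal / P_noise) equals snr_db.
import numpy as np

def add_white_gaussian_noise(signal, snr_db):
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise
```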
We compared the MT-1DCNN with the following five networks: the Wen-2DCNN [30], the Guo-2DCNN [31], the Wei-1DCNN [28], the Zhang-1DCNN [17], and a five-layer BPNN. The sizes of the five BPNN layers are 1024, 512, 256, 96, and 11, respectively. Notably, the
C. Selecting Weight Coefficients in the Loss Function
In the MT-1DCNN, the two auxiliary tasks have different contributions to the FDT, which may result in an imbalance problem in the loss function. The imbalance problem could be
alleviated by tuning the two weight coefficients, whose values
are determined by a grid search with the accuracy of the FDB
in this paper.
In this experiment, λs and λl are sequentially set to 0.2, 0.4,
0.6, 0.8 and 1 under SNR=−5 dB. In other words, we have
conducted 25 different experiments to discuss the influence of
weight coefficients on fault diagnosis performance. The
accuracy of fault diagnosis under different weight coefficient
settings is shown in TABLE III.
Obviously, different weight settings have an impact on the
fault diagnosis performance of the proposed network. This
shows that the three tasks interact with each other, and the
network can learn their correlation. In addition, it also proves
that different auxiliary tasks have different contributions to the
FDT, so it is necessary to set different weight coefficients for
different auxiliary tasks. On the other hand, when λs and λl
increase from 0.2 to 1, the fault diagnosis accuracy of the MT-
1DCNN increases first and then decreases. This is because a lower weight prevents the auxiliary task from providing enough contribution, while a higher weight makes the network more inclined to process the auxiliary task. When λs = 0.6 and λl = 0.4, the MT-1DCNN achieves the best fault diagnosis performance.
Therefore, in the subsequent experiments, λs and λl are set to 0.6
and 0.4, respectively.
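The grid search over the two weight coefficients can be organized as in the sketch below; `train_and_evaluate` is a hypothetical helper (not part of the paper) that trains the MT-1DCNN with the given weights and returns the validation accuracy of the FDB:

```python
# 5 x 5 grid search over lambda_s and lambda_l (25 runs, as in this section).
import itertools

grid = [0.2, 0.4, 0.6, 0.8, 1.0]
results = {}
for lam_s, lam_l in itertools.product(grid, grid):
    # train_and_evaluate is a hypothetical helper, not part of the paper
    results[(lam_s, lam_l)] = train_and_evaluate(lambda_s=lam_s, lambda_l=lam_l)
best_lambda_s, best_lambda_l = max(results, key=results.get)
```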
TABLE III
PERFORMANCE RESULTS FOR DIFFERENT WEIGHTS (SNR=−5 dB).
          λs = 1        λs = 0.8      λs = 0.6      λs = 0.4      λs = 0.2
λl = 1    0.829±0.013   0.829±0.013   0.833±0.012   0.825±0.003   0.812±0.005
λl = 0.8  0.830±0.004   0.829±0.010   0.839±0.003   0.818±0.010   0.815±0.010
λl = 0.6  0.832±0.016   0.818±0.021   0.839±0.009   0.827±0.006   0.821±0.009
λl = 0.4  0.833±0.011   0.841±0.013   0.847±0.007   0.818±0.006   0.827±0.010
λl = 0.2  0.825±0.015   0.834±0.015   0.823±0.018   0.826±0.015   0.826±0.011
D. Effectiveness of MTL for the Fault Diagnosis Task
In this section, we discuss the impact of the MTL on the FDT.
Under SNR=−5 dB, we compare the MT-1DCNN with three
network structures. The three networks are CNN-F (only the
FDB is included), CNN-FS (including the FDB and the SIB),
and the CNN-FL (including the FDB and the LIB). The results
of the four networks are shown in TABLE IV.
TABLE IV
PERFORMANCE RESULTS OF THE FAULT DIAGNOSIS TASK (SNR=−5 dB).
Metric     MT-1DCNN      CNN-FS        CNN-FL        CNN-F
Accuracy   0.847±0.007   0.817±0.018   0.800±0.015   0.724±0.028
Recall     0.837±0.009   0.805±0.018   0.787±0.017   0.702±0.032
Precision  0.840±0.007   0.808±0.020   0.790±0.017   0.712±0.030
Compared with the CNN-F, the CNN-FS with the SIB has a
more than 9% improvement in the accuracy of the FDT, which
shows that the speed information is quite important for the FDT,
and it can effectively assist the network in fault diagnosis. In
addition, compared with the CNN-F, the CNN-FL with load
identification branch improves accuracy of fault diagnosis by
7.5%, which also proves that the load information of rotating
mechanical system can promote performance of the FDT.
Compared with the LIT, the addition of the SIT improves the fault diagnosis performance of the network more. This is because the vibration signals change more markedly at different speeds, so the network can reduce the occurrence of misjudgment with the
help of the speed information. Compared with the CNN-FS, the
CNN-FL and the CNN-F, the MT-1DCNN has a considerable
improvement in accuracy, recall, and precision. It means that
load information and speed information can complement each
other and improve the diagnostic performance of the network.
The experiment results prove that the MTL can effectively
improve the fault diagnosis performance of the network.
Subsequently, we use the t-distributed stochastic neighbor
embedding (T-SNE) [41] technique to visualize the distribution
of the final output of the MT-1DCNN, the CNN-FS, the CNN-
FL and the CNN-F in 2D embedded space. The visualization
results are shown in Fig. 4, where color represents health
condition of the wheelset bearing. Coordinates of each point
represent its position in the 2D embedded space, and distance
between two points represents their similarity. We use the
Fisher score [42] as the metric to quantify the quality of the projection. The Fisher scores are 2.679, 2.298, 1.633, and 1.524
for the four networks. Obviously, the output of the network with
the MTL ability has a better discrimination ability. In particular,
the proposed MT-1DCNN performs at least 14% better than
other comparison networks in terms of the Fisher score. It
indicates that the MT-1DCNN can effectively improve the
feature learning ability of the network.
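The 2D projection used for Fig. 4 can be reproduced with scikit-learn's t-SNE as sketched below; `features` (final-layer outputs on the test set) and `labels` (integer fault categories) are hypothetical placeholders:

```python
# Project high-dimensional network outputs to 2D with t-SNE and scatter-plot
# them coloured by fault category.
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

embedded = TSNE(n_components=2, random_state=0).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab20", s=4)
plt.xlabel("Dim 1")
plt.ylabel("Dim 2")
plt.show()
```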
Fig. 4. Group plot visualization of the final outputs of the MT-1DCNN (Fisher score: 2.679), the CNN-FS (Fisher score: 2.298), the CNN-FL (Fisher score: 1.633), and the CNN-F (Fisher score: 1.524) under SNR=−5 dB.
E. Effectiveness of MTL for Speed Identification Task
In this section, we discuss the impact of the MTL on the SIT.
Under SNR=−5 dB, we set up a comparative experiment of
three network structures including the CNN-S (only the SIB is
included), the CNN-FS and the MT-1DCNN. The speed
identification results of each network are shown in TABLE V.
Surprisingly, although the MT-1DCNN did not specifically
optimize the SIT, with the addition of the auxiliary tasks, the
speed identification performance of the network has been
improved. After adding the FDT, the speed identification
accuracy of the CNN-FS is 7% higher than that of the CNN-S.
The MT-1DCNN is 1% higher than the CNN-FS after the LIT
is added. This shows that the MT-1DCNN is not only suitable
for the FDT, but can also effectively improve the performance of related tasks at the same time. This inspires us to explore more
methods with MTL in the future work, such as combination of
life prediction task and the FDT.
TABLE V
PERFORMANCE RESULTS OF THE SPEED IDENTIFICATION TASK (SNR=−5 dB).
Metric     MT-1DCNN      CNN-FS        CNN-S
Accuracy   0.887±0.012   0.878±0.007   0.805±0.004
Recall     0.888±0.012   0.879±0.007   0.806±0.005
Precision  0.887±0.012   0.880±0.008   0.809±0.004
F. Effectiveness of MTL for Load Identification Task
In this section, we discuss the impact of the MTL on the LIT.
Under SNR=−5 dB, we set up a comparative experiment of
three network structures including the CNN-L (only the LIB is
included), the CNN-FL and the MT-1DCNN. The load
identification results of each network are shown in TABLE VI.
Similarly, with the addition of related tasks, the network's
load identification performance has been improved. The
accuracy of the CNN-FL is 7% higher than that of the CNN-L;
the accuracy of the MT-1DCNN is 4% higher than that of the
CNN-FL. This again demonstrates effectiveness of the MTL in
dealing with multiple related tasks. However, the performance
of the LIT is not particularly good compared with that of the
SIT. Firstly, because of the standardization operation, the differences between signals under different loads are largely eliminated. Secondly, this result is obtained under SNR=−5 dB, and the strong noise interferes with the judgment of the network. Thirdly, the goal of the MT-1DCNN is to improve
performance of the FDT, so there is no special optimization for
auxiliary tasks.
TABLE VI
PERFORMANCE RESULTS OF THE LOAD IDENTIFICATION TASK (SNR=−5 dB).
Metric     MT-1DCNN      CNN-FL        CNN-L
Accuracy   0.640±0.012   0.596±0.011   0.522±0.023
Recall     0.637±0.012   0.593±0.011   0.519±0.022
Precision  0.638±0.013   0.593±0.009   0.516±0.023
G. Comparison with Excellent Methods
In this section, we compare the diagnostic performance of the
MT-1DCNN and five excellent networks under four SNR
scenarios (10 dB, 5 dB, 0 dB, and −5 dB), which are used to
simulate working condition of the wheelset bearing under
different noise levels. The fault diagnosis results of each network are shown in TABLE VII.
Obviously, the accuracy, recall, and precision of the MT-1DCNN are better than those of the other comparison methods under the four SNR scenarios. In particular, when the noise is strong, the MT-1DCNN is considerably better than the other comparison methods in terms of diagnostic accuracy. For example, when SNR=−5 dB, the fault diagnosis accuracy of the MT-1DCNN is more than 13% higher than that of the Wen-2DCNN, which has the best performance among the comparison methods. This means that the MT-1DCNN has a relatively strong anti-noise ability even without any additional de-noising preprocessing. As the SNR increases, the fault diagnosis performance of the network improves. Under SNR=0 dB, which means the noise power is approximately equal to the original signal power, the MT-1DCNN still obtains 96.3% fault diagnosis accuracy. Under this noise level, the Wen-2DCNN obtains the best results among the comparison methods, but its accuracy is only 91.7%. Under SNR=10 dB, the accuracy of the MT-1DCNN reaches more than 99.4%. This indicates that even in a noisy background environment, the MT-1DCNN can still achieve excellent diagnostic performance and has practical
application potential. Moreover, the standard deviation of the
MT-1DCNN is smaller than that of the other methods in most cases, which shows good stability.
In order to analyze the diagnosis results of each category and
to understand its precision and recall in detail, confusion
matrices of the proposed MT-1DCNN under SNR=10 dB and
SNR=−5 dB are provided in Fig. 5 and Fig. 6, respectively,
where rows represent predicted labels; columns represent true labels; the diagonal shows the number of accurate diagnoses for each category; the bottom row shows the precision of each category; and the rightmost column represents the number of testing samples of each category.
TABLE VII
PERFORMANCE RESULTS OF THE MT-1DCNN AND THE FIVE COMPARISON NETWORKS UNDER THE FOUR SNR SCENARIOS.
SNR     Metric     MT-1DCNN      Wei-1DCNN     Wen-2DCNN     Zhang-1DCNN   Guo-2DCNN     BPNN
10 dB   Accuracy   0.994±0.001   0.981±0.005   0.991±0.001   0.988±0.003   0.894±0.011   0.670±0.003
        Recall     0.993±0.001   0.980±0.005   0.990±0.001   0.988±0.002   0.886±0.012   0.659±0.007
        Precision  0.993±0.001   0.980±0.005   0.990±0.001   0.987±0.002   0.887±0.011   0.661±0.006
5 dB    Accuracy   0.985±0.001   0.966±0.006   0.980±0.002   0.964±0.006   0.840±0.010   0.621±0.004
        Recall     0.984±0.001   0.964±0.007   0.978±0.002   0.963±0.010   0.826±0.010   0.602±0.004
        Precision  0.984±0.001   0.964±0.007   0.979±0.002   0.963±0.006   0.828±0.012   0.608±0.003
0 dB    Accuracy   0.963±0.004   0.894±0.009   0.917±0.004   0.866±0.015   0.754±0.010   0.528±0.001
        Recall     0.962±0.004   0.889±0.009   0.910±0.005   0.853±0.017   0.736±0.011   0.499±0.002
        Precision  0.960±0.004   0.889±0.010   0.910±0.004   0.864±0.017   0.737±0.010   0.508±0.004
−5 dB   Accuracy   0.847±0.007   0.666±0.008   0.712±0.005   0.656±0.024   0.546±0.006   0.363±0.003
        Recall     0.837±0.009   0.640±0.008   0.686±0.007   0.636±0.019   0.510±0.007   0.317±0.003
        Precision  0.840±0.007   0.648±0.015   0.693±0.007   0.651±0.021   0.517±0.008   0.330±0.003
TABLE VIII
NUMBER OF PARAMETERS AND ONE-BATCH (96 SAMPLES) TEST TIME FOR THE MT-1DCNN AND THE FIVE COMPARISON NETWORKS.
                      MT-1DCNN   Wei-1DCNN   Wen-2DCNN   Zhang-1DCNN   Guo-2DCNN   BPNN
Number of parameters  5.5×10^4   5.4×10^4    5.9×10^5    1.3×10^5      1.5×10^4    2.7×10^6
Test time (s)         1.204      0.563       0.505       0.569         0.501       0.497
It is seen that the main error
comes from the wrong classification between fault modes.
According to their recall values, F4 is the most confusing fault
type for the MT-1DCNN. The majority of wrongly classified F4 samples are classified as F3. In addition, under SNR=10 dB,
the recall and the precision of the network in most fault modes
are close to 100%, which indicates that the MT-1DCNN has
good diagnostic performance when the noise is weak. Even
under SNR=−5 dB, precision of the MT-1DCNN for the normal
category can still reach 95.55%, which shows that the network
can still distinguish normal samples and fault samples well
under strong noise.
Fig. 5. Confusion matrix of the proposed MT-1DCNN (SNR=10 dB).
Fig. 6. Confusion matrix of the proposed MT-1DCNN (SNR=−5 dB).
H. Computational Burden of the Networks
Computational burden of the networks is an important metric
to measure performance of bearing fault diagnosis methods. So,
this section quantitatively calculates the number of parameters
and test time for each method. TABLE VIII summarizes the total number of parameters and the one-batch (96 samples) test time for each network. It is seen that the MT-1DCNN has a very lightweight structure and obtains better performance than other networks (such as the Wen-2DCNN) with a small number of parameters, which proves that the MT-1DCNN has a higher parameter utilization rate. However, the MT-
1DCNN needs more test time. This is not surprising, since the
MT-1DCNN has to process three different tasks simultaneously,
which obviously leads to more test time. It is worth pointing out
that the test time of the MT-1DCNN on 96 samples is 1.204
seconds, which is acceptable in engineering practice.
V. DISCUSSIONS ABOUT THE MT-1DCNN
A. Understanding Feature Learning Mechanism of MTL
To understand the feature learning mechanism of the MT-
1DCNN, we use the t-SNE [41] technique to visualize the
distribution of features in different layers of the network in the
2D space. Fig. 7 shows the visualization of the shallow features,
the shared features of the trunk network, the features of each task-specific branch, and the final output.
It is seen that the shallow features learned by the trunk
network do not contain specific task information. As the
network deepens, under the constraints of related tasks, the
trunk network learns the domain-specific information required
for multiple related tasks. So, the shared features (subfigures
A2, B2 and C2) learnt by the trunk network contain information
that can be used for related tasks. Although the discrimination
of the shared features is not particularly obvious, relevant
Fig. 7. Group plot visualization of 2D features in different layers of the MT-1DCNN under SNR=10 dB, visualized by fault categories (subfigures A1-A4), speed categories (subfigures B1-B4), and load categories (subfigures C1-C4).
auxiliary tasks can provide additional supervision information
and make the features learned by the network have a better
generalization ability. This also brings another benefit: the local minima of different tasks are in different positions in MTL networks, so joint learning can help the network escape local minima, whereas in a single-task learning network gradient backpropagation tends to fall into them. Then,
these shared features are sent to the task-specific branches, from
which the features that can be used for specific tasks are
selected. In this way, the network allows features within the shared representation that are dedicated to one task to be used by other tasks, and such features are often not easy to learn in a single-
task learning network. According to TABLE IV, TABLE V
and TABLE VI, we observe consistent results, that is, the
classification results of the current task are improved after
introducing auxiliary tasks. This shows that certain features that
are dedicated to one task are indeed useful for its related tasks.
Finally, the final results are obtained by classifiers of different
tasks. The proposed MT-1DCNN can learn the shared features
for multiple related tasks, and can also process each task
separately. This preserves the independence of each task and
allows them to connect and promote each other. It is worth
noting that the multiple tasks handled by the MT-1DCNN should be related to some extent, which is the assumption behind the proposed method.
B. Combining the MTL Principle with Other Architecture
This section explores the applicability of the proposed multi-
task principle with other deep learning architectures. We
construct two network architectures, namely long short-term
memory (LSTM) and multi-task LSTM (MT-LSTM). We
first design the LSTM with a two-layer LSTM cell, where the number of time steps is 64 and the input dimension is 32. Then, we replace the trunk network in the MT-1DCNN with the LSTM, keep the other network structure the same as in the MT-1DCNN, and thus construct the MT-LSTM. The fault diagnosis accuracy of the two networks under the four SNR scenarios (10 dB, 5 dB, 0 dB, and −5 dB) is shown in TABLE IX. The experiment results show that the MTL principle successfully combines with the LSTM to improve its fault diagnosis performance.
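Under the same assumptions as the earlier Keras sketches, the MT-LSTM variant can be outlined as follows; the LSTM hidden size (32 units) is our assumption, since the paper specifies only the two-layer structure and the 64×32 input shape, and `add_branch` is reused from the Section III sketch:

```python
# Replace the convolutional trunk with a two-layer LSTM over the 2048-point
# signal reshaped into 64 time steps of 32 points; the branches stay unchanged.
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(2048, 1))
x = layers.Reshape((64, 32))(inputs)            # 64 time steps x 32 points
x = layers.LSTM(32, return_sequences=True)(x)
x = layers.LSTM(32, return_sequences=True)(x)   # keep a sequence for the Conv1D branches
outputs = [add_branch(x, 11, "fault"),
           add_branch(x, 4, "speed"),
           add_branch(x, 4, "load")]
mt_lstm = tf.keras.Model(inputs, outputs, name="MT_LSTM")
```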
TABLE IX
PERFORMANCE RESULTS OF MT-LSTM AND LSTM UNDER THE FOUR SNR SCENARIOS.
Method     10 dB         5 dB          0 dB          −5 dB
LSTM       0.967±0.002   0.954±0.003   0.914±0.005   0.739±0.012
MT-LSTM    0.987±0.001   0.981±0.002   0.948±0.005   0.818±0.010
C. Experiments on the CWRU bearing dataset
In this section, the proposed method is tested on the bearing
dataset of Case Western Reserve University (CWRU). The
CWRU dataset is a public dataset. In this dataset, a total of four
load conditions are set, which are 0 hp, 1 hp, 2 hp and 3 hp. The
data used in this experiment comes from the drive end of the
test bench motor and contains four different health states,
namely healthy state, inner ring fault, outer ring fault, and ball
fault. All three types of faults are produced by electro-discharge
machining. Their diameters are 7 mils, 14 mils and 21 mils
respectively. We treat each degree of failure as an independent bearing health condition. Therefore, this dataset
contains 10 health conditions and four load conditions. After
data augmentation, 9320 training samples, 4200 validation
samples and 4200 test samples are obtained. The fault diagnosis
results of six networks under SNR = −5 dB are shown in
TABLE X. It is worth noting that here the MT-1DCNN contains only the FDB and the LIB. The experimental results show that the MT-1DCNN
obtains 93.8% fault diagnosis accuracy, which is an increase of
5.8% compared to Zhang-1DCNN. This indicates that MT-
1DCNN also performs well on the CWRU bearing dataset.
TABLE X
PERFORMANCE RESULTS OF THE SIX NETWORKS ON THE CWRU BEARING DATASET (SNR=−5 dB).
Method        Accuracy      Recall        Precision
MT-1DCNN      0.938±0.007   0.938±0.007   0.939±0.007
Wei-1DCNN     0.837±0.015   0.837±0.015   0.836±0.016
Wen-2DCNN     0.863±0.009   0.863±0.009   0.864±0.008
Zhang-1DCNN   0.880±0.002   0.880±0.002   0.889±0.005
Guo-2DCNN     0.621±0.021   0.620±0.022   0.625±0.018
BPNN          0.317±0.002   0.317±0.003   0.326±0.003
D. Novelties of the MT-1DCNN
The condition of the wheelset bearing is affected by many
factors. As shown in Fig. 1, the vibration response of the
wheelset bearing is related not only to its health condition, but
also to its working conditions, such as speed and load.
Therefore, if a fault diagnosis model can obtain both health
condition information and working condition information, the
model can understand the bearing more comprehensively and
thereby can make a more accurate decision. However, based on
our literature review [36-38], we find that current deep learning-based methods ignore the association between the working condition and the health condition. Therefore, this paper explores the possibility of enabling the network to simultaneously handle
working condition identification tasks and the FDT. We prove
that MTL can make the network effectively use and share the
features learned by different tasks, so as to improve fault
diagnosis performance. The novel MT-1DCNN is proposed,
which has achieved very competitive performance on the
wheelset bearing dataset compared to five peer networks.
The MT-1DCNN is the first attempt to use a multi-task structure to exploit working condition information to improve the fault diagnosis performance of the network. In this way, our method provides a new solution for the FDT as well as a general and scalable architecture. The MT-1DCNN can be easily expanded if other working condition information (such as temperature) is available. In addition, some task branches can also be removed; for example, on the CWRU dataset, the MT-1DCNN removes the SIB and still achieves good performance. However, the introduction of more tasks will definitely bring more parameters and computational cost, and it also challenges the trunk network's feature learning ability. Therefore, a lightweight trunk network with strong feature learning ability is needed. In this paper, inspired by [28], we apply the
wide convolution kernel to the entire network to make the
network have a more powerful ability to learn long-term
correlation features. In addition, we gradually reduce the size of
the convolution kernels and construct a shallow network
architecture to reduce the network parameters. As shown in
TABLE VII and TABLE VIII, the MT-1DCNN has roughly the same number of parameters as the Wei-1DCNN [28], but it can handle multiple tasks and has better diagnostic performance. This indicates that our network improves the feature learning ability while maintaining a small number of parameters. Finally, by visualizing the internal feature learning of the MT-1DCNN, we discussed the mechanism of MTL, which also contributes to interpreting CNNs in the fault diagnosis field.
E. Overfitting Problem
Multiple tasks are related but not the same. This diversity can
reduce the overfitting problem when learning parameters shared
in the trunk network. The more tasks we are learning
simultaneously, the more our model has to find a representation
that captures all of the tasks and the less is our chance of
overfitting on our original task [43].
However, overfitting is a common problem for supervised
learning-based networks, especially for a complex model.
During the training process, the network may produce an overfitted response in which the method prioritizes signal differentiation instead of pattern characterization. In this case, the features considered by the method could be specific to the dataset used (e.g., its noise level or electrical interferences) rather than common patterns that remain useful when the trained methodology is applied to other similar systems. The following two ideas can improve the generalization performance of the network. 1) Unsupervised learning: characterizing the patterns with unsupervised learning before supervised classification can alleviate the overfitting problem. 2) Transfer learning: transferring the pattern characterization learned by the network on other large datasets to the current task can alleviate the network's overfitted response to the current dataset.
VI. CONCLUSIONS
This paper proposes the end-to-end MT-1DCNN for fault
diagnosis of wheelset bearing. It introduces the MTL principle
into bearing fault diagnosis, and explores the influence of speed
information and load information on the FDT. The MT-1DCNN
acquires the shared features between these tasks by
simultaneously processing the FDT, the SIT, and the LIT. Then,
the network can obtain speed information and load information
that can assist the classifier for fault diagnosis from the shared
features. Therefore, the MT-1DCNN has a more comprehensive
feature learning mechanism, which can achieve better
performance with a lightweight network structure. The MT-
1DCNN establishes a multi-task network framework for
bearing fault diagnosis, which can not only use the speed and load tasks as auxiliary tasks, but can also further use wind speed, temperature, and other tasks that are related to the FDT. The
experiment results show that the MT-1DCNN has considerable
advantages over the five peer networks in accuracy, precision
and recall. We also prove that the MTL principle can
simultaneously improve the performance of the FDT, the SIT,
and the LIT. Moreover, the possibility of combining the MTL principle with another network architecture is preliminarily validated.
REFERENCES
[1] H. Cao, F. Fan, K. Zhou and Z. He, "Wheel-bearing fault diagnosis of trains
using empirical wavelet transform," Measurement, vol. 82, 2016.
[2] Z. Li, J. Chen, Y. Zi and J. Pan, "Independence-Oriented VMD to Identify
Fault Feature for Wheel Set Bearing Fault Diagnosis of High Speed
Locomotive," Mech. Syst. Signal Pr., vol. 85, pp. 512-529, 2017.
[3] X. Wang, Z. Yang and X. Yan, "Novel Particle Swarm Optimization-Based
Variational Mode Decomposition Method for the Fault Diagnosis of
Complex Rotating Machinery," IEEE/ASME Transactions on
Mechatronics, vol. 23, no. 1, pp. 68-79, 2018.
[4] X. Zhang, J. Wang, Z. Liu and J. Wang, "Weak Feature Enhancement in
Machinery Fault Diagnosis Using Empirical Wavelet Transform and an
Improved Adaptive Bistable Stochastic Resonance," ISA T., vol. 84, pp.
283-295, 2019.
[5] Y. Kong, T. Wang and F. Chu, "Meshing Frequency Modulation Assisted
Empirical Wavelet Transform for Fault Diagnosis of Wind Turbine
Planetary Ring Gear," Renew. Energ., vol. 132, pp. 1373-1388, 2019.
[6] Z. Liu, Y. Jin, M.J. Zuo and Z. Feng, "Time-Frequency Representation
Based on Robust Local Mean Decomposition for Multicomponent AM-
FM Signal Analysis," Mech. Syst. Signal Pr., vol. 95, pp. 468-487, 2017.
[7] Z. Liu, M.J. Zuo, Y. Jin, D. Pan and Y. Qin, "Improved Local Mean
Decomposition for Modulation Information Mining and Its Application to
Machinery Fault Diagnosis," J. Sound Vib., vol. 397, pp. 266-281, 2017.
[8] M. Kang, J. Kim, J. Kim, A.C.C. Tan, E.Y. Kim and B. Choi, "Reliable
Fault Diagnosis for Low-Speed Bearings Using Individually Trained
Support Vector Machines With Kernel Discriminative Feature Analysis,"
IEEE T. Power Electr., vol. 30, no. 5, pp. 2786-2797, 2015.
[9] L. Ren, W. Lv, S. Jiang and Y. Xiao, "Fault Diagnosis Using a Joint Model
Based on Sparse Representation and SVM," IEEE T. Instrum. Meas., vol.
65, no. 10, pp. 2313-2320, 2016.
[10] P. Baraldi, F. Cannarile, F. Di Maio and E. Zio, "Hierarchical K-Nearest
Neighbours Classification and Binary Differential Evolution for Fault
Diagnostics of Automotive Bearings Operating under Variable
Conditions," Eng. Appl. Artif. Intel., vol. 56, pp. 1-13, 2016.
[11] D.H. Pandya, S.H. Upadhyay and S.P. Harsha, "Fault Diagnosis of Rolling
Element Bearing with Intrinsic Mode Function of Acoustic Emission Data
Using APF-KNN," Expert Syst. Appl., vol. 40, no. 10, pp. 4137-4145, 2013.
[12] V.N. Ghate and S.V. Dudul, "Optimal MLP Neural Network Classifier for
Fault Detection of Three Phase Induction Motor," Expert Syst. Appl., vol.
37, no. 4, pp. 3468-3481, 2010.
[13] J. Zheng, H. Pan and J. Cheng, "Rolling Bearing Fault Detection and
Diagnosis Based on Composite Multiscale Fuzzy Entropy and Ensemble
Support Vector Machines," Mech. Syst. Signal Pr., vol. 85, pp. 746-759,
2017.
[14] K. Choi, S. Singh, A. Kodali, K.R. Pattipati, J.W. Sheppard and S.M.
Namburu, et al., "Novel Classifier Fusion Approaches for Fault Diagnosis
in Automotive Systems," IEEE T. Instrum. Meas., vol. 58, no. 3, pp. 602-
611, 2009.
[15] Y. LeCun, Y. Bengio and G. Hinton, "Deep Learning," Nature, vol. 521, pp. 436-444, 2015.
[16] D. Peng, Z. Liu, H. Wang, Y. Qin and L. Jia, "A Novel Deeper One-
Dimensional CNN With Residual Learning for Fault Diagnosis of
Wheelset Bearings in High-Speed Trains," IEEE Access, vol. 7, pp. 10278-
10293, 2019.
[17] W. Zhang, X. Li and Q. Ding, "Deep Residual Learning-Based Fault
Diagnosis Method for Rotating Machinery," ISA T., 2018.
[18] H. Wang, Z. Liu, D. Peng and Y. Qin, "Understanding and Learning
Discriminant Features Based on Multi-Attention 1DCNN for Wheelset
Bearing Fault Diagnosis," IEEE T. Ind. Inform., 2019.
[19] J. Pan, Y. Zi, J. Chen, Z. Zhou and B. Wang, "LiftingNet: A Novel Deep
Learning Network With Layerwise Feature Learning From Noisy
Mechanical Data for Fault Classification," IEEE T. Ind. Electron., vol. 65,
no. 6, pp. 4973-4982, 2018.
[20] J. Jiao, M. Zhao, J. Lin and C. Ding, "Deep Coupled Dense Convolutional
Network With Complementary Data for Intelligent Fault Diagnosis," IEEE
T. Ind. Electron., vol. 66, no. 12, pp. 9858-9867, 2019.
[21] R. Liu, F. Wang, B. Yang and S.J. Qin, "Multi-scale Kernel based Residual
Convolutional Neural Network for Motor Fault Diagnosis Under Non-
stationary Conditions," IEEE T. Ind. Inform., pp. 1, 2019.
[22] G. Xu, M. Liu, Z. Jiang, W. Shen and C. Huang, "Online Fault Diagnosis
Method Based on Transfer Convolutional Neural Networks," IEEE T.
Instrum. Meas., vol. 69, no. 2, pp. 509-520, 2020.
[23] I. Kao, W. Wang, Y. Lai and J. Perng, "Analysis of Permanent Magnet
Synchronous Motor Fault Diagnosis Based on Learning," IEEE T. Instrum.
Meas., vol. 68, no. 2, pp. 310-324, 2019.
[24] L. Wen, X. Li and L. Gao, "A New Two-Level Hierarchical Diagnosis
Network Based on Convolutional Neural Network," IEEE T. Instrum.
Meas., vol. 69, no. 2, pp. 330-338, 2020.
[25] R. Huang, J. Li, W. Li and L. Cui, "Deep Ensemble Capsule Network for
Intelligent Compound Fault Diagnosis Using Multisensory Data," IEEE T.
Instrum. Meas., pp. 1, 2019.
[26] X. Ding and Q. He, "Energy-Fluctuated Multiscale Feature Learning With
Deep ConvNet for Intelligent Spindle Bearing Fault Diagnosis," IEEE T.
Instrum. Meas., vol. 66, no. 8, pp. 1926-1935, 2017.
[27] D. Peng, H. Wang, Z. Liu, W. Zhang, M.J. Zuo and J. Chen, "Multi-branch
and Multi-scale CNN for Fault Diagnosis of Wheelset Bearings under
Strong Noise and Variable Load Condition," IEEE T. Ind. Inform., 2020.
[28] Z. Wei, P. Gaoliang, L. Chuanhao, C. Yuanhang and Z. Zhujun, "A New
Deep Learning Model for Fault Diagnosis with Good Anti-Noise and
Domain Adaptation Ability on Raw Vibration Signals.," Sensors (Basel,
Switzerland), vol. 17, no. 2, 2017.
[29] L. Su, L. Ma, N. Qin, D. Huang and A.H. Kemp, "Fault Diagnosis of High-
Speed Train Bogie by Residual-Squeeze Net," IEEE T. Ind. Inform., vol.
15, no. 7, pp. 3856-3863, 2019.
[30] L. Wen, X. Li, L. Gao and Y. Zhang, "A New Convolutional Neural
Network-Based Data-Driven Fault Diagnosis Method," IEEE T. Ind.
Electron., vol. 65, no. 7, pp. 5990-5998, 2018.
[31] X. Guo, L. Chen and C. Shen, "Hierarchical Adaptive Deep Convolution
Neural Network and Its Application to Bearing Fault Diagnosis,"
Measurement, vol. 93, 2016.
[32] R. Caruana, "Multitask Learning," Mach. Learn., vol. 28, no. 1, 1997.
[33] Y. Zhang and Q. Yang, "An Overview of Multi-Task Learning," National
Science Review, vol. 5, no. 01, pp. 30-43, 2018.
[34] K. Thung and C. Wee, "A Brief Review on Multi-Task Learning,"
Multimed. Tools Appl., vol. 77, no. 22, 2018.
[35] Y. Yan, E. Ricci, R. Subramanian, G. Liu, O. Lanz and N. Sebe, "A Multi-
Task Learning Framework for Head Pose Estimation under Target
Motion," IEEE T. Pattern Anal., vol. 38, no. 6, pp. 1070-1083, 2016.
[36] S. Guo, B. Zhang, T. Yang, D. Lyu and W. Gao, "Multi-Task
Convolutional Neural Network with Information Fusion for Bearing Fault
Diagnosis and Localization," IEEE T. Ind. Electron., pp. 1, 2019.
[37] R. Liu, B. Yang and A.G. Hauptmann, "Simultaneous Bearing Fault
Recognition and Remaining Useful Life Prediction Using Joint-Loss
Convolutional Neural Network," IEEE T. Ind. Inform., vol. 16, no. 1, pp.
87-96, 2020.
[38] X. Cao, B. Chen and N. Zeng, "A deep domain adaption model with multi-
task networks for planetary gearbox fault diagnosis," Neurocomputing, vol.
409, pp. 173-190, 2020.
[39] M. Lin, Q. Chen and S. Yan, "Network in Network," arXiv:1312.4400, 2013.
[40] I. Goodfellow, Y. Bengio and A. Courville, "Deep Learning," MIT Press, 2016.
[41] L. van der Maaten and G. Hinton, "Visualizing Data Using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579-2605, 2008.
[42] Z. Wang and Z. Qian, "Effects of concentration and size of silt particles on
the performance of a double-suction centrifugal pump," Energy, vol. 123,
pp. 36-46, 2017.
[43] S. Ruder, "An Overview of Multi-Task Learning in Deep Neural Networks," [Online]. Available: arXiv:1706.05098.
Zhiliang Liu was born in Rizhao, Shandong, China, in 1984. He received the Ph.D. degree from the School of Automation Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2013. From 2009 to 2011, he was a visiting scholar at the University of Alberta. From 2013 to 2015, he was an assistant professor with the School of Mechanical and Electrical Engineering, UESTC, where he has been an associate professor since 2015. His research interests include fault diagnosis and prognostics of rotating machinery using advanced signal processing and data mining methods. He has published more than 70 papers, including more than 20 SCI-indexed journal papers, and currently holds more than 10 research grants from the National Natural Science Foundation of China, open grants of national key laboratories, the China Postdoctoral Science Foundation, and others.
Huan Wang was born in Hunan, China. He received the B.S. degree from the School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China, in 2016, where he is currently pursuing the M.S. degree with the same school. His research
interests include mechanical fault diagnosis, image recognition,
deep learning and machine learning.
Junjie Liu was born in Chongqing, China,
in 1994. He received the B.S. degree in
mechanical engineering from the University of
Electronic Science and Technology of
China, Chengdu, China, in 2017, where he
is currently pursuing the M.S. degree in
mechanical engineering. His research
interests include transfer learning,
equipment reliability, fault diagnosis and health management.
Yong Qin is a Professor and the Vice Dean of the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University. He received the B.Sc. and M.Sc. degrees in transportation automation and control engineering from Shanghai Railway University, China, in 1993 and 1996, respectively, and the Ph.D. degree in information engineering and control from the China Academy of Railway Sciences in 1999. He is a member of the IEEE ITS and RS societies and a senior member of the IET. He has authored or coauthored more than 100 SCI/EI-indexed papers, one ESI highly cited paper, and five books, holds 23 granted patents including two U.S. patents, and has won 11 ministerial science and technology progress awards. His research mainly focuses on prognostics and health management for railway transportation systems, transportation network safety and reliability, and rail operation planning and optimization.
Dandan Peng was born in Shanxi, China,
in 1992. She received the B.S. and M.S.
degrees in the School of Mechanical and
Electrical Engineering, University of
Electronic Science and Technology of
China, Chengdu, China, in 2016 and 2019,
respectively, and is currently working
toward the Ph.D. degree in Mechanical
Engineering at KU Leuven, Leuven, Belgium. Her research
interests include Hilbert Huang transform, convolutional neural
network, machinery condition monitoring and fault diagnosis.