Multi-task Learning Based on Lightweight 1DCNN for Fault Diagnosis of Wheelset Bearings

Zhiliang Liu, Member, IEEE, Huan Wang, Junjie Liu, Yong Qin, Member, IEEE, Dandan Peng
Abstract—In recent years, deep learning has been proven to be a promising bearing fault diagnosis technology. However, most existing methods are based on single-task learning: the fault diagnosis task is treated as an independent task, and the rich correlation information contained in different tasks is ignored. Therefore, this paper explores the possibility of using speed identification and load identification as two auxiliary tasks to improve the performance of the fault diagnosis task, and proposes a multi-task one-dimensional convolutional neural network (MT-1DCNN). Specifically, the MT-1DCNN uses a trunk network to learn the shared features required by every task, and then processes the different tasks through multiple task-specific branches. In this way, the MT-1DCNN can exploit features learned by related tasks to improve the performance of the fault diagnosis task. Experimental results on a wheelset bearing dataset show that multi-task learning can make full use of the feature information captured by the speed identification and load identification tasks to improve the fault diagnosis performance of the network, and that the MT-1DCNN outperforms five excellent networks in accuracy.
Index Terms—Multi-task learning, Convolutional neural
network, Bearing fault diagnosis, Vibration analysis.
I. INTRODUCTION
Wheelset bearings are core components of the high-speed train (HST) bogie, and their mechanical performance greatly affects the safety and reliability of HST operation. Therefore, automatic health monitoring of wheelset bearings is of great significance [1]. Because the HST operates for long periods under time-varying conditions such as speed, load, and operating environment, vibration signals from wheelset bearings are easily contaminated by interference, which poses a big challenge for accurate fault diagnosis based on vibration analysis.
The main work of fault diagnosis research is to extract useful
information from vibration signals, and then use classification
methods to obtain robust diagnosis results. Scholars have
proposed various signal processing methods to extract
This work was supported by the National Natural Science Foundation of
China (61833002). (Corresponding authors: Huan Wang and Yong Qin).
Zhiliang Liu, Huan Wang, and Junjie Liu are with the School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China.
Yong Qin is with the State Key Laboratory of Rail Traffic Control and Safety,
Beijing Jiaotong University, Beijing, 100044, China.
Dandan Peng is with the Department of Mechanical Engineering, KU Leuven,
Leuven, Belgium BE-3001.
representative features, for example, variational mode decomposition [2, 3], empirical wavelet transform [4, 5], and local mean decomposition [6, 7]. In addition, support vector machines (SVM) [8, 9], k-nearest neighbors [10, 11], and multi-layer perceptrons [12] are often used as classifiers to predict fault types. For instance, Zheng et al. [13] proposed a fault diagnosis
method based on multi-scale fuzzy entropy and SVM for rolling
bearing. Choi et al. [14] reduced diagnostic errors by fusing
multiple classifier decisions. However, these methods rely
heavily on the domain knowledge of professionals, and they
cannot comprehensively extract the complex dynamic features
of the signals. Robustness and accuracy of these methods need
to be further improved.
As an efficient feature extraction and pattern recognition
method, deep learning attracts more and more attention from
researchers [15]. In particular, convolutional neural network
(CNN) has achieved significant success in fault diagnosis of
rotating machinery due to its unique feature learning
mechanism through convolution operation [16-27]. For
example, Liu et al. [21] proposed a residual CNN with a multi-
scale kernel for motor fault diagnosis in non-stationary
conditions. Zhang et al. [28] proposed a deep CNN with wide
first-layer kernels, which can better learn the long-time
information of vibration signals. These methods are based on
one-dimensional CNN (1DCNN) [17-21, 29], which uses the
1DCNN to automatically learn useful information of vibration
signals and diagnose the health condition of machinery. In
addition, Wen et al. [30] transformed vibration signals into two-dimensional (2D) images, and then used a two-dimensional CNN (2DCNN) to learn the useful features. The 2DCNN-based
methods usually require converting 1D signal into 2D matrix
(e.g. time-frequency spectra) [30, 31]. Compared with 2DCNN,
1DCNN can learn the features directly, and the structure is
relatively simple, so it is more suitable for bearing fault
diagnosis.
The above deep learning networks are all based on single-
task learning. Their network parameter optimization is
constrained by fault diagnosis task (FDT), and thus the features
learned by the network are only applicable to the diagnosis of
mechanical health condition. This approach seems reasonable, but it has implicit shortcomings. Many real-world problems cannot be decomposed into independent subtasks; even when a problem can be decomposed, its subtasks are related to each other and are connected by shared factors or shared features [32]. Therefore, if a real problem is treated as multiple independent single tasks, the rich associated information among these tasks is ignored.
Multi-task learning (MTL) [32-35] is a machine learning
method aimed at solving multiple tasks at the same time. It can
use the useful information learned by related tasks to improve
performance of the network. Caruana [32] summarized that MTL is an approach to inductive transfer that improves generalization by using the domain information contained in the training signals of related tasks as an inductive bias. Simply put, MTL learns the features shared by multiple tasks and allows features that are specific to one task to be used by the other tasks, which can effectively improve the results of both the main task and the auxiliary tasks. This advantage is not possessed by single-task learning.
There has been some research on MTL in the field of fault diagnosis. For example, Guo et al. [36] introduced MTL to the
field of fault diagnosis and proposed a multi-task neural
network for processing fault mode task and fault location task
simultaneously. Liu et al. [37] proposed a MTL method that
simultaneously predicts the fault category and remaining useful
life. Cao et al. [38] used MTL to diagnose health status of
planetary gearboxes. These methods proved the effectiveness of MTL in the FDT, but they lack in-depth analysis and interpretation of the feature learning mechanism of MTL. In addition, these
works ignore the correlation between the working condition and
the health condition.
Vibration response of rotating machinery is related not only
to the health condition, but also to the working condition, such
as speed and load. Fig. 1 shows the vibration responses of the
wheelset bearing with two different fault categories at different
speeds and loads. The fault category, speed and load of rotating
mechanical system have a great influence on the vibration
responses. If we make the network learn these related tasks
together, and make them share the learned features, the network
can have a more comprehensive understanding of the vibration
signals, and the learned features also have better generalization performance.
Fig. 1. Vibration signals with different fault categories (F3, F8) at different speeds and vertical loads: (a) speed 60 km/h, vertical load 56 kN; (b) speed 120 km/h, vertical load 56 kN; (c) speed 60 km/h, vertical load 272 kN; (d) speed 120 km/h, vertical load 272 kN. (Note: F3 and F8 are described in detail in Section IV.)
Therefore, this paper introduces the MTL principle into the
bearing fault diagnosis, and proposes a multi-task one-
dimensional convolutional neural network (MT-1DCNN). The
MT-1DCNN aims to enhance the performance of the network
by using the two auxiliary tasks: speed identification and load
identification. Specifically, the MT-1DCNN processes three
tasks at the same time, and first learns the shared features
among multiple tasks through the trunk network. Subsequently,
the MT-1DCNN uses multiple task-specific branches to process
these tasks. The input of these branches is the shared features
learned by trunk network. Every task-specific branch can take
advantage of the shared features learned by multiple tasks. In
this way, the features within the shared representation that are specific to one task
can be used by other tasks, so that the network can fully
understand the characteristics of the signals and improve the
accuracy of each task. In addition, each task has an independent
loss function. The overall loss of the MT-1DCNN is a weighted sum of the loss functions of these tasks. Powered by MTL, the MT-1DCNN can process
three tasks simultaneously in a lightweight network structure,
and good results can be obtained.
Contributions of this paper are summarized as follows:
1) We introduce the MTL principle to wheelset bearing fault
diagnosis. Effectiveness of the MTL principle has been
demonstrated with implementations based on two deep learning
architectures.
2) We propose a lightweight CNN-based network that uses
vibration signals to simultaneously deal with three related tasks:
fault diagnosis task, speed identification task, and load
identification task.
3) We conduct a set of performance comparison with the
wheelset bearing dataset. In addition, we interpret feature
learning mechanism of MTL by using visualization technique.
The paper is organized as follows. Section II defines MTL. Section III describes the MT-1DCNN in detail.
Section IV verifies the effectiveness and superiority of the MT-
1DCNN with the wheelset bearing dataset. Section V discusses
four aspects of the MT-1DCNN. Section VI summarizes the
whole paper.
II. MULTI-TASK LEARNING CONCEPT
Given $m$ learning tasks $\{\mathcal{T}_i\}_{i=1}^{m}$, where all the tasks or a subset of them are related, MTL aims to improve the learning of a model for task $\mathcal{T}_i$ by using the knowledge contained in all or some of the $m$ tasks [33].
Based on this definition, we focus on supervised learning in MTL, since most FDTs fall in this setting. In supervised learning, a task $\mathcal{T}_i$ is usually accompanied by a training dataset $D_i$ consisting of $n_i$ training samples, i.e., $D_i = \{g_j^i, l_j^i\}_{j=1}^{n_i}$, where $g_j^i \in \mathbb{R}^{d_i}$ is the $j$-th training instance of $\mathcal{T}_i$ and $l_j^i$ is its corresponding label. Here we consider a special MTL setting in which the training data $D_i$ is the same for every task. In this setting, the network learns, from the same dataset, shared features that can be used to process multiple tasks. These shared features are exploited by all the tasks, which improves the generalization performance of the network. Thus, sharing what is learned by different tasks while the tasks are trained in parallel is the central idea of MTL.
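As a concrete illustration of this setting, the sketch below (ours, not taken from the paper) lays out one shared set of vibration segments with three separate label arrays, one per task; the array names and placeholder contents are purely illustrative.

```python
# A minimal sketch of the shared-data MTL setting, assuming NumPy:
# every task uses the same training instances X, and only the labels differ.
import numpy as np

n, T = 1000, 2048                         # illustrative sample count; T matches the paper
X = np.zeros((n, T, 1), dtype="float32")  # shared vibration segments g_j (placeholder values)
y_fault = np.zeros(n, dtype="int64")      # fault diagnosis labels l_j (11 classes)
y_speed = np.zeros(n, dtype="int64")      # speed identification labels (4 classes)
y_load = np.zeros(n, dtype="int64")       # load identification labels (4 classes)
# A single forward pass of a multi-task network produces predictions for all
# three label sets, so the shared features are constrained by every task at once.
```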
III. MT-1DCNN BASED FAULT DIAGNOSIS METHOD
This study is devoted to exploring the application of MTL to improving wheelset bearing fault diagnosis. The HST working condition (such as speed and load) is closely related to the vibration response of the wheelset bearing. A fault diagnosis method is expected to integrate all this comprehensive information about wheelset bearings to improve diagnosis results. Therefore,
this paper proposes MT-1DCNN, which can learn fault
information and working condition information of vibration
signals at the same time. Focusing on the FDT, this method
introduces speed identification task (SIT) and load
identification task (LIT) as two auxiliary tasks to obtain more
generalized shared features through multi-task collaborative
learning. The overall structure of the MT-1DCNN is shown in
Fig. 2. The MT-1DCNN mainly consists of three parts: trunk
network, task-specific branches, and multi-loss function.
Fig. 2. The overall architecture of the MT-1DCNN.
A. Trunk Network
In the MT-1DCNN, the trunk network takes the 1D vibration
signal as input, and then uses multiple convolutional layers to
learn the rich features contained in the raw signal. Different
from other CNNs, the trunk network can learn not only the
fault-related features, but also the features specific to fault-
related tasks. That is, the trunk network can learn shared
features of multiple related tasks, which contain all the feature
information needed to process these tasks. Inspired by Wei et
al. [28], we build a lightweight and excellent trunk network,
whose structure is described in Fig. 2 and TABLE I. The trunk
network consists of five convolution modules, each of which
consists of a convolution layer and a ReLU activation function.
The size of the input 1D signal is 2048×1. To capture the long-term features of the signal, the convolution kernel size of the first and second layers of the trunk network is set to 12×1. To reduce the network parameters, we gradually reduce the size of the convolution kernel. In addition, to reduce the complexity of the trunk network, we set the number of channels to 16 in the first convolution layer and then gradually increase it to 32. In
the network, the stride size of each convolutional layer is set to
2 to achieve the down-sampling, which can avoid the
information loss caused by using the max-pooling. It can be
seen that our trunk network can capture the long-term features
and short-term features of the input signals with small model
complexity, as well as effectively learn the shared features of
multiple tasks.
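The trunk configuration above can be sketched in a few lines of Keras (the paper states a Keras implementation; the TensorFlow bindings and the function name below are our assumptions, not the authors' code):

```python
# A sketch of the trunk network per TABLE I: five Conv1D+ReLU modules,
# stride 2, 'same' padding, kernel sizes shrinking from 12 to 6 and channel
# counts growing from 16 to 32. Assumes tensorflow.keras.
import tensorflow as tf
from tensorflow.keras import layers

def build_trunk(input_length=2048):
    inputs = layers.Input(shape=(input_length, 1), name="vibration_signal")
    x = inputs
    for kernel, channels in [(12, 16), (12, 16), (9, 24), (9, 24), (6, 32)]:
        # stride 2 performs the down-sampling instead of max-pooling
        x = layers.Conv1D(channels, kernel, strides=2, padding="same",
                          activation="relu")(x)
    return tf.keras.Model(inputs, x, name="trunk")
```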
B. Task-specific Branches
This study aims to use the MTL principle to simultaneously
process the FDT, the SIT, and the LIT, so that they can share
features with each other and promote the performance of the
FDT. Among them, the FDT is to diagnose the health condition
of the bearing. The SIT and the LIT are to perceive speed and
load of the rotating mechanical system, respectively. To this
end, we design three task-specific branches, which are fault
diagnosis branch (FDB), speed identification branch (SIB) and
load identification branch (LIB). These three branches share the
learning features of the trunk network. Therefore, the
introduction of the SIB and the LIB enables the trunk network
to learn the speed and load information implicitly contained in the vibration responses. The FDB can make full use of the rich
information of the shared features to accurately distinguish
different fault categories.
TABLE I
NETWORK CONFIGURATION OF THE MT-1DCNN ARCHITECTURE

Trunk network:
Layer  Type  Kernel Size  Channels  Stride  Padding
1      Conv  12×1         16        2       Yes
2      Conv  12×1         16        2       Yes
3      Conv  9×1          24        2       Yes
4      Conv  9×1          24        2       Yes
5      Conv  6×1          32        2       Yes

Task-specific branches (the fault diagnosis, speed identification, and load identification branches share the same structure):
Layer  Type  Kernel Size  Channels  Stride  Padding
1      Conv  6×1          32        2       Yes
2      Conv  3×1          64        2       Yes
3      Global Average Pooling
4      Softmax
As shown in Fig. 2 and TABLE I, suppose that the shared features learned by the trunk network are $M = f_t(X;\theta_t)$, $X \in \mathbb{R}^{T}$, where $X$ is the input signal of the network, $T = 2048$ is the length of the signal, $f_t$ represents the function learned by the trunk network, and $\theta_t$ is the parameter set of $f_t$. The FDB, the SIB, and the LIB take $M$ as input, and then use two convolution modules to learn the features ($Y_f$, $Y_s$, and $Y_l$) that are used for task-specific processing. This process can be expressed as Eq. (1):

$$Y_f,\, Y_s,\, Y_l = f_f(M;\theta_f),\; f_s(M;\theta_s),\; f_l(M;\theta_l), \tag{1}$$

where $f_f$, $f_s$, and $f_l$ are the feature extraction functions learned by the FDB, the SIB, and the LIB, respectively, and $\theta_f$, $\theta_s$, and $\theta_l$ are the corresponding parameters.
Then, a global average pooling layer (GAP) [39] is used to compress the global information of each channel of $Y_f$, $Y_s$, and $Y_l$ into a channel descriptor, so as to obtain the feature vectors $z^f$, $z^s$, and $z^l$. The $j$-th element of $z^f$ is calculated by Eq. (2):

$$z_j^f = \mathrm{GAP}(Y_j^f) = \frac{1}{W}\sum_{u=1}^{W} Y_j^f(u), \quad z^f \in \mathbb{R}^{C}, \tag{2}$$

where $Y_j^f$ denotes the $j$-th channel of $Y_f$, and $W$ and $C$ are the length and the number of channels of $Y_f$, respectively.
The GAP can compress the input features into a vector, which greatly reduces the network parameters. Therefore, it can effectively avoid the overfitting problem caused by fully connected layers. In addition, the GAP is more native to the convolution structure because it enforces correspondences between feature channels and categories [39].
Wheelset bearings have many fault types, so the FDT is a multi-class classification problem. In this study, we also treat the SIT and the LIT as multi-class classification problems. The
softmax activation function is used for the three tasks. The softmax function maps the input feature vector to the range from 0 to 1 and makes the sum of all elements of the vector equal to 1, so it is generally used as the classifier to estimate the probability distribution over the different classes. We assume that $k_f$, $k_s$, and $k_l$ are the numbers of health conditions, speed conditions, and load conditions, respectively. In this paper, $k_f = 11$ and $k_s = k_l = 4$. The softmax function is expressed as Eq. (3):

$$Q(\hat{z}_j) = \frac{\exp(\hat{z}_j)}{\sum_{i=1}^{k}\exp(\hat{z}_i)}, \quad j = 1, 2, \ldots, k, \tag{3}$$

where $\hat{z}_j$ is the $j$-th element of $\hat{z}$, and $\hat{z}$ is the input vector of the softmax activation function. $Q(\hat{z}_j)$ is the estimated probability that $\hat{z}$ belongs to the $j$-th class.
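A task-specific branch of TABLE I can be sketched as follows (tensorflow.keras assumed). Note that the final dense softmax projection to the class count is our assumption, since the table lists only "GAP + Softmax" after the two convolution modules:

```python
# A sketch of one task-specific branch: two Conv1D modules (6x1/32 and 3x1/64,
# stride 2), global average pooling (Eq. (2)), and a softmax classifier (Eq. (3)).
from tensorflow.keras import layers

def add_branch(shared_features, num_classes, name):
    x = layers.Conv1D(32, 6, strides=2, padding="same",
                      activation="relu")(shared_features)
    x = layers.Conv1D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)      # channel descriptors z
    return layers.Dense(num_classes, activation="softmax", name=name)(x)
```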
C. Multi-loss Function
When designing a deep neural network, the choice of the loss
function is always an important aspect. Mean squared error and
mean absolute error often lead to poor performance when used
with gradient-based optimization. Some output units that
saturate produce very small gradients when combined with
these cost functions [40]. Recently, the cross-entropy loss
function gets its popularity and has been widely used in
classification tasks because of its better performance.
The MT-1DCNN needs to process three different
classification tasks simultaneously, so three cross-entropy loss
functions are set. They are Lf for the FDT, Ls for the SIT, and Ll
for the LIT. These three loss functions are independent of each
other. The cross-entropy loss function is mainly used to
evaluate the error of the estimated softmax output probability
distribution and the target class probability distribution.
Suppose $p^f$, $p^s$, and $p^l$ are the target distributions of the FDT, the SIT, and the LIT, respectively, and $q^f$, $q^s$, and $q^l$ are the estimated distributions of the three tasks. $L_f$, $L_s$, and $L_l$ are expressed as Eq. (4):

$$L_f = -\sum_{j=1}^{k_f} p_j^f \log(q_j^f); \quad L_s = -\sum_{j=1}^{k_s} p_j^s \log(q_j^s); \quad L_l = -\sum_{j=1}^{k_l} p_j^l \log(q_j^l). \tag{4}$$
In order to make the three tasks co-trained in the MT-
1DCNN and use the features learned by the SIT and the LIT to
improve the fault diagnosis performance of the network, adding
Lf, Ls and Ll directly to obtain the final loss function of the
network is not optimal. This is because the contribution of
different auxiliary tasks to the FDT is inconsistent, and if the
auxiliary tasks are given too much weight, the features learned
by the network may be more biased to solve the auxiliary tasks.
Therefore, we introduce two hyperparameters (i.e. λs and λl) to
control the weights of the SIT and the LIT, respectively. The
total loss of the MT-1DCNN is expressed as Eq. (5):

$$Loss = L_f + \lambda_s L_s + \lambda_l L_l. \tag{5}$$
The accuracy of each task is different, which could result in an imbalance problem in the loss function. To address this problem,
we introduce two weight coefficients in the loss function to
balance accuracies of the three task branches. Selection of the
two hyperparameters will be discussed in Section IV.C. Then,
we use the stochastic gradient descent method to minimize the
loss function in Eq. (5). During training, the network
simultaneously processes the FDT, the SIT, and the LIT. In
other words, the network updates the parameters θt, θf, θs, and
θl at the same time. As training progresses, the trunk network learns more generalized shared features, and the task-specific
branches can take advantage of the features learned from other
tasks to achieve better performance.
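Building on the trunk and branch sketches above, the three branches and the weighted multi-loss of Eq. (5) could be wired together as shown below (our sketch, not the authors' code). The SGD optimizer and learning rate follow Section IV.B, the loss weights correspond to the λs and λl selected in Section IV.C, and one-hot encoded labels are assumed:

```python
# A sketch of assembling and compiling the MT-1DCNN with a weighted multi-loss.
import tensorflow as tf

trunk = build_trunk()                              # from the trunk sketch above
shared = trunk.output                              # shared features M
outputs = [
    add_branch(shared, 11, "fault"),               # k_f = 11 health conditions
    add_branch(shared, 4, "speed"),                # k_s = 4 speed conditions
    add_branch(shared, 4, "load"),                 # k_l = 4 load conditions
]
mt_1dcnn = tf.keras.Model(trunk.input, outputs, name="MT_1DCNN")
mt_1dcnn.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
    loss={"fault": "categorical_crossentropy",     # L_f
          "speed": "categorical_crossentropy",     # L_s
          "load": "categorical_crossentropy"},     # L_l
    loss_weights={"fault": 1.0, "speed": 0.6, "load": 0.4},   # Eq. (5)
    metrics=["accuracy"],
)
```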
IV. EXPERIMENT RESULTS AND ANALYSIS
In this section, the effectiveness and superiority of MTL and the proposed MT-1DCNN are verified on the wheelset bearing
dataset.
A. Test Rig for Fault Experiments
As shown in Fig. 3, the wheelset bearing test rig is
established to evaluate the effectiveness of the MT-1DCNN.
The test rig is designed to simulate a real train operating
environment. It is mainly composed of a drive motor, a belt
transmission system, and a control system. In addition, we also
set up a vertical loading device, a lateral loading device, and
two fan motors to simulate wind resistance and 2D loads during
the real train operation. An axle and its two supporting bearings
are assembled to the test rig. The experimental data are
collected by acceleration sensors installed in the axle boxes, and
the sampling frequency is 5120 Hz.
Fig. 3. The wheelset bearing test rig.
On this test rig, we tested bearings with naturally generated faults collected from real operation lines, covering a total of 11 health conditions. TABLE II shows the health condition information of the experimental bearings, which are marked as F1, F2, ..., F11. To simulate the working environment of wheelset bearings on a real train as closely as possible, four operation speeds (60, 90, 120, and 150 km/h) and four vertical loads (56, 146, 236, and 272 kN) were set for each health condition. Therefore, the SIT has four categories, which are
respectively labeled as S1, S2, S3, and S4; the LIT has four
categories, which are respectively labeled as L1, L2, L3, and
L4.
TABLE II
ELEVEN HEALTH CONDITIONS OF WHEELSET BEARINGS.
B. Experiment Setup
The data are randomly divided into a training set, a validation set, and a test set according to the ratio of 3:1:2, and then a sliding segmentation method is used for data augmentation. Sliding segmentation is an efficient vibration signal data augmentation method, which has been used in [16, 28]. In our experiment, the length of each sample is 2048, and the step size of the sliding segmentation is set to 256. Finally, 45,652 training samples, 13,332 validation samples, and 19,700 test samples are obtained. Four repeated experiments are carried out for each method.
The MT-1DCNN is implemented with the Keras library in Python 3.5. Network training and testing are performed on a workstation with the Ubuntu 16.04 operating system, an Intel Core i9-9900K CPU, and a GTX 2080 GPU with 8 GB of video memory. Each sample is normalized by subtracting its mean and dividing by its variance, which accelerates the convergence of the network. During training, the learning rate is 0.0001 and the batch size is 96.
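The sliding segmentation and per-sample normalization described above could be implemented as in the sketch below (our code; the paper divides by the variance, while dividing by the standard deviation, as is more common, is assumed here):

```python
# Sliding-window segmentation (window 2048, step 256) and per-sample scaling.
import numpy as np

def sliding_segments(signal, window=2048, step=256):
    """Cut a long 1-D vibration record into overlapping training samples."""
    starts = range(0, len(signal) - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

def standardize(samples, eps=1e-8):
    """Subtract each sample's mean and divide by its standard deviation."""
    mean = samples.mean(axis=1, keepdims=True)
    std = samples.std(axis=1, keepdims=True)
    return (samples - mean) / (std + eps)
```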
Accuracy, recall and precision are used as evaluation metrics
to comprehensively measure the performance of classification
methods. They are defined in Eqs. (6)-(8):

$$Accuracy = \frac{TP + TN}{TP + FN + FP + TN} \times 100\%, \tag{6}$$

$$Recall = \frac{TP}{TP + FN} \times 100\%, \tag{7}$$

$$Precision = \frac{TP}{TP + FP} \times 100\%, \tag{8}$$

where $FP$, $TP$, $FN$, and $TN$ refer to the numbers of false positive, true positive, false negative, and true negative samples, respectively.
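For completeness, Eqs. (6)-(8) can be evaluated per class from a confusion matrix as in the following sketch (ours, not from the paper):

```python
# Per-class accuracy, recall, and precision from a confusion matrix cm,
# where cm[i, j] counts samples of true class i predicted as class j.
import numpy as np

def per_class_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp       # predicted as the class, but wrong
    fn = cm.sum(axis=1) - tp       # belong to the class, but missed
    tn = cm.sum() - tp - fp - fn
    accuracy = (tp + tn) / (tp + tn + fp + fn) * 100
    recall = tp / (tp + fn) * 100
    precision = tp / (tp + fp) * 100
    return accuracy, recall, precision
```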
To better simulate the strong noise interference of rolling
bearings in real operation, we add white Gaussian noise into the
raw signals. The definition of the SNR is given in Eq. (9):

$$SNR_{dB} = 10\log_{10}\!\left(\frac{P_{signal}}{P_{noise}}\right), \tag{9}$$

where $P_{signal}$ and $P_{noise}$ are the power of the signal and the noise, respectively.
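A noisy copy of a signal at a target SNR defined by Eq. (9) can be generated as in the sketch below (an assumption of ours about the implementation):

```python
# Add white Gaussian noise so that 10*log10(P_signal / P_noise) equals snr_db.
import numpy as np

def add_white_gaussian_noise(signal, snr_db):
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise
```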
We compared the MT-1DCNN with the following five networks: the Wen-2DCNN [30], the Guo-2DCNN [31], the Wei-1DCNN [28], the Zhang-1DCNN [17], and a five-layer BPNN. The sizes of the five BPNN layers are 1024, 512, 256, 96, and 11, respectively. Notably, the
C. Selecting Weight Coefficients in the Loss Function
In the MT-1DCNN, the two auxiliary tasks have different contributions to the FDT, which may result in an imbalance problem in the loss function. The imbalance problem could be
alleviated by tuning the two weight coefficients, whose values
are determined by a grid search with the accuracy of the FDB
in this paper.
In this experiment, λs and λl are sequentially set to 0.2, 0.4,
0.6, 0.8 and 1 under SNR=−5 dB. In other words, we have
conducted 25 different experiments to discuss the influence of
weight coefficients on fault diagnosis performance. The
accuracy of fault diagnosis under different weight coefficient
settings is shown in TABLE III.
Obviously, different weight settings have an impact on the
fault diagnosis performance of the proposed network. This
shows that the three tasks interact with each other, and the
network can learn their correlation. In addition, it also proves
that different auxiliary tasks have different contributions to the
FDT, so it is necessary to set different weight coefficients for
different auxiliary tasks. On the other hand, when λs and λl
increase from 0.2 to 1, the fault diagnosis accuracy of the MT-
1DCNN increases first and then decreases. This is because a lower weight prevents the auxiliary task from providing enough contribution, while a higher weight makes the network more inclined to process the auxiliary task. When λs = 0.6 and λl = 0.4, the MT-1DCNN achieves the best fault diagnosis performance.
Therefore, in the subsequent experiments, λs and λl are set to 0.6
and 0.4, respectively.
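The grid search over the two weight coefficients can be organized as in the sketch below; `train_and_evaluate` is a hypothetical helper (not part of the paper) that trains the MT-1DCNN with the given weights and returns the validation accuracy of the FDB:

```python
# 5 x 5 grid search over lambda_s and lambda_l (25 runs, as in this section).
import itertools

grid = [0.2, 0.4, 0.6, 0.8, 1.0]
results = {}
for lam_s, lam_l in itertools.product(grid, grid):
    # train_and_evaluate is a hypothetical helper, not part of the paper
    results[(lam_s, lam_l)] = train_and_evaluate(lambda_s=lam_s, lambda_l=lam_l)
best_lambda_s, best_lambda_l = max(results, key=results.get)
```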
TABLE III
PERFORMANCE RESULTS FOR DIFFERENT WEIGHTS (SNR=−5 dB).
          λs = 1        λs = 0.8      λs = 0.6      λs = 0.4      λs = 0.2
λl = 1    0.829±0.013   0.829±0.013   0.833±0.012   0.825±0.003   0.812±0.005
λl = 0.8  0.830±0.004   0.829±0.010   0.839±0.003   0.818±0.010   0.815±0.010
λl = 0.6  0.832±0.016   0.818±0.021   0.839±0.009   0.827±0.006   0.821±0.009
λl = 0.4  0.833±0.011   0.841±0.013   0.847±0.007   0.818±0.006   0.827±0.010
λl = 0.2  0.825±0.015   0.834±0.015   0.823±0.018   0.826±0.015   0.826±0.011
D. Effectiveness of MTL for the Fault Diagnosis Task
In this section, we discuss the impact of the MTL on the FDT.
Under SNR=−5 dB, we compare the MT-1DCNN with three
network structures. The three networks are CNN-F (only the
FDB is included), CNN-FS (including the FDB and the SIB),
and the CNN-FL (including the FDB and the LIB). The results
of the four networks are shown in TABLE IV.
TABLE IV
PERFORMANCE RESULTS OF THE FAULT DIAGNOSIS TASK (SNR=−5 dB).
Metric     MT-1DCNN      CNN-FS        CNN-FL        CNN-F
Accuracy   0.847±0.007   0.817±0.018   0.800±0.015   0.724±0.028
Recall     0.837±0.009   0.805±0.018   0.787±0.017   0.702±0.032
Precision  0.840±0.007   0.808±0.020   0.790±0.017   0.712±0.030
Compared with the CNN-F, the CNN-FS with the SIB has a
more than 9% improvement in the accuracy of the FDT, which
shows that the speed information is quite important for the FDT,
and it can effectively assist the network in fault diagnosis. In
addition, compared with the CNN-F, the CNN-FL with load
identification branch improves accuracy of fault diagnosis by
7.5%, which also proves that the load information of rotating
mechanical system can promote performance of the FDT.
Compared with the LIT, the addition of the SIT improves the fault diagnosis performance of the network more. This is because the vibration signals change more markedly at different speeds, so the network can reduce the occurrence of misjudgment with the
help of the speed information. Compared with the CNN-FS, the
CNN-FL and the CNN-F, the MT-1DCNN has a considerable
improvement in accuracy, recall, and precision. It means that
load information and speed information can complement each
other and improve the diagnostic performance of the network.
The experiment results prove that the MTL can effectively
improve the fault diagnosis performance of the network.
Subsequently, we use the t-distributed stochastic neighbor
embedding (T-SNE) [41] technique to visualize the distribution
of the final output of the MT-1DCNN, the CNN-FS, the CNN-
FL and the CNN-F in 2D embedded space. The visualization
results are shown in Fig. 4, where color represents health
condition of the wheelset bearing. Coordinates of each point
represent its position in the 2D embedded space, and distance
between two points represents their similarity. We use the
Fisher score [42] as the metric to quantify the quality of the projection. The Fisher scores are 2.679, 2.298, 1.633, and 1.524
for the four networks. Obviously, the output of the network with
the MTL ability has a better discrimination ability. In particular,
the proposed MT-1DCNN performs at least 14% better than
other comparison networks in terms of the Fisher score. It
indicates that the MT-1DCNN can effectively improve the
feature learning ability of the network.
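The 2D projection used for Fig. 4 can be reproduced with scikit-learn's t-SNE as sketched below; `features` (final-layer outputs on the test set) and `labels` (integer fault categories) are hypothetical placeholders:

```python
# Project high-dimensional network outputs to 2D with t-SNE and scatter-plot
# them coloured by fault category.
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

embedded = TSNE(n_components=2, random_state=0).fit_transform(features)
plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab20", s=4)
plt.xlabel("Dim 1")
plt.ylabel("Dim 2")
plt.show()
```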
Fig. 4. Group plot visualization of the final outputs of the MT-1DCNN (Fisher score: 2.679), the CNN-FS (Fisher score: 2.298), the CNN-FL (Fisher score: 1.633), and the CNN-F (Fisher score: 1.524) under SNR=−5 dB.
E. Effectiveness of MTL for Speed Identification Task
In this section, we discuss the impact of the MTL on the SIT.
Under SNR=−5 dB, we set up a comparative experiment of
three network structures including the CNN-S (only the SIB is
included), the CNN-FS and the MT-1DCNN. The speed
identification results of each network are shown in TABLE V.
Surprisingly, although the MT-1DCNN did not specifically
optimize the SIT, with the addition of the auxiliary tasks, the
speed identification performance of the network has been
improved. After adding the FDT, the speed identification
accuracy of the CNN-FS is 7% higher than that of the CNN-S.
The MT-1DCNN is 1% higher than the CNN-FS after the LIT
is added. This shows that the MT-1DCNN is not only suitable
for the FDT, but can also effectively improve the performance of related tasks at the same time. This inspires us to explore more
methods with MTL in the future work, such as combination of
life prediction task and the FDT.
TABLE V
PERFORMANCE RESULTS OF THE SPEED IDENTIFICATION TASK (SNR=−5 dB).
Metric     MT-1DCNN      CNN-FS        CNN-S
Accuracy   0.887±0.012   0.878±0.007   0.805±0.004
Recall     0.888±0.012   0.879±0.007   0.806±0.005
Precision  0.887±0.012   0.880±0.008   0.809±0.004
F. Effectiveness of MTL for Load Identification Task
In this section, we discuss the impact of the MTL on the LIT.
Under SNR=−5 dB, we set up a comparative experiment of
three network structures including the CNN-L (only the LIB is
included), the CNN-FL and the MT-1DCNN. The load
identification results of each network are shown in TABLE VI.
Similarly, with the addition of related tasks, the network's
load identification performance has been improved. The
accuracy of the CNN-FL is 7% higher than that of the CNN-L;
the accuracy of the MT-1DCNN is 4% higher than that of the
CNN-FL. This again demonstrates effectiveness of the MTL in
dealing with multiple related tasks. However, the performance
of the LIT is not particularly good compared with that of the
SIT. Firstly, because of the standardization operation, the differences between signals under different loads are largely eliminated. Secondly, this result is obtained under SNR=−5 dB, and the strong noise interferes with the judgment of the network. Thirdly, the goal of the MT-1DCNN is to improve
performance of the FDT, so there is no special optimization for
auxiliary tasks.
TABLE VI
PERFORMANCE RESULTS OF THE LOAD IDENTIFICATION TASK (SNR=−5 dB).
Metric     MT-1DCNN      CNN-FL        CNN-L
Accuracy   0.640±0.012   0.596±0.011   0.522±0.023
Recall     0.637±0.012   0.593±0.011   0.519±0.022
Precision  0.638±0.013   0.593±0.009   0.516±0.023
G. Comparison with Excellent Methods
In this section, we compare the diagnostic performance of the
MT-1DCNN and five excellent networks under four SNR
scenarios (10 dB, 5 dB, 0 dB, and −5 dB), which are used to
simulate working condition of the wheelset bearing under
different noise levels. The fault diagnosis results of each network are shown in TABLE VII.
Obviously, the accuracy, recall, and precision of the MT-1DCNN are better than those of the other comparison methods under the four SNR scenarios. In particular, when the noise is strong, the MT-1DCNN is considerably better than the other comparison methods in terms of diagnostic accuracy. For example, when SNR=−5 dB, the fault diagnosis accuracy of the MT-1DCNN is more than 13% higher than that of the Wen-2DCNN, which has the best performance among the comparison methods. This means that the MT-1DCNN has a relatively strong anti-noise ability even without any additional de-noising preprocessing. As the SNR increases, the fault diagnosis performance of the network improves. Under SNR=0 dB, which means the noise power is approximately equal to the original signal power, the MT-1DCNN still obtains 96.3% fault diagnosis accuracy. Under this noise level, the Wen-2DCNN obtains the best results among the comparison methods, but its accuracy is only 91.7%. Under SNR=10 dB, the accuracy of the MT-1DCNN reaches more than 99.4%. This indicates that even in a noisy background environment, the MT-1DCNN can still achieve excellent diagnostic performance and has practical
application potential. Moreover, the standard deviation of the
MT-1DCNN is smaller than that of the other methods in most cases, which shows good stability.
In order to analyze the diagnosis results of each category and
to understand its precision and recall in detail, confusion
matrices of the proposed MT-1DCNN under SNR=10 dB and
SNR=−5 dB are provided in Fig. 5 and Fig. 6, respectively,
where rows represent predicted labels; columns represent true labels; the diagonal shows the number of accurate diagnoses for each category; the bottom row shows the precision of each category; and the rightmost column represents the number of testing samples of each category.
TABLE VII
PERFORMANCE RESULTS OF THE MT-1DCNN AND THE FIVE COMPARISON NETWORKS UNDER THE FOUR SNR SCENARIOS.
SNR     Metric     MT-1DCNN      Wei-1DCNN     Wen-2DCNN     Zhang-1DCNN   Guo-2DCNN     BPNN
10 dB   Accuracy   0.994±0.001   0.981±0.005   0.991±0.001   0.988±0.003   0.894±0.011   0.670±0.003
        Recall     0.993±0.001   0.980±0.005   0.990±0.001   0.988±0.002   0.886±0.012   0.659±0.007
        Precision  0.993±0.001   0.980±0.005   0.990±0.001   0.987±0.002   0.887±0.011   0.661±0.006
5 dB    Accuracy   0.985±0.001   0.966±0.006   0.980±0.002   0.964±0.006   0.840±0.010   0.621±0.004
        Recall     0.984±0.001   0.964±0.007   0.978±0.002   0.963±0.010   0.826±0.010   0.602±0.004
        Precision  0.984±0.001   0.964±0.007   0.979±0.002   0.963±0.006   0.828±0.012   0.608±0.003
0 dB    Accuracy   0.963±0.004   0.894±0.009   0.917±0.004   0.866±0.015   0.754±0.010   0.528±0.001
        Recall     0.962±0.004   0.889±0.009   0.910±0.005   0.853±0.017   0.736±0.011   0.499±0.002
        Precision  0.960±0.004   0.889±0.010   0.910±0.004   0.864±0.017   0.737±0.010   0.508±0.004
−5 dB   Accuracy   0.847±0.007   0.666±0.008   0.712±0.005   0.656±0.024   0.546±0.006   0.363±0.003
        Recall     0.837±0.009   0.640±0.008   0.686±0.007   0.636±0.019   0.510±0.007   0.317±0.003
        Precision  0.840±0.007   0.648±0.015   0.693±0.007   0.651±0.021   0.517±0.008   0.330±0.003
TABLE VIII
NUMBER OF PARAMETERS AND ONE-BATCH (96 SAMPLES) TEST TIME FOR THE MT-1DCNN AND THE FIVE COMPARISON NETWORKS.
                      MT-1DCNN   Wei-1DCNN   Wen-2DCNN   Zhang-1DCNN   Guo-2DCNN   BPNN
Number of parameters  5.5×10^4   5.4×10^4    5.9×10^5    1.3×10^5      1.5×10^4    2.7×10^6
Test time (s)         1.204      0.563       0.505       0.569         0.501       0.497
It is seen that the main error
comes from the wrong classification between fault modes.
According to their recall values, F4 is the most confusing fault
type for the MT-1DCNN. The majority of wrongly classified F4 samples are classified as F3. In addition, under SNR=10 dB,
the recall and the precision of the network in most fault modes
are close to 100%, which indicates that the MT-1DCNN has
good diagnostic performance when the noise is weak. Even
under SNR=−5 dB, precision of the MT-1DCNN for the normal
category can still reach 95.55%, which shows that the network
can still distinguish normal samples and fault samples well
under strong noise.
Fig. 5. Confusion matrix of the proposed MT-1DCNN (SNR=10 dB).
Fig. 6. Confusion matrix of the proposed MT-1DCNN (SNR=−5 dB).
H. Computational Burden of the Networks
Computational burden of the networks is an important metric
to measure performance of bearing fault diagnosis methods. So,
this section quantitatively calculates the number of parameters
and test time for each method. TABLE VIII summarizes the total number of parameters and the one-batch (96 samples) test time for each network. It is seen that the MT-1DCNN has a very lightweight structure and obtains better performance than other networks (such as the Wen-2DCNN) with a small number of parameters, which proves that the MT-1DCNN has a higher parameter utilization rate. However, the MT-
1DCNN needs more test time. This is not surprising, since the
MT-1DCNN has to process three different tasks simultaneously,
which obviously leads to more test time. It is worth pointing out
that the test time of the MT-1DCNN on 96 samples is 1.204
seconds, which is acceptable in engineering practice.
V. DISCUSSIONS ABOUT THE MT-1DCNN
A. Understanding Feature Learning Mechanism of MTL
To understand the feature learning mechanism of the MT-
1DCNN, we use the t-SNE [41] technique to visualize the
distribution of features in different layers of the network in the
2D space. Fig. 7 shows the visualization of the shallow features,
the shared features of the trunk network, the features of each task-specific branch, and the final output.
It is seen that the shallow features learned by the trunk
network do not contain specific task information. As the
network deepens, under the constraints of related tasks, the
trunk network learns the domain-specific information required
for multiple related tasks. So, the shared features (subfigures
A2, B2 and C2) learnt by the trunk network contain information
that can be used for related tasks. Although the discrimination
of the shared features is not particularly obvious, relevant
Fig. 7. Group plot visualization of 2D features in different layers of the MT-1DCNN under SNR=10 dB, visualized by fault categories (subfigures A1-A4), speed categories (subfigures B1-B4), and load categories (subfigures C1-C4).
auxiliary tasks can provide additional supervision information
and make the features learned by the network have a better
generalization ability. This also brings another benefit: the local minima of different tasks are in different positions in MTL networks, so joint learning can help the network escape local minima, whereas in a single-task learning network gradient backpropagation tends to fall into them. Then,
these shared features are sent to the task-specific branches, from
which the features that can be used for specific tasks are
selected. In this way, the network allows features within the shared representation that are dedicated to one task to be used by other tasks, and such features are often not easy to learn in a single-
task learning network. According to TABLE IV, TABLE V
and TABLE VI, we observe consistent results, that is, the
classification results of the current task are improved after
introducing auxiliary tasks. This shows that certain features that
are dedicated to one task are indeed useful for its related tasks.
Finally, the final results are obtained by classifiers of different
tasks. The proposed MT-1DCNN can learn the shared features
for multiple related tasks, and can also process each task
separately. This preserves the independence of each task and
allows them to connect and promote each other. It is worth
noting that the multiple tasks handled by the MT-1DCNN should be related to some extent, which is the assumption behind the proposed method.
B. Combining the MTL Principle with Other Architecture
This section explores the applicability of the proposed multi-
task principle with other deep learning architectures. We
construct two network architectures, namely long short-term
memory (LSTM) and multi-task LSTM (MT-LSTM). We
first design the LSTM with a two-layer LSTM cell, where the number of time steps is 64 and the input dimension is 32. Then, we replace the trunk network in the MT-1DCNN with the LSTM, keep the other network structure the same as in the MT-1DCNN, and thus construct the MT-LSTM. The fault diagnosis accuracy of the two networks under the four SNR scenarios (10 dB, 5 dB, 0 dB, and −5 dB) is shown in TABLE IX. The experiment results show that the MTL principle successfully combines with the LSTM to improve its fault diagnosis performance.
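Under the same assumptions as the earlier Keras sketches, the MT-LSTM variant can be outlined as follows; the LSTM hidden size (32 units) is our assumption, since the paper specifies only the two-layer structure and the 64×32 input shape, and `add_branch` is reused from the Section III sketch:

```python
# Replace the convolutional trunk with a two-layer LSTM over the 2048-point
# signal reshaped into 64 time steps of 32 points; the branches stay unchanged.
import tensorflow as tf
from tensorflow.keras import layers

inputs = layers.Input(shape=(2048, 1))
x = layers.Reshape((64, 32))(inputs)            # 64 time steps x 32 points
x = layers.LSTM(32, return_sequences=True)(x)
x = layers.LSTM(32, return_sequences=True)(x)   # keep a sequence for the Conv1D branches
outputs = [add_branch(x, 11, "fault"),
           add_branch(x, 4, "speed"),
           add_branch(x, 4, "load")]
mt_lstm = tf.keras.Model(inputs, outputs, name="MT_LSTM")
```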
TABLE IX
PERFORMANCE RESULTS OF MT-LSTM AND LSTM UNDER THE FOUR SNR SCENARIOS.
Method     10 dB         5 dB          0 dB          −5 dB
LSTM       0.967±0.002   0.954±0.003   0.914±0.005   0.739±0.012
MT-LSTM    0.987±0.001   0.981±0.002   0.948±0.005   0.818±0.010
C. Experiments on the CWRU bearing dataset
In this section, the proposed method is tested on the bearing
dataset of Case Western Reserve University (CWRU). The
CWRU dataset is a public dataset. In this dataset, a total of four
load conditions are set, which are 0 hp, 1 hp, 2 hp and 3 hp. The
data used in this experiment comes from the drive end of the
test bench motor and contains four different health states,
namely healthy state, inner ring fault, outer ring fault, and ball
fault. All three types of faults are produced by electro-discharge
machining. Their diameters are 7 mils, 14 mils and 21 mils
respectively. We treat each degree of failure as an independent bearing health condition. Therefore, this dataset
contains 10 health conditions and four load conditions. After
data augmentation, 9320 training samples, 4200 validation
samples and 4200 test samples are obtained. The fault diagnosis
results of six networks under SNR = −5 dB are shown in
TABLE X. It is worth noting that here the MT-1DCNN contains only the FDB and the LIB. The experimental results show that the MT-1DCNN
obtains 93.8% fault diagnosis accuracy, which is an increase of
5.8% compared to Zhang-1DCNN. This indicates that MT-
1DCNN also performs well on the CWRU bearing dataset.
TABLE X
PERFORMANCE RESULTS OF THE SIX NETWORKS ON THE CWRU BEARING DATASET (SNR=−5 dB).
Method        Accuracy      Recall        Precision
MT-1DCNN      0.938±0.007   0.938±0.007   0.939±0.007
Wei-1DCNN     0.837±0.015   0.837±0.015   0.836±0.016
Wen-2DCNN     0.863±0.009   0.863±0.009   0.864±0.008
Zhang-1DCNN   0.880±0.002   0.880±0.002   0.889±0.005
Guo-2DCNN     0.621±0.021   0.620±0.022   0.625±0.018
BPNN          0.317±0.002   0.317±0.003   0.326±0.003
D. Novelties of the MT-1DCNN
The condition of the wheelset bearing is affected by many
factors. As shown in Fig. 1, the vibration response of the
wheelset bearing is related not only to its health condition, but
also to its working conditions, such as speed and load.
Therefore, if a fault diagnosis model can obtain both health
condition information and working condition information, the
model can understand the bearing more comprehensively and
thereby can make a more accurate decision. However, based on
our literature review [36-38], we find that current deep learning-based methods ignore the association between the working condition and the health condition. Therefore, this paper explores the possibility of enabling the network to simultaneously handle
working condition identification tasks and the FDT. We prove
that MTL can make the network effectively use and share the
features learned by different tasks, so as to improve fault
diagnosis performance. The novel MT-1DCNN is proposed,
which has achieved very competitive performance on the
wheelset bearing dataset compared to five peer networks.
The MT-1DCNN is the first attempt to use a multi-task structure to exploit working condition information to improve the fault diagnosis performance of the network. In this way, our method provides a new solution for the FDT as well as a general and scalable architecture. The MT-1DCNN can be easily expanded if other working condition information (such as temperature) is available. In addition, some task branches can also be removed; for example, on the CWRU dataset, the MT-1DCNN removes the SIB and still achieves good performance. However, the introduction of more tasks will definitely bring more parameters and computational cost, and it also challenges the trunk network's feature learning ability. Therefore, a lightweight trunk network with strong feature learning ability is needed. In this paper, inspired by [28], we apply the
wide convolution kernel to the entire network to make the
network have a more powerful ability to learn long-term
correlation features. In addition, we gradually reduce the size of
the convolution kernels and construct a shallow network
architecture to reduce the network parameters. As shown in
TABLE VII and TABLE VIII, the MT-1DCNN has roughly the same number of parameters as the Wei-1DCNN [28], but it can handle multiple tasks and has better diagnostic performance. This indicates that our network improves the feature learning ability while maintaining a small number of parameters. Finally, by visualizing the internal feature learning of the MT-1DCNN, we discussed the mechanism of MTL, which also contributes to interpreting CNNs in the fault diagnosis field.
E. Overfitting Problem
Multiple tasks are related but not the same. This diversity can
reduce the overfitting problem when learning parameters shared
in the trunk network. The more tasks we are learning
simultaneously, the more our model has to find a representation
that captures all of the tasks and the less is our chance of
overfitting on our original task [43].
However, overfitting is a common problem for supervised
learning-based networks, especially for a complex model.
During the training process, the network may produce an overfitted response in which the method prioritizes signal differentiation instead of pattern characterization. In this case, the features considered by the method could be specific to the dataset used (e.g., its noise level or electrical interferences) rather than common patterns that remain useful when the trained methodology is applied to other similar systems. The following two ideas can improve the generalization performance of the network. 1) Unsupervised learning: characterizing the patterns with unsupervised learning before supervised classification can alleviate the overfitting problem. 2) Transfer learning: transferring the pattern characterization learned by the network on other large datasets to the current task can alleviate the network's overfitted response to the current dataset.
VI. CONCLUSIONS
This paper proposes the end-to-end MT-1DCNN for fault
diagnosis of wheelset bearing. It introduces the MTL principle
into bearing fault diagnosis, and explores the influence of speed
information and load information on the FDT. The MT-1DCNN
acquires the shared features between these tasks by
simultaneously processing the FDT, the SIT, and the LIT. Then,
the network can obtain speed information and load information
that can assist the classifier for fault diagnosis from the shared
features. Therefore, the MT-1DCNN has a more comprehensive
feature learning mechanism, which can achieve better
performance with a lightweight network structure. The MT-
1DCNN establishes a multi-task network framework for
bearing fault diagnosis, which can not only use the speed and load tasks as auxiliary tasks, but can also further use wind speed, temperature, and other tasks that are related to the FDT. The
experiment results show that the MT-1DCNN has considerable
advantages over the five peer networks in accuracy, precision
and recall. We also prove that the MTL principle can
simultaneously improve the performance of the FDT, the SIT,
and the LIT. Moreover, the possibility of combining the MTL principle with another network architecture is preliminarily validated.
REFERENCES
[1] H. Cao, F. Fan, K. Zhou and Z. He, "Wheel-bearing fault diagnosis of trains
using empirical wavelet transform," Measurement, vol. 82, 2016.
[2] Z. Li, J. Chen, Y. Zi and J. Pan, "Independence-Oriented VMD to Identify
Fault Feature for Wheel Set Bearing Fault Diagnosis of High Speed
Locomotive," Mech. Syst. Signal Pr., vol. 85, pp. 512-529, 2017.
[3] X. Wang, Z. Yang and X. Yan, "Novel Particle Swarm Optimization-Based
Variational Mode Decomposition Method for the Fault Diagnosis of
Complex Rotating Machinery," IEEE/ASME Transactions on
Mechatronics, vol. 23, no. 1, pp. 68-79, 2018.
[4] X. Zhang, J. Wang, Z. Liu and J. Wang, "Weak Feature Enhancement in
Machinery Fault Diagnosis Using Empirical Wavelet Transform and an
Improved Adaptive Bistable Stochastic Resonance," ISA T., vol. 84, pp.
283-295, 2019.
[5] Y. Kong, T. Wang and F. Chu, "Meshing Frequency Modulation Assisted
Empirical Wavelet Transform for Fault Diagnosis of Wind Turbine
Planetary Ring Gear," Renew. Energ., vol. 132, pp. 1373-1388, 2019.
[6] Z. Liu, Y. Jin, M.J. Zuo and Z. Feng, "Time-Frequency Representation
Based on Robust Local Mean Decomposition for Multicomponent AM-
FM Signal Analysis," Mech. Syst. Signal Pr., vol. 95, pp. 468-487, 2017.
[7] Z. Liu, M.J. Zuo, Y. Jin, D. Pan and Y. Qin, "Improved Local Mean
Decomposition for Modulation Information Mining and Its Application to
Machinery Fault Diagnosis," J. Sound Vib., vol. 397, pp. 266-281, 2017.
[8] M. Kang, J. Kim, J. Kim, A.C.C. Tan, E.Y. Kim and B. Choi, "Reliable
Fault Diagnosis for Low-Speed Bearings Using Individually Trained
Support Vector Machines With Kernel Discriminative Feature Analysis,"
IEEE T. Power Electr., vol. 30, no. 5, pp. 2786-2797, 2015.
[9] L. Ren, W. Lv, S. Jiang and Y. Xiao, "Fault Diagnosis Using a Joint Model
Based on Sparse Representation and SVM," IEEE T. Instrum. Meas., vol.
65, no. 10, pp. 2313-2320, 2016.
[10] P. Baraldi, F. Cannarile, F. Di Maio and E. Zio, "Hierarchical K-Nearest
Neighbours Classification and Binary Differential Evolution for Fault
Diagnostics of Automotive Bearings Operating under Variable
Conditions," Eng. Appl. Artif. Intel., vol. 56, pp. 1-13, 2016.
[11] D.H. Pandya, S.H. Upadhyay and S.P. Harsha, "Fault Diagnosis of Rolling
Element Bearing with Intrinsic Mode Function of Acoustic Emission Data
Using APF-KNN," Expert Syst. Appl., vol. 40, no. 10, pp. 4137-4145, 2013.
[12] V.N. Ghate and S.V. Dudul, "Optimal MLP Neural Network Classifier for
Fault Detection of Three Phase Induction Motor," Expert Syst. Appl., vol.
37, no. 4, pp. 3468-3481, 2010.
[13] J. Zheng, H. Pan and J. Cheng, "Rolling Bearing Fault Detection and
Diagnosis Based on Composite Multiscale Fuzzy Entropy and Ensemble
Support Vector Machines," Mech. Syst. Signal Pr., vol. 85, pp. 746-759,
2017.
[14] K. Choi, S. Singh, A. Kodali, K.R. Pattipati, J.W. Sheppard and S.M.
Namburu, et al., "Novel Classifier Fusion Approaches for Fault Diagnosis
in Automotive Systems," IEEE T. Instrum. Meas., vol. 58, no. 3, pp. 602-
611, 2009.
[15] Y. LeCun, Y. Bengio and G. Hinton, "Deep Learning," Nature, vol. 521, pp. 436-444, 2015.
[16] D. Peng, Z. Liu, H. Wang, Y. Qin and L. Jia, "A Novel Deeper One-
Dimensional CNN With Residual Learning for Fault Diagnosis of
Wheelset Bearings in High-Speed Trains," IEEE Access, vol. 7, pp. 10278-
10293, 2019.
[17] W. Zhang, X. Li and Q. Ding, "Deep Residual Learning-Based Fault
Diagnosis Method for Rotating Machinery," ISA T., 2018.
[18] H. Wang, Z. Liu, D. Peng and Y. Qin, "Understanding and Learning
Discriminant Features Based on Multi-Attention 1DCNN for Wheelset
Bearing Fault Diagnosis," IEEE T. Ind. Inform., 2019.
[19] J. Pan, Y. Zi, J. Chen, Z. Zhou and B. Wang, "LiftingNet: A Novel Deep
Learning Network With Layerwise Feature Learning From Noisy
Mechanical Data for Fault Classification," IEEE T. Ind. Electron., vol. 65,
no. 6, pp. 4973-4982, 2018.
[20] J. Jiao, M. Zhao, J. Lin and C. Ding, "Deep Coupled Dense Convolutional
Network With Complementary Data for Intelligent Fault Diagnosis," IEEE
T. Ind. Electron., vol. 66, no. 12, pp. 9858-9867, 2019.
[21] R. Liu, F. Wang, B. Yang and S.J. Qin, "Multi-scale Kernel based Residual
Convolutional Neural Network for Motor Fault Diagnosis Under Non-
stationary Conditions," IEEE T. Ind. Inform., pp. 1, 2019.
[22] G. Xu, M. Liu, Z. Jiang, W. Shen and C. Huang, "Online Fault Diagnosis
Method Based on Transfer Convolutional Neural Networks," IEEE T.
Instrum. Meas., vol. 69, no. 2, pp. 509-520, 2020.
[23] I. Kao, W. Wang, Y. Lai and J. Perng, "Analysis of Permanent Magnet
Synchronous Motor Fault Diagnosis Based on Learning," IEEE T. Instrum.
Meas., vol. 68, no. 2, pp. 310-324, 2019.
[24] L. Wen, X. Li and L. Gao, "A New Two-Level Hierarchical Diagnosis
Network Based on Convolutional Neural Network," IEEE T. Instrum.
Meas., vol. 69, no. 2, pp. 330-338, 2020.
[25] R. Huang, J. Li, W. Li and L. Cui, "Deep Ensemble Capsule Network for
Intelligent Compound Fault Diagnosis Using Multisensory Data," IEEE T.
Instrum. Meas., pp. 1, 2019.
[26] X. Ding and Q. He, "Energy-Fluctuated Multiscale Feature Learning With
Deep ConvNet for Intelligent Spindle Bearing Fault Diagnosis," IEEE T.
Instrum. Meas., vol. 66, no. 8, pp. 1926-1935, 2017.
[27] D. Peng, H. Wang, Z. Liu, W. Zhang, M.J. Zuo and J. Chen, "Multi-branch
and Multi-scale CNN for Fault Diagnosis of Wheelset Bearings under
Strong Noise and Variable Load Condition," IEEE T. Ind. Inform., 2020.
[28] Z. Wei, P. Gaoliang, L. Chuanhao, C. Yuanhang and Z. Zhujun, "A New
Deep Learning Model for Fault Diagnosis with Good Anti-Noise and
Domain Adaptation Ability on Raw Vibration Signals.," Sensors (Basel,
Switzerland), vol. 17, no. 2, 2017.
[29] L. Su, L. Ma, N. Qin, D. Huang and A.H. Kemp, "Fault Diagnosis of High-
Speed Train Bogie by Residual-Squeeze Net," IEEE T. Ind. Inform., vol.
15, no. 7, pp. 3856-3863, 2019.
[30] L. Wen, X. Li, L. Gao and Y. Zhang, "A New Convolutional Neural
Network-Based Data-Driven Fault Diagnosis Method," IEEE T. Ind.
Electron., vol. 65, no. 7, pp. 5990-5998, 2018.
[31] X. Guo, L. Chen and C. Shen, "Hierarchical Adaptive Deep Convolution
Neural Network and Its Application to Bearing Fault Diagnosis,"
Measurement, vol. 93, 2016.
[32] R. Caruana, "Multitask Learning," Mach. Learn., vol. 28, no. 1, 1997.
[33] Y. Zhang and Q. Yang, "An Overview of Multi-Task Learning," National
Science Review, vol. 5, no. 01, pp. 30-43, 2018.
[34] K. Thung and C. Wee, "A Brief Review on Multi-Task Learning,"
Multimed. Tools Appl., vol. 77, no. 22, 2018.
[35] Y. Yan, E. Ricci, R. Subramanian, G. Liu, O. Lanz and N. Sebe, "A Multi-
Task Learning Framework for Head Pose Estimation under Target
Motion," IEEE T. Pattern Anal., vol. 38, no. 6, pp. 1070-1083, 2016.
[36] S. Guo, B. Zhang, T. Yang, D. Lyu and W. Gao, "Multi-Task
Convolutional Neural Network with Information Fusion for Bearing Fault
Diagnosis and Localization," IEEE T. Ind. Electron., pp. 1, 2019.
[37] R. Liu, B. Yang and A.G. Hauptmann, "Simultaneous Bearing Fault
Recognition and Remaining Useful Life Prediction Using Joint-Loss
Convolutional Neural Network," IEEE T. Ind. Inform., vol. 16, no. 1, pp.
87-96, 2020.
[38] X. Cao, B. Chen and N. Zeng, "A deep domain adaption model with multi-
task networks for planetary gearbox fault diagnosis," Neurocomputing, vol.
409, pp. 173-190, 2020.
[39] M. Lin, Q. Chen and S. Yan, "Network in Network," arXiv:1312.4400, 2013.
[40] I. Goodfellow, Y. Bengio and A. Courville, "Deep Learning," MIT Press, 2016.
[41] L. van der Maaten and G. Hinton, "Visualizing Data Using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579-2605, 2008.
[42] Z. Wang and Z. Qian, "Effects of concentration and size of silt particles on
the performance of a double-suction centrifugal pump," Energy, vol. 123,
pp. 36-46, 2017.
[43] S. Ruder, "An Overview of Multi-Task Learning in Deep Neural Networks," [Online]. Available: arXiv:1706.05098.
Zhiliang Liu was born in Rizhao, Shandong, China, in 1984. He received the Ph.D. degree from the School of Automation Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China, in 2013. From 2009 to 2011, he was a visiting scholar at the University of Alberta. From 2013 to 2015, he was an assistant professor with the School of Mechanical and Electrical Engineering, UESTC, where he has been an associate professor since 2015. His research interests include fault diagnosis and prognostics of rotating machinery using advanced signal processing and data mining methods. He has published more than 70 papers, including more than 20 SCI-indexed journal papers, and currently holds more than 10 research grants from the National Natural Science Foundation of China, open grants of national key laboratories, the China Postdoctoral Science Foundation, and others.
Huan Wang was born in Hunan, China. He received the B.S. degree from the School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China, in 2016, where he is currently pursuing the M.S. degree with the same school. His research
interests include mechanical fault diagnosis, image recognition,
deep learning and machine learning.
Junjie Liu was born in Chongqing, China,
in 1994. He received the B.S. degree in
mechanical engineering from the University of
Electronic Science and Technology of
China, Chengdu, China, in 2017, where he
is currently pursuing the M.S. degree in
mechanical engineering. His research
interests include transfer learning,
equipment reliability, fault diagnosis and health management.
Yong Qin is a Professor and the Vice Dean of the State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University. He received the B.Sc. and M.Sc. degrees in transportation automation and control engineering from Shanghai Railway University, China, in 1993 and 1996, respectively, and the Ph.D. degree in information engineering and control from the China Academy of Railway Sciences in 1999. He is a member of the IEEE ITS and RS societies and a senior member of the IET. He has authored or coauthored more than 100 SCI/EI-indexed papers, one ESI highly cited paper, and five books, holds 23 granted patents including two U.S. patents, and has won 11 ministerial science and technology progress awards. His research mainly focuses on prognostics and health management for railway transportation systems, transportation network safety and reliability, and rail operation planning and optimization.
Dandan Peng was born in Shanxi, China,
in 1992. She received the B.S. and M.S.
degrees in the School of Mechanical and
Electrical Engineering, University of
Electronic Science and Technology of
China, Chengdu, China, in 2016 and 2019,
respectively, and is currently working
toward the Ph.D. degree in Mechanical
Engineering at KU Leuven, Leuven, Belgium. Her research
interests include Hilbert Huang transform, convolutional neural
network, machinery condition monitoring and fault diagnosis.