
An end-to-end fault diagnostics method based on convolutional neural network for rotating machinery with multiple case studies

March 2022
Journal of Intelligent Manufacturing 33(4)


DOI:10.1007/s10845-020-01671-1

Authors:

Yiwei Wang

Beihang University (BUAA)

Jian Zhou

Beihang University (BUAA)

Lianyu Zheng

Beihang University (BUAA)



Journal of Intelligent Manufacturing

https://doi.org/10.1007/s10845-020-01671-1

An end-to-end fault diagnostics method based on convolutional neural network for rotating machinery with multiple case studies

Yiwei Wang¹ · Jian Zhou¹ · Lianyu Zheng¹ · Christian Gogu²

Received: 29 November 2019 / Accepted: 15 September 2020

Abstract

The fault diagnostics of rotating components is crucial for most mechanical systems, since rotating component faults are the main form of failure in many mechanical systems. In traditional diagnostics approaches, extracting features from raw input is an important prerequisite and normally requires manual extraction based on signal processing techniques. This suffers from drawbacks such as strong dependence on domain expertise, high sensitivity to different mechanical systems, poor flexibility and generalization ability, and limitations in mining new features. In this paper, we propose an end-to-end fault diagnostics model based on a convolutional neural network for rotating machinery using vibration signals. The model learns features directly from the one-dimensional raw vibration signals without any manual feature extraction. To fully validate its effectiveness and robustness, the proposed model is tested on four datasets, including two public ones and two datasets of our own, covering the applications of ball screw, bearing and gearbox. Manual, signal processing based feature extraction combined with a classifier is also explored for comparison. The results show that the manually extracted features are sensitive to the various applications and thus need fine-tuning, while the proposed framework is robust for rotating machinery fault diagnostics, with high accuracies for all four applications and without any application-specific manual fine-tuning.

Keywords: Fault diagnostics · Rotating machinery · Vibration signals · Convolutional neural network

Introduction

Rotating machinery is essential equipment that plays a crucial role in modern industry. As indispensable key transmission devices of rotating machinery, typical rotating components such as ball screws, bearings, and gears are the leading cause of failure in essential industrial equipment such as induction motors, wheelsets of high-speed railway bogies, aero-engines, wind turbines, etc. According to statistics, 30–51% of rotating machinery failures are caused by these key components (Islam and Kim 2019a; Zhao et al. 2020). Failure of the rotating components results in machine performance degradation, unwanted downtime, economic losses and even human casualties. Normally, the rotating components are installed deep inside the machine and undergo a long degradation process from healthy to failure. It is not practical to frequently shut down and disassemble the machines to examine their health state. If damaged rotating components are left unattended, they may cause secondary damage to the machines. On the other hand, due to different working conditions and other uncertainties, even rotating components of the same type may each exhibit their own degradation process, making it difficult to accurately estimate the health states based on statistics of large samples. Therefore, online monitoring and real-time fault diagnostics of individual rotating components based on manufacturing big data are urgently needed.

Corresponding author: Lianyu Zheng (lyzheng@buaa.edu.cn)

¹ School of Mechanical Engineering and Automation, Beihang University, Beijing 100191, China

² Institut Clément Ader (UMR CNRS 5312), INSA/UPS/ISAE/Mines Albi, Université de Toulouse, 31400 Toulouse, France

Smart manufacturing, which is characterized by the integration of Artificial Intelligence (AI) with recently emerging technologies (Lee et al. 2018), enables online monitoring and massive manufacturing data acquisition from sensors and terminals installed in equipment. However, the data must be converted into useful information before it can be of value to industry. Prognostics and health management (PHM) is such a bridge, converting manufacturing big data into useful information. As an emerging discipline receiving great attention from both academia and various industries, PHM has been listed as a part of the “standard architecture of smart manufacturing” proposed by China. PHM deeply fuses AI into manufacturing industries through a complete architecture containing functions such as intelligent fault diagnostics, prognostics, predictive maintenance, etc. (Vogl et al. 2019; Xia and Xi 2019). This fusion enables timely online fault diagnostics of devices as well as prediction of their future state, and consequently improves the maintainability, supportability, reliability and safety of essential industrial equipment. As an important part of PHM, intelligent fault diagnostics provides solutions for real-time fault diagnostics of individual rotating components.

Fig. 1 Diagnostics methods: traditional versus deep learning based. (a) Raw vibration data, data segmentation, signal processing based feature extraction (time domain, frequency and time-frequency features), fault classification (SVM, FNN, LSTM), results. (b) Raw vibration data, data segmentation, deep learning flow, results

For rotating machinery, the vibration signal is widely used for fault diagnostics due to various advantages, such as continuous monitoring without stopping the machines, ease of use, and sensitivity to faults. Traditional intelligent fault diagnostics normally contains two sequential steps: manually extracting features from raw vibration signals, and then establishing the mapping between the extracted features and the corresponding states using classification techniques such as support vector machine (SVM) (Goyal et al. 2019) or feedforward neural network (FNN), as shown in Fig. 1a. Whether a fault-sensitive feature can be extracted significantly affects the performance of the diagnostics model, and hence much effort is devoted to extracting suitable features before a classification algorithm can be employed. The features are normally extracted from the time domain (Park et al. 2018), frequency domain, or time-frequency domain using various signal processing techniques such as fast Fourier transform, Hilbert-Huang transform (Feng and Pan 2012), empirical mode decomposition (Liu et al. 2018), variational mode decomposition (Yan and Jia 2018), wavelet transform (Dhamande and Chaudhari 2018; Wang et al. 2018a), intrinsic time scale decomposition (Feng et al. 2016), local mean decomposition (Wang et al. 2018b), etc. Manual feature extraction, while having led to satisfying results in the past, also exhibits some drawbacks. The complex signal processing techniques required by feature extraction depend heavily on expertise and prior knowledge, and also require a lot of human labour. In addition, manually extracted features are normally empirical and thus sensitive to changes. These empirical features reduce the flexibility and the generalization ability of the diagnostics model, i.e., the model performs highly accurately for one particular diagnostics task but much less accurately for another. Therefore, significant human labour and expertise are required for exploring and designing suitable features for different diagnostics tasks (Jing et al. 2017). These difficulties in feature extraction seriously hinder fault diagnostics from evolving into a mature technology that can be widely deployed in industry.

The strong feature-learning ability of deep learning methods such as the auto-encoder and the convolutional neural network (CNN) provides a potential solution to the aforementioned drawbacks (Hamadache et al. 2019; Zhao et al. 2019; Li et al. 2019a; Jia et al. 2018). The hierarchical structure of multiple neural layers enables deep learning networks to mine information directly from raw data layer by layer (Fig. 1b). Compared with other deep learning methods, CNN significantly reduces the number of parameters to be optimized through the strategies of weight sharing and sub-sampling. CNN also has strong anti-noise ability because the convolution process makes it insensitive to local changes. Inspired by the successful application of CNN in image classification, it is natural to think of converting the waveform signal into images and then using CNN for fault diagnostics. Hoang and Kang (2019) converted vibration signals into grayscale images through a simple method proposed by Nguyen et al. (2013), and then fed the images into a CNN for bearing diagnostics. Chen et al. (2019) proposed a scheme combining discrete wavelet transformation (DWT) with CNN for planetary gearbox fault diagnostics, in which a series of sets of DWT wavelet coefficients were used as the input of the CNN. Wang et al. (2019) proposed a method for converting vibration signals from multiple sensors into images, and used a bottleneck layer optimized CNN for rotating machinery diagnostics. Islam and Kim (2019b) used a 2D representation of the acoustic emission signal processed by wavelet packet transform as the input of an adaptive deep CNN for bearing fault diagnostics. Wang et al. (2017) converted time sequence signals of a gearbox into time-frequency images using continuous wavelet analysis and then fed the images into a deep CNN. Zhu et al. (2019a) transformed multiple vibration signals of a rotor into symmetrized dot pattern (SDP) images before classifying them with a CNN. Zhu et al. (2019b) employed the short-time Fourier transform to convert one-dimensional bearing signals into a time-frequency graph, and then proposed a novel capsule network for diagnosis. Liang et al. (2020) employed the wavelet transform to extract time-frequency image features from raw signals. Generative Adversarial Networks (GANs) were used to generate additional fake training images for data augmentation purposes, and a CNN model was built for fault mode classification. The proposed method was validated on a gearbox application. Chen et al. (2020) used cyclic spectral analysis to obtain two-dimensional Cyclic Spectral Coherence maps of vibration signals, and constructed a CNN model to learn high-level feature representations and conduct fault classification. The method was validated on a public dataset of bearing faults published by Case Western Reserve University (CWRU). Zhang et al. (2020) converted the raw vibration signals into grayscale images without any predetermined parameters and then fed them into a CNN with two dropout layers and two fully connected layers for fault classification. The CWRU bearing dataset was used for validation.

It can be seen that most studies require one additional step that converts the 1D vibration signal into 2D representations before using the CNN model, which circumvents some drawbacks of manual feature extraction but still needs application-specific adaptation. Recently, researchers have begun to extract features directly from one-dimensional raw vibration data without any signal processing techniques. This provides an end-to-end solution for fault diagnostics, which reduces the dependence on expertise and prior knowledge, and hence facilitates the use and deployment of diagnostics models. Wu et al. (2019) adapted the 2D CNN into a one-dimensional CNN suitable for processing vibration signals, and validated the proposed model on a gearbox application. Li et al. (2018a) proposed a 1D CNN model with the residual learning algorithm for bearing fault diagnostics, in which the raw data were fed into the model without any pre-processing. Li et al. (2020) developed an adaptive 1D separable convolution network with residual connections for diagnosing gear pitting. Peng et al. (2019) proposed a deeper 1D CNN based on a 1D residual block for the fault diagnostics of wheelset bearings in high-speed trains. A wide convolution kernel and the dropout technique were used in the CNN to enhance the network’s generalization performance. The traditional fault diagnostics, 1D CNN and 2D CNN methods employed in the literature reviewed above are summarized in Table 1.

Each of the above studies focuses on only one specific application. Specifically, bearings and gearboxes are more extensively studied than ball screws, for which fault diagnostics studies are very limited due to the lack of public datasets. The generalizability of CNN to fault diagnostics of rotating components needs to be fully investigated. In this paper, we propose an end-to-end fault diagnostics method based on CNN using the raw vibration signal. A CNN model consisting of three stacks of convolutional and pooling layers, a dropout layer and a fully connected layer is proposed. The alternating convolution and pooling layers of the CNN model automatically extract feature maps from raw data layer by layer. The softmax function is used as the activation function of the last fully connected layer to deal with multi-class classification problems. No manually extracted feature is necessary. To fully validate the effectiveness and the generalizability of the proposed model for fault diagnostics of rotating components, we tested it on four datasets, including two public ones and two of our own, covering the applications of ball screw, bearing and gearbox. These three types of rotating components are typical ones that are widely used as key components in essential industrial equipment such as machine tools, high-speed trains, aero-engines, wind turbines, gas turbines, etc. To the best of our knowledge, our work is the first to validate a CNN model for fault diagnostics across such a wide range of applications. Moreover, signal processing based feature extraction combined with a long short-term memory (LSTM) network (the combination is referred to as the traditional method here) is also explored and compared with the proposed CNN model. Specifically, three typical engineered features, i.e., (a) wavelet packet energy (WPE) based on wavelet packet decomposition, (b) instantaneous frequency (IF), and (c) instantaneous spectral entropy (ISE) based on the power spectrogram, are constructed from the raw vibration data and then used as the input of an LSTM network. The proposed CNN model is compared with the traditional method in terms of accuracy and robustness in various applications.

The remainder of the paper is organized as follows. “The CNN-based diagnostics framework for rotating machinery” section details the structure and the feature learning mechanism of the proposed model. In “Case studies and discussions” section, the generalization of the proposed model is verified by four case studies covering the commonly used rotating components of ball screw, bearing and gear. The generalizability and robustness of the proposed model are further discussed through comparison with traditional methods. Finally, conclusions and perspectives are given in “Conclusions and future work” section.


Table 1 Summary of different categories of fault diagnostics

Category: Traditional fault diagnostics
Method: First manually extract features based on signal processing techniques such as Fourier transform, Hilbert-Huang transform, empirical mode decomposition, wavelet transform, etc. Then feed the features to a classifier such as support vector machine, shallow neural networks, etc.
Advantage/disadvantage: Signal processing techniques depend heavily on expertise and prior knowledge. Manually extracted features are application-specific and quite sensitive to environment or working conditions. Requires a lot of skilled labour to explore and design suitable features for a new diagnostics task.
References: Goyal et al. (2019), Feng and Pan (2012), Liu et al. (2018), Yan and Jia (2018), Dhamande and Chaudhari (2018), Wang et al. (2018a, b), Feng et al. (2016), and Jing et al. (2017)

Category: 2D convolutional neural network
Method: First convert the raw vibration signal to 2D representations such as grayscale images, time-frequency images, symmetrized dot pattern images, cyclic spectral coherence maps, etc. Then feed the 2D representations to a 2D convolutional neural network for classification.
Advantage/disadvantage: Circumvents some drawbacks of manual feature extraction but still needs application-specific adaptation.
References: Hoang and Kang (2019), Nguyen et al. (2013), Wang et al. (2017, 2019), Islam and Kim (2019b), Zhu et al. (2019a, b), Chen et al. (2020), Zhu et al. (2019), Chen et al. (2019), Liang et al. (2020) and Zhang et al. (2020)

Category: 1D convolutional neural network
Method: Use a 1D convolutional neural network to accomplish direct feature extraction from the raw vibration signal and classification.
Advantage/disadvantage: Provides end-to-end solutions for fault diagnostics. Reduces the dependence on expertise and prior knowledge. Reduces the sensitivity to environment or working conditions. Facilitates the use and deployment of diagnostics models.
References: Wu et al. (2019), Li et al. (2018b, 2020) and Peng et al. (2019)

Fig. 2 A typical architecture of CNN (Jing et al. 2017)


The CNN-based diagnostics framework for rotating machinery

Convolutional neural networks (CNNs), first proposed by LeCun for image processing, have two key characteristics, i.e., spatially shared weights and spatial pooling (Goodfellow et al. 2019). The architecture of a typical CNN is illustrated in Fig. 2 and is structured as a series of stages (Jing et al. 2017). The convolutional layer convolves multiple filters with the raw input data and generates feature maps. A pooling layer often follows the convolutional layer to reduce the size of the feature map and extract the most significant local features (Li et al. 2019b). The last stage of the architecture consists of a fully connected layer, which normally acts as a multi-class classification model.

The schematic diagram of the proposed framework is illustrated in Fig. 3. The sliding window method is used to segment the raw time sequence vibration data of each health state, and each segment is then reshaped into a matrix before being fed into the neural network. The one-hot encoding method is used to create the labels of the samples, which serve as the target output of the network. For example, if there are three classes of data, the first class is encoded as (1, 0, 0), the second as (0, 1, 0), and the third as (0, 0, 1). The self feature learning ability is realized by the hidden layers, which comprise stacks of alternating convolutional layers and pooling layers. One-dimensional convolution kernels and pooling kernels are used in the network since the input is a one-dimensional time series signal. The structure of the CNN model and the feature learning process are detailed below.
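The data preparation described above can be illustrated with a minimal NumPy sketch. The function names `segment_signal` and `one_hot`, the toy signal, and the default of non-overlapping windows are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def segment_signal(signal, window, step=None, rows=None):
    """Cut a 1D signal into fixed-length windows.

    step defaults to window (non-overlapping windows); this default is an
    assumption, not a setting taken from the paper.
    If rows is given, each window is reshaped to (rows, window // rows).
    """
    signal = np.asarray(signal)
    step = window if step is None else step
    if len(signal) < window:
        raise ValueError("signal is shorter than one window")
    n = (len(signal) - window) // step + 1
    segments = np.stack([signal[i * step:i * step + window] for i in range(n)])
    if rows is not None:
        segments = segments.reshape(n, rows, window // rows)
    return segments


def one_hot(labels, num_classes):
    """Encode integer class indices as one-hot rows, e.g. 0 -> (1, 0, 0)."""
    return np.eye(num_classes)[np.asarray(labels)]


# Toy signal of 20 points, windows of 8 points sliding by 4 points.
signal = np.arange(20, dtype=float)
samples = segment_signal(signal, window=8, step=4, rows=2)
labels = one_hot([0, 1, 2], num_classes=3)
print(samples.shape)  # (4, 2, 4)
print(labels)
```

Each row of `labels` reproduces the encoding given in the text for a three-class problem.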

Structure of the proposed CNN model

The structure of the proposed model is illustrated in Fig. 4. It includes three stacks of convolution-pooling layers and a fully connected layer. In the convolutional layer, multiple filters are convolved with the raw input data to generate translation-invariant features. In the subsequent pooling layer, the feature is compressed by sliding a fixed-length window and applying a rule such as the average or the maximum. Max pooling is used in the first two stacks, while average pooling is used in the last stack. The data flow from the input of the network to the final output in Fig. 4 is detailed below by explaining the entities (denoted by Greek letters) and the actions (denoted by arrows).

Fig. 3 Framework of proposed diagnostics model

Fig. 4 Structure of proposed CNN network (three stacks of 1D convolution and pooling layers, with max pooling in Stacks 1 and 2 and global average pooling in Stack 3, followed by a fully connected layer)

1. $\alpha$ is the input matrix of the network, which has the shape $(m_1, n_1)$. Note that we use the form $(m, n)$ to represent an $m$-by-$n$ matrix. The subscripts of $m$ and $n$, as well as those of $f$ and $s$ introduced later, denote the index of the layer.

2. $\beta_i$ is a filter with shape $(h, n_1)$, in which $i = 1, 2, \ldots, f_1$. $f_1$ is the number of filters in the 1st layer, and $h$ is the kernel size of the convolution.

3. $\gamma$ is the output matrix of the 1st convolution layer, having the shape $(m_1, f_1)$.

4. From $\alpha$ to $\gamma$, the convolution operation is carried out, which is detailed as follows. The dot product between the filter $\beta_i$ and a concatenation vector $\alpha_{k:k+h-1}$ defines the convolution operation:

$$c_j = \phi\left(\beta_i \cdot \alpha_{k:k+h-1} + b\right) \tag{1}$$

in which $\cdot$ represents the dot product, $b$ the bias term and $\phi$ the non-linear activation function. $\alpha_{k:k+h-1}$ is an $h$-length window starting from the $k$-th row and ending at the $(k+h-1)$-th row, which is defined as:

$$\alpha_{k:k+h-1} = \alpha_k \oplus \alpha_{k+1} \oplus \cdots \oplus \alpha_{k+h-1} \tag{2}$$

where $\oplus$ is the concatenation operation of two vectors. As defined in Eq. 1, the output scalar $c_j$ can be regarded as the activation of the filter $\beta_i$ on the corresponding concatenation vector $\alpha_{k:k+h-1}$. By sliding the filter $\beta_i$ through $\alpha$ and applying the zero padding technique, $m_1$ output scalars $c_j$ are obtained, forming a column vector $c_i$, also known as a feature map:

$$c_i = [c_1, c_2, \ldots, c_j, \ldots, c_{m_1}] \tag{3}$$

One filter corresponds to one column vector. Since there are $f_1$ filters in the first layer, the output matrix $\gamma$ is an $(m_1, f_1)$ matrix. From the above operation it can be seen that one filter performs multiple convolution operations, during which the weights of the filter are shared. The feature map $c_i$, obtained by convolving one filter $\beta_i$ over the input data, represents the feature of the input data extracted at a certain level. By convolving the input data with multiple filters, a high-dimensional feature map is extracted, containing multiple column vectors that reflect the input data from different perspectives.

5. $\mu$ is the output matrix of the 2nd layer, having the shape $(m_2/s_2, n_2)$, where $s_2$ is the pooling length of the 2nd layer. Note that $m_2$ and $n_2$ denote the input size of the 2nd layer. Since the output of the current layer is the input of the next layer, $m_2 = m_1$ and $n_2 = f_1$.

6. From $\gamma$ to $\mu$, the max pooling operation is carried out, which is detailed as follows. The max operation is taken over $s_2$ consecutive values in $c_i$. The compressed column vector $h_i$ is then obtained as:

$$h_i = [h_1, h_2, \ldots, h_l, \ldots, h_{m/s}] \tag{4}$$

where $h_l = \max[c_{(l-1)s+1}, c_{(l-1)s+2}, \cdots, c_{ls}]$, with $m = m_2$ and $s = s_2$ for this layer. From the above we see that when a matrix goes through a convolution layer, its number of rows remains unchanged and its number of columns equals the number of filters. In a pooling layer, the number of columns remains unchanged while the number of rows is compressed according to the pooling length.

7. In the 2nd and 3rd stacks, the convolution and pooling operations are repeated. The only differences are the number of filters and the pooling length.

8. The output of the 7th layer is flattened and connected to a fully connected layer, which is similar to a traditional multilayer neural network and can be applied to different classification tasks. The dropout technique is employed to prevent overfitting. The softmax function (Behley et al. 2013) is used as the last layer, which gives the probability of each label. Specifically, for a $K$-label classification task, the output of the softmax function is calculated by Eq. 5, in which $W_k$ and $b_k$ are the weight matrix and bias, and $P(y_k \mid x; W_k, b_k)$ is the probability of the $k$-th label (denoted as $p_k$ in Fig. 4) given the input $x$ and the corresponding weight and bias. Here $x$ is the vector after dropout in the fully connected layer. The final output of the network is the health state label with the highest probability.

$$
\begin{bmatrix}
P(y_1 \mid x; W_1, b_1) \\
\vdots \\
P(y_k \mid x; W_k, b_k) \\
\vdots \\
P(y_K \mid x; W_K, b_K)
\end{bmatrix}
= \frac{1}{\sum_{k=1}^{K} \exp(W_k x + b_k)}
\begin{bmatrix}
\exp(W_1 x + b_1) \\
\vdots \\
\exp(W_k x + b_k) \\
\vdots \\
\exp(W_K x + b_K)
\end{bmatrix}
\tag{5}
$$
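The operations in Eqs. 1 to 5 can be traced with a small NumPy sketch of a single forward pass. It is an illustration under stated assumptions rather than the authors' implementation: it uses one convolution layer, one max pooling layer, global average pooling and a softmax output instead of the full three-stack network, and the kernel size (3), number of filters (16), random weights and the way the zero padding is split between the two ends are all hypothetical choices.

```python
import numpy as np


def conv1d_same(alpha, filters, bias):
    """1D convolution with zero padding and ReLU (Eqs. 1 to 3).

    alpha: (m, n) input, filters: (f, h, n), bias: (f,). Returns (m, f).
    """
    m, n = alpha.shape
    f, h, _ = filters.shape
    pad_top = (h - 1) // 2           # padding split is an assumption
    pad_bottom = h - 1 - pad_top
    padded = np.pad(alpha, ((pad_top, pad_bottom), (0, 0)))
    flat_filters = filters.reshape(f, -1)
    gamma = np.empty((m, f))
    for k in range(m):
        window = padded[k:k + h].ravel()          # Eq. 2: concatenate h rows
        gamma[k] = flat_filters @ window + bias   # Eq. 1 before activation
    return np.maximum(gamma, 0.0)                 # ReLU as the activation


def max_pool1d(gamma, s):
    """Max over s consecutive rows of each column (Eq. 4)."""
    m, f = gamma.shape
    usable = (m // s) * s
    return gamma[:usable].reshape(m // s, s, f).max(axis=1)


def softmax(z):
    """Normalized exponentials (Eq. 5), shifted for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()


rng = np.random.default_rng(0)
alpha = rng.standard_normal((64, 100))            # one (64, 100) input sample
filters = 0.1 * rng.standard_normal((16, 3, 100))
gamma = conv1d_same(alpha, filters, np.zeros(16)) # shape (64, 16)
mu = max_pool1d(gamma, s=2)                       # shape (32, 16)
features = mu.mean(axis=0)                        # global average pooling
W, b = rng.standard_normal((3, 16)), np.zeros(3)  # K = 3 labels
p = softmax(W @ features + b)
print(gamma.shape, mu.shape, int(np.argmax(p)))
```

The shapes follow the rule stated above: convolution keeps the number of rows and sets the number of columns to the number of filters, while pooling keeps the columns and divides the rows by the pooling length.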

Hyperparameters

The activation function of all the convolutional layers is the ReLU function, chosen for its ability to avoid vanishing gradients and for its fast convergence. The loss function of the CNN model is cross-entropy and the performance metric is categorical accuracy. An L2 regularization term is set for the first and third convolution layers to reduce overfitting. The coefficient of the L2 term is a trade-off between the effectiveness of training and overfitting, i.e., a value that is too large leads to inadequate training, while a value that is too small does not sufficiently reduce the risk of overfitting. We set this value to 0.001 based on a prior study (Ng 2004). Dropout is applied to the fully connected layer to reduce overfitting by setting a given proportion of the neurons of the network to zero. Following the original dropout study (Srivastava et al. 2014), we set this proportion to 0.5, which is a typical value in deep learning.
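The two regularizers can be sketched in a few lines of NumPy. The helper names are hypothetical, the penalty form (coefficient times the sum of squared weights) is a common convention assumed here, and the rescaling of the surviving activations by 1 / (1 - rate) is the "inverted dropout" variant used by many frameworks, which the text does not specify.

```python
import numpy as np


def l2_penalty(weights, lam=0.001):
    """L2 term added to the loss: lam * sum(w^2) (assumed convention)."""
    return lam * np.sum(weights ** 2)


def dropout(x, rate=0.5, rng=None, training=True):
    """Zero a proportion `rate` of the activations during training."""
    if not training:
        return x                      # dropout is disabled at test time
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    # Rescale survivors so the expected activation is unchanged (assumption).
    return x * keep / (1.0 - rate)


w = np.array([1.0, -2.0])
penalty = l2_penalty(w)               # 0.001 * (1 + 4)
x = np.ones(1000)
y = dropout(x, rate=0.5, rng=np.random.default_rng(0))
print(penalty, (y == 0).mean())
```

With a rate of 0.5, roughly half of the entries of `y` are zero and the rest are doubled.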

The initial weights of the network are set by the Glorot uniform initializer, and the biases are set to 0. The weights are optimized by the adaptive moment estimation (Adam) solver with an initial learning rate of 0.001 and an exponential decay rate of 0.1. Adam combines the Momentum and RMSProp optimization algorithms. It maintains an independent adaptive learning rate for each parameter by calculating first-order and second-order moment estimates of the gradient, which typically gives better optimization performance than the alternative stochastic gradient descent with momentum (SGDM) solver (Kingma and Ba 2015). Adam is currently among the most widely used optimization algorithms in machine learning and deep learning.
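As an illustration of the initialization and update rules mentioned above, the following NumPy sketch implements the Glorot uniform initializer and a single Adam step. The moment coefficients `beta1 = 0.9`, `beta2 = 0.999` and `eps = 1e-8` are the defaults suggested by Kingma and Ba and are assumptions here, since the text only states the learning rate. The learning-rate decay is omitted because the text does not specify how it is applied.

```python
import numpy as np


def glorot_uniform(fan_in, fan_out, rng):
    """Sample weights uniformly in [-limit, limit], limit = sqrt(6/(fan_in+fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based iteration counter."""
    m = beta1 * m + (1 - beta1) * grad         # first-order moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # second-order moment estimate
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v


rng = np.random.default_rng(0)
W0 = glorot_uniform(100, 16, rng)

# Toy objective f(w) = ||w||^2 with gradient 2w.
w = np.array([1.0, -2.0])
w1, m1, v1 = adam_step(w, 2 * w, np.zeros(2), np.zeros(2), t=1)
print(w1)
```

On the first iteration the bias-corrected ratio equals the sign of the gradient, so each parameter moves by approximately the learning rate regardless of the gradient magnitude, which is the per-parameter adaptivity described in the text.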

The mini-batch training strategy is adopted here. Specifically, the training examples are divided into small batches, and the model parameters are updated after each batch passes through the network. The passing of one batch is called one iteration. When the entire training set has passed through the network once, so that each example has had the opportunity to update the model parameters, one epoch is complete. The execution environment is an Intel E5-2620 v4 CPU and a GeForce RTX 2080 Ti GPU. The above network settings and execution environment are used in all the following cases.
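The relation between batches, iterations and epochs can be made concrete with a short sketch. The batch size of 32 and the shuffling of the examples at the start of each epoch are illustrative assumptions, as the text does not state them; the 240 training samples correspond to the training set of Case 1.

```python
import math

import numpy as np


def minibatches(x, y, batch_size, rng):
    """Yield shuffled mini-batches that together cover the data once (one epoch)."""
    order = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        yield x[idx], y[idx]


num_examples, batch_size = 240, 32      # batch size is hypothetical
x = np.zeros((num_examples, 64, 100))   # placeholder input samples
y = np.zeros((num_examples, 3))         # placeholder one-hot labels

iterations_per_epoch = math.ceil(num_examples / batch_size)
batches = list(minibatches(x, y, batch_size, np.random.default_rng(0)))
print(iterations_per_epoch, len(batches), len(batches[-1][0]))
```

Under these assumptions one epoch consists of eight iterations, the last of which uses a smaller batch of 16 examples.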

Case studies and discussions

Case 1: Ball screw lubrication states diagnostics

Experiment and data preparation

In this case study, the proposed model is validated for diagnosing the lubrication states of a ball screw. Ball screws are crucial mechanical components used intensively in many engineering systems that require precise positioning, such as the feed systems of machine tools and high-precision leveling systems for aircraft and missiles (Li et al. 2018a). The growing demand for high speed and large lead in ball screws makes it increasingly important to maintain good lubrication in order to reduce friction. Indeed, correct lubrication is vital to ball screws since it significantly affects their performance. Poor lubrication may increase friction and impair the positioning accuracy of ball screws. In addition, abnormal vibration caused by poor lubrication accelerates damage to the machine tool and affects the quality of machining. Therefore, monitoring and online diagnosis of the lubrication state of the ball screw is important for improving the positioning accuracy and lifetime of ball screws.

Very few reports are available regarding ball screw lubrication state diagnostics. Motivated by this, we design an experiment that simulates different lubrication states of ball screws. The experiment is carried out on a test bench that was originally designed for measuring the friction torque of a ball screw, as shown in Fig. 5. The drive system moves the nut back and forth along the screw. Three states labeled “Grease”, “Oil”, and “Absent” are simulated by (1) lubricating the ball screw with grease, (2) lubricating it with oil and (3) removing the original lubricant, respectively. These three health states simulate the typical lubrication states that ball screws may encounter in a real working environment. The vibration signals corresponding to the three states are acquired at a sampling rate of 5 kHz with the data acquisition system Prosig P8020, as shown in Fig. 6. For each lubrication state, 128 s of data are acquired.

Fig. 5 Ball screw test bench

Fig. 6 Data acquisition set-up

Fig. 7 Raw signal under “Absent” lubrication (annotated with the forward and reverse motions, the retained data, and the peak due to sharp slowdown)

The raw time domain signal of one round trip (forward and reverse motion) of the nut under “Absent” lubrication is shown in Fig. 7. Two parts can be clearly seen, which correspond to the forward and reverse motions of the nut, respectively. The abrupt “peaks” are due to the sharp slowdown and stop of the nut near the end of each motion. The data near the beginning and end of the motions are discarded, and only the “steady state” data in the middle stage of the motions are retained. For conciseness, the full raw signals under “Oil” and “Grease” lubrication are not presented. Instead, the retained segments of the forward motion under the three lubrication conditions are given in Fig. 8a. It can be seen that the differences among the three states are quite small, so we further transform the signals into the frequency domain using FFT, as shown in Fig. 8b. Even there, the differences among the three cases are not obvious and it is hard to identify distinctive patterns, which makes it challenging to correctly distinguish the different lubrication conditions.

The raw vibration signal is divided into segments to form the input samples of the network. For each state, there are 128 × 5000 = 6.4 × 10^5 data points. Every 6400 data points are taken as one segment, which is further reshaped to a (64, 100) matrix. It is worth pointing out that the sample length is a trade-off between the number of samples and the feature information that one sample contains. A time window that is too short may carry incomplete feature information, making diagnostics difficult, while a time window that is too long results in insufficient training data. Based on the sampling rate of the data used in this paper as well as other related research works, we take 6400 data points as one sample, which gives 100 samples per state. 80% of the data (80 samples) are reserved for training and the remaining 20% (20 samples) for testing. Finally, the training/testing samples taken from each lubrication state form the overall training set (80 × 3 = 240 samples) and testing set (20 × 3 = 60 samples). The input/output shape, the kernel size, stride and number of filters of each layer during the training process are reported in Table 2. Note that the above hyperparameters (the kernel size, stride and number of filters of each layer) remain unchanged in all the following case studies.
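The segmentation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signal is synthetic, the segments are taken as non-overlapping, the reshape is done row by row, and the 80/20 split simply takes the first 80 samples for training. The text does not specify any of these details, and the function names are hypothetical.

```python
import math

SAMPLING_RATE_HZ = 5000   # from the text
DURATION_S = 128          # from the text
SEGMENT_LEN = 6400        # data points per sample, from the text
ROWS, COLS = 64, 100      # matrix shape of one sample, from the text


def segment_signal(signal, segment_len=SEGMENT_LEN, rows=ROWS, cols=COLS):
    """Cut a 1-D signal into non-overlapping segments and reshape each
    segment row by row into a rows x cols matrix (nested lists)."""
    if rows * cols != segment_len:
        raise ValueError("rows * cols must equal segment_len")
    samples = []
    for start in range(0, len(signal) - segment_len + 1, segment_len):
        segment = signal[start:start + segment_len]
        samples.append([segment[r * cols:(r + 1) * cols] for r in range(rows)])
    return samples


def split_train_test(samples, train_fraction=0.8):
    """Assumed split: the first 80% of the samples train, the rest test."""
    n_train = round(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]


# Synthetic stand-in for the vibration record of one lubrication state.
n_points = SAMPLING_RATE_HZ * DURATION_S  # 640,000 data points
signal = [math.sin(2 * math.pi * 50 * t / SAMPLING_RATE_HZ) for t in range(n_points)]

samples = segment_signal(signal)
train, test = split_train_test(samples)
print(len(samples), len(train), len(test))  # 100 80 20
```

Repeating this for the three lubrication states gives the 240 training and 60 testing samples quoted in the text.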

Fig. 8 Retained segment of the forward motion of the three lubrication conditions: (a) in the time domain, (b) in the frequency domain

Table 2 Parameters of the proposed model

| No. of layer | Layer | Input shape | Kernel size/stride/number of filters | Output shape |
| --- | --- | --- | --- | --- |
| 1 | Convolution | (240, 64, 100) | (3, 100)/1/64 | (240, 64, 64) |
| 2 | Maxpooling | (240, 64, 64) | 3/3 | (240, 21, 64) |
| 3 | Convolution | (240, 21, 64) | 3 × 64/1/128 | (240, 21, 128) |
| 4 | Maxpooling | (240, 21, 128) | 3/3 | (240, 7, 128) |
| 5 | Convolution | (240, 7, 128) | 3 × 128/1/128 | (240, 7, 128) |
| 6 | Average pooling | (240, 7, 128) | / | (240, 128) |
| 7 | Dropout | (240, 128) | / | (240, 64) |
| 8 | Fully connected | (240, 64) | / | (240, 3) |

Results and discussions

The diagnostics accuracy on the test set is 100%: all three states are correctly classified (thus the confusion matrix is not given). To better illustrate the feature learning process of the CNN model, the t-distributed stochastic neighbour embedding (t-SNE) technique (Maaten and Hinton 2008) is used to visualize the output of each layer. t-SNE is a machine learning algorithm that visualizes high-dimensional data by means of nonlinear dimensionality reduction. For the current case study, the feature output by each layer is high-dimensional, with the shape given in Table 2 (e.g., after the dropout layer, the feature that is fed into the fully connected layer for classification is a 1-by-64 vector). We use t-SNE to reduce the features after each layer to a two-dimensional space in order to show how the data “flow” from input to output, and thus how the features belonging to the same state aggregate. Figure 9 shows this process during testing, in which the distance between points represents the similarity of the samples. The symbols “1” (red), “2” (blue) and “3” (green) in the figure represent the “Absent”, “Oil”, and “Grease” lubrication states, respectively. In the input layer, the dots of the three states are completely mixed and no pattern can be observed to distinguish the different states. As the convolutional and pooling operations are applied, the mixed dots gradually separate. In the output layer of Fig. 9, all the features belonging to the same state are clustered and the features belonging to different states are completely separated. There is no confusion, corresponding to 100% accuracy.
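The feature shapes listed in Table 2 for layers 1 to 6 can be cross-checked with simple arithmetic. The sketch below reads each sample as a sequence of 64 positions with 100 channels and assumes that the convolutions keep the sequence length (“same” padding) while the pooling layers use a window of 3 and a stride of 3 without padding. These padding choices are not stated in the paper; they are inferred from the shapes in the table, and the helper names are hypothetical.

```python
def conv_same(length, n_filters):
    """1-D convolution, stride 1, 'same' padding (assumed): length is kept."""
    return length, n_filters


def pool_valid(length, channels, window=3, stride=3):
    """Pooling without padding (assumed): floor((L - window) / stride) + 1."""
    return (length - window) // stride + 1, channels


# One input sample: 64 positions x 100 channels (batch dimension omitted).
length, channels = 64, 100
trace = [(length, channels)]

# Layers 1 to 5 of Table 2: (kind, number of filters).
for kind, n_filters in [("conv", 64), ("pool", None), ("conv", 128),
                        ("pool", None), ("conv", 128)]:
    if kind == "conv":
        length, channels = conv_same(length, n_filters)
    else:
        length, channels = pool_valid(length, channels)
    trace.append((length, channels))

# Layer 6 averages over the remaining positions, leaving one value per filter.
pooled_features = channels

for shape in trace:
    print(shape)
print(pooled_features)
```

The trace reproduces the sequence lengths 64, 21 and 7 and the 128 pooled features of Table 2. Layers 7 and 8 are not traced here.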

Fig. 9 Model testing process visualization by t-SNE, case study 1 (panels: input layer, 1st to 6th layers, output layer)

Two well-known and widely used machine learning methods, i.e., the feedforward backpropagation (BP) neural network and the support vector machine (SVM), are used here for comparison. The sample length remains the same as that used by the CNN model, and accordingly the numbers of training and testing samples are unchanged. Since these two methods normally accept low- or moderate-dimensional data as input, each raw sample is pre-processed by wavelet packet decomposition (WPD) to extract a feature vector. Specifically, based on our prior knowledge of fault diagnostics using vibration signals, a five-level WPD is applied to each raw sample, yielding 2^5 = 32 frequency sub-bands. The energy of each sub-band is calculated, and the energies are concatenated to form a 1-by-32 feature vector, which is the input of the BP neural network and the SVM.
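As an illustration of how a five-level decomposition leads to a 1-by-32 energy vector, the following sketch builds a full wavelet packet tree with the Haar wavelet. The choice of Haar is an assumption made for brevity (the paper does not name the mother wavelet), the input is a synthetic signal, and the function names are hypothetical.

```python
import math


def haar_split(x):
    """One Haar analysis step: low-pass (sum) and high-pass (difference) halves."""
    scale = math.sqrt(2.0)
    low = [(x[i] + x[i + 1]) / scale for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) / scale for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet_energy(signal, levels=5):
    """Energies of the 2**levels terminal nodes of a full Haar packet tree.

    Nodes are returned in natural (filter-bank) order, not sorted by frequency.
    """
    nodes = [list(signal)]
    for _ in range(levels):
        nodes = [half for node in nodes for half in haar_split(node)]
    return [sum(c * c for c in node) for node in nodes]


# Synthetic 6400-point sample at 5 kHz (two sinusoids), standing in for real data.
fs = 5000
sample = [math.sin(2 * math.pi * 400 * t / fs)
          + 0.5 * math.sin(2 * math.pi * 1200 * t / fs)
          for t in range(6400)]

features = wavelet_packet_energy(sample)
total_energy = sum(v * v for v in sample)
print(len(features))  # 32
```

Because the Haar transform is orthonormal, the 32 sub-band energies sum to the energy of the raw sample, which is a convenient sanity check for any implementation of this feature.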

A typical three-layer BP neural network is used. The numbers of neurons in the input layer and output layer are 32 and 3, equal to the length of the feature vector and to the number of lubrication states, respectively. A hidden layer containing 10 neurons is adopted. Note that the number of neurons in the hidden layer is typically an empirical value; too many or too few neurons may reduce the classification accuracy of the network. We gradually increased the number of neurons from 5 to 20 and finally set this value to 10, at which the network achieved its highest accuracy.

The basic SVM for binary classification is employed and is turned into a multi-class classifier by the “one-vs-all” strategy. This strategy involves training a single SVM classifier for each class, with the samples of that class as positives and all other samples as negatives. Specifically, for the current case study of the ball screw, we trained three basic SVM classifiers that diagnose “Absent”, “Oil”, and “Grease”, respectively. The Radial Basis Function (RBF) is used as the kernel function.
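The label handling behind the one-vs-all strategy can be sketched as follows. The example only shows how the binary targets are built and how three binary outputs might be combined; the SVM training itself is omitted, the decision scores are made-up numbers, and picking the class with the largest score is a common convention that is assumed here rather than taken from the paper.

```python
CLASSES = ["Absent", "Oil", "Grease"]


def one_vs_all_targets(labels, positive_class):
    """Binary targets for one classifier: +1 for its own class, -1 otherwise."""
    return [1 if label == positive_class else -1 for label in labels]


def predict_one_vs_all(scores_per_class):
    """Pick the class whose binary classifier gives the largest decision score."""
    return max(scores_per_class, key=scores_per_class.get)


labels = ["Absent", "Oil", "Grease", "Oil"]
binary_targets = {c: one_vs_all_targets(labels, c) for c in CLASSES}

# Hypothetical decision scores of the three binary classifiers for one sample.
scores = {"Absent": -0.4, "Oil": 0.9, "Grease": -1.2}
predicted = predict_one_vs_all(scores)

print(binary_targets["Oil"])  # [-1, 1, -1, 1]
print(predicted)              # Oil
```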

The diagnostics accuracies of the BP neural network and SVM are 95% and 90%, respectively. The corresponding confusion matrices are given in Fig. 10a, b. The last column of each matrix shows the percentage of the examples predicted to belong to each label that are correctly classified (also called precision, or positive predictive value). For example, in the 1st row of Fig. 10a, 19 samples are classified by the BP neural network as lubrication state “Absent”, and 18 of these 19 are correctly classified. One sample that should belong to label “Oil” is incorrectly classified as “Absent”. The precision for label “Absent” hence equals 18/19 = 94.7%. The row at the bottom of the matrix shows the percentage of all the samples belonging to each class that are correctly classified (also called sensitivity, true positive rate, recall, or probability of detection). For example, in the 1st column of Fig. 10a there are 20 samples of “Absent”, of which 18 are correctly classified and two are incorrectly identified. The sensitivity for label “Absent” is thus 18/20 = 90%. The value in the bottom right corner is the overall classification accuracy of the network, which equals the number of correctly classified samples divided by the total number of testing samples, in this case 95%.
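The three quantities read from the confusion matrix can be computed as in the sketch below. The counts are not tabulated in the paper; they are inferred here from the percentages printed in Fig. 10a for 60 test samples, so they should be read as a reconstruction. Rows are the output of the network and columns are the targets, following the description above.

```python
CLASSES = ["Absent", "Oil", "Grease"]

# Rows: output of the network; columns: target class.
# Counts inferred from the percentages of Fig. 10a (60 test samples).
confusion = [
    [18, 1, 0],
    [2, 19, 0],
    [0, 0, 20],
]


def precision(matrix, i):
    """Correct predictions of class i divided by all predictions of class i (row sum)."""
    return matrix[i][i] / sum(matrix[i])


def recall(matrix, i):
    """Correct predictions of class i divided by all true members of class i (column sum)."""
    return matrix[i][i] / sum(row[i] for row in matrix)


def accuracy(matrix):
    """Sum of the diagonal divided by the total number of samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


for i, name in enumerate(CLASSES):
    print(f"{name}: precision {precision(confusion, i):.1%}, "
          f"recall {recall(confusion, i):.1%}")
print(f"accuracy {accuracy(confusion):.1%}")
```

With these counts, the precision for “Absent” is 18/19, its recall is 18/20, and the overall accuracy is 57/60 = 95%.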

Fig. 10 Confusion matrices of case study 1, given by (a) BP neural network and (b) SVM (rows: output of network; columns: target; classes: Absent, Oil, Grease)

Compared with the proposed CNN model, the overall accuracies of the BP neural network and SVM are lower. Figure 10 indicates that confusions occur between the lubrication states “Absent” and “Oil”. In a real working environment, the differences between “Absent” and “Oil” are indeed small. This is also reflected in Fig. 8, where the time-domain and frequency-domain signals of these two states are very similar and hard to distinguish visually. In contrast to the BP neural network and SVM, the CNN classifies these two states well.

Case study 2: Bearing fault diagnostics using CWRU dataset

Data description and preparation

The public bearing fault dataset from Case Western Reserve University (CWRU) (Case Western Reserve University Bearing Data Center Website, https://csegroups.case.edu/bearingdatacenter/home) is used to validate the proposed model in this section. A benchmark study of the CWRU dataset was conducted by Smith and Randall (2015). As shown in Fig. 11, the test bench consists of a 2 hp motor (1 hp = 735 W), a torque transducer/encoder, a dynamometer, and control electronics. SKF-6202 deep groove ball bearings are used as the test bearings and support the motor shaft. The experiments were performed under four working conditions, as reported in Table 3. Four health states are considered: “Outer race fault”, “Inner race fault”, “Ball fault” and “Normal”. For each of the three fault types, a single-point fault was seeded at three severity levels, i.e., fault diameters of 0.007 mil, 0.014 mil, and 0.021 mil, and each severity level is regarded as a different fault mode. Together with the normal state, there are therefore 10 fault modes. Vibration data of each fault type under each working condition were collected using accelerometers attached to the housing with magnetic bases. The sampling rate was 48 kHz.

Fig. 11 Test bench of case study 2 [Case Western Reserve University Bearing Data Center Website, https://csegroups.case.edu/bearingdatacenter/home]

Table 3 Description of working conditions

| Working condition | Motor load (hp) | Motor speed (rpm) |
| --- | --- | --- |
| 1 | 0 | 1797 |
| 2 | 1 | 1772 |
| 3 | 2 | 1750 |
| 4 | 3 | 1730 |

Table 4 Description of CWRU bearing fault data (number of data points available in each working condition)

| Fault mode | Fault label | Condition 1 | Condition 2 | Condition 3 | Condition 4 |
| --- | --- | --- | --- | --- | --- |
| Normal | 1 | 243,938 | 483,903 | 483,903 | 485,643 |
| Inner race fault with fault diameter 0.007 mil | 2 | 243,938 | 486,224 | 485,643 | 485,643 |
| Inner race fault with fault diameter 0.014 mil | 3 | 63,788 | 489,125 | 487,964 | 485,063 |
| Inner race fault with fault diameter 0.021 mil | 4 | 244,339 | 485,063 | 491,446 | 491,446 |
| Outer race fault with fault diameter 0.007 mil | 5 | 243,538 | 486,804 | 486,804 | 487,964 |
| Outer race fault with fault diameter 0.014 mil | 6 | 245,140 | 484,483 | 486,804 | 488,545 |
| Outer race fault with fault diameter 0.021 mil | 7 | 246,342 | 489,125 | 487,964 | 489,125 |
| Ball fault with fault diameter 0.007 mil | 8 | 243,938 | 487,384 | 486,804 | 488,545 |
| Ball fault with fault diameter 0.014 mil | 9 | 249,146 | 486,224 | 487,384 | 486,804 |
| Ball fault with fault diameter 0.021 mil | 10 | 243,938 | 486,804 | 487,384 | 486,804 |

Fig. 12 Model testing process visualization under condition 2, case study 2 (panels: input layer, 1st to 6th layers, output layer; legend: 1 - red, 2 - yellow, 3 - blue, 4 - green, 5 - verdigris, 6 - magenta, 7 - gray, 8 - pink, 9 - meadow, 10 - purple)

The number of data points acquired for each fault type under each working condition is reported in Table 4. The amount of data provided differs between working conditions, so the data preparation is adjusted accordingly. The training/testing ratio is set to 4:1. For conditions 2–4, the data for each label are truncated to 4.8 × 10^5 points. Every 6.4 × 10^3 data points are taken as one sample and reshaped to a (64, 100) matrix, so that 4.8 × 10^5 / (6.4 × 10^3) = 75 samples are obtained for each fault mode. For condition 1, the mode of inner race damage with 0.014 mil is left out due to insufficient data. We are aware of data augmentation techniques, such as adding noise or using a Generative Adversarial Network, that may mitigate the problem of insufficient data, but such an investigation is left for our future work. The data of the other modes are truncated to 2.4 × 10^5 points. Every 2.4 × 10^3 data points are taken as one sample and reshaped to a (24, 100) matrix. Thus 100 samples are obtained for each fault label, giving 80 × 9 = 720 samples for training and 20 × 9 = 180 samples for testing.
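The sample counts of this case study follow from integer arithmetic, as the short sketch below shows. The function name is hypothetical. The totals of 600 training and 150 testing samples for conditions 2–4 are derived here from the stated 4:1 ratio and are not quoted in the paper.

```python
def sample_counts(points_per_label, segment_len, n_labels, train_ratio=(4, 1)):
    """Samples per label and overall train/test sizes for one working condition."""
    per_label = points_per_label // segment_len
    train_part, test_part = train_ratio
    train_per_label = per_label * train_part // (train_part + test_part)
    test_per_label = per_label - train_per_label
    return {
        "per_label": per_label,
        "train": train_per_label * n_labels,
        "test": test_per_label * n_labels,
    }


# Condition 1: 2.4e5 points per label, 2400-point samples, 9 labels
# (the inner race fault with 0.014 mil is left out).
condition_1 = sample_counts(240_000, 2_400, n_labels=9)

# Conditions 2-4: 4.8e5 points per label, 6400-point samples, 10 labels.
conditions_2_to_4 = sample_counts(480_000, 6_400, n_labels=10)

print(condition_1)        # {'per_label': 100, 'train': 720, 'test': 180}
print(conditions_2_to_4)  # {'per_label': 75, 'train': 600, 'test': 150}
```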

Results and discussions

For each condition, the diagnostics accuracy on the test set is 100%. Due to space limitations, only the feature learning process during testing under condition 2 is visualized by t-SNE, as shown in Fig. 12. The 10 symbols in different colours represent the 10 fault labels of condition 2. It can be seen that from the 5th layer onwards, features of the same fault mode are already well aggregated and features belonging to different modes are well separated.

Case study 3: Bearing fault diagnostics with private dataset

Experiment and data preparation

In this case study, we validate the proposed model with the bearing fault dataset acquired from our own test bench, as shown in Fig. 13. Seven health states are considered, including the normal state, three types of single-point faults (i.e., inner race, outer race and ball), and three types of compound faults (i.e., inner race and ball, inner race and outer race, outer race and ball). According to the literature and to our own observations and experience, these faults occur most frequently. The vibration data are collected from an NSK-6308 deep groove ball bearing in experiments performed at three motor speeds, 1500 rpm, 2000 rpm and 2500 rpm, with a sampling rate of 20 kHz. For each health state at each motor speed, the data acquisition lasts for 256 seconds, so 5.12 × 10^6 data points are acquired. Every 6.4 × 10^3 data points are taken as one sample and reshaped to a (64, 100) matrix. Thus 800 samples are obtained for each health state. The train/test ratio is set to 4:1. Figure 14 illustrates the vibration signal recorded in one second corresponding to the eight health states. For confidentiality reasons, the raw data are normalized to (−1, 1).

Fig. 13 Private test bench for bearing fault (labels: bearing, accelerometer, laser sensor, speed controller, motor)

Results and discussions

The confusion matrices obtained on the test set for all three motor speeds are shown in Fig. 15, where the accuracies are nearly 100%. Labels 0–6 represent the following fault modes: 0-ball, 1-inner race, 2-outer race, 3-compound fault of inner race and ball, 4-compound fault of outer race and ball, 5-compound fault of outer race and inner race, 6-normal.

Due to space limitations, only the feature learning process during testing under the motor speed of 1500 rpm is visualized by t-SNE, as shown in Fig. 16. In the output layer, the features of the same fault mode are well aggregated and the features belonging to different modes are well separated. Note that in the output layer, the samples belonging to “6” are fully aggregated. Very few confusions occur between “3” and “4”, and between “4” and “5”, which is consistent with the confusion matrix of Fig. 15a.

Case study 4: PHM 2009 spur gearbox challenge data

Data description and preparation

The gearbox fault data of the 2009 PHM data challenge are used in this case study. Readers are referred to “PHM data challenge 2009, https://www.phmsociety.org/competition/PHM/09” for more information about the experimental setting. An overview of the apparatus is shown in Fig. 17a; it includes the drive system, a tachometer providing zero-crossing information, the tested gearbox, and two accelerometers for collecting data. Two sets of gears, i.e., spur gears and helical gears, were tested. We used the data of the spur gears since they contain more fault modes than those of the helical gears. The spur gearbox is a generic industrial one containing 3 shafts, 4 gears and 6 bearings, as shown in Fig. 17b. The input gear, 1st idler gear, 2nd idler gear and output gear have 32, 96, 48 and 80 teeth, respectively. Therefore, from input to output the gear reduction ratio is (32/96) × (48/80) = 1/5, i.e., a 5 to 1 reduction. For the gearbox, instead of single-point faults, eight types of compound faults caused by chipped gears, eccentric gears, bearing ball faults, shaft imbalance, shaft keyway faults, etc. are considered. Detailed descriptions of the eight fault types are reported in Table 5. The faults were seeded in the experiments. They cover the common failures of gearboxes in real cases.
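The 5 to 1 reduction can be checked from the tooth counts with exact fractions. The sketch assumes that the input gear meshes with the 1st idler gear and that the 2nd idler gear, on the same shaft, meshes with the output gear; the variable names and the example input speed of 2700 rpm (one of the tested shaft speeds) are chosen for illustration.

```python
from fractions import Fraction

# Tooth counts of the spur gear set, from the text.
teeth = {"input": 32, "idler_1": 96, "idler_2": 48, "output": 80}

# Speed ratio of each meshing stage (driving teeth / driven teeth).
stage_1 = Fraction(teeth["input"], teeth["idler_1"])
stage_2 = Fraction(teeth["idler_2"], teeth["output"])
ratio = stage_1 * stage_2

input_rpm = 2700
output_rpm = input_rpm * ratio

print(ratio)       # 1/5
print(output_rpm)  # 540
```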

The experiments were carried out under 10 working conditions, i.e., shaft speeds of 1800, 2100, 2400, 2700 and 3000 rpm (revolutions per minute), each under high and low loading. For each fault type under each working condition, vibration signals were sampled synchronously from accelerometers mounted on both the input and output shaft retaining plates, as shown in Fig. 18. Data were acquired with a sampling rate of 66.67 kHz and a sampling time of 4 s, and thus 266,655 data points were obtained and further truncated to 2.56 × 10^5 for each fault type. Additionally, for each working condition, the experiment was repeated twice.

For data preparation, every 6.4 × 10^3 data points are taken as one segment and reshaped to a (64, 100) matrix as one sample. Therefore, 80 samples are obtained for each label (2.56 × 10^5 / (6.4 × 10^3) = 40 samples from each of the two repeated runs). We randomly draw 80% of the data (64 samples) for training and the remaining 20% (16 samples) for testing. Finally, the training/testing data taken from each label form the training set (64 × 8 = 512 samples) and the testing set (16 × 8 = 128 samples).

Results and discussions

For all 10 working conditions, the testing accuracy for the eight types of faults is 100%. Due to space limitations, we only show the feature learning process during testing at the working condition of 2700 rpm under low loading as an example, as given in Fig. 19. The labels 1–8 represent the fault labels listed in Table 5.

Fig. 14 Visualization of the raw vibration data of the eight health states

Fivefold cross validation is used to evaluate the model. All samples and their corresponding labels are randomly divided into five groups of equal size. In each round, four of the five groups are used for training the model and the remaining group is used for testing. Through cross validation, the model is tested five times and every sample serves as both training and testing data. Since the samples are divided completely at random, the number of samples belonging to each label may be imbalanced within a group. For all ten working conditions, the testing accuracies are 100%. We take the working condition of 2700 rpm shaft speed and low loading as an example and show the confusion matrices of the five testing results in Fig. 20.
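The fivefold procedure can be sketched with the standard library alone. The example works on sample indices only; the total of 640 samples corresponds to the 8 labels × 80 samples of one working condition in this case study, while the function name and the random seed are arbitrary choices made for illustration.

```python
import random


def five_fold_splits(n_samples, n_folds=5, seed=0):
    """Randomly divide sample indices into equal groups and return one
    (train_indices, test_indices) pair per round."""
    if n_samples % n_folds != 0:
        raise ValueError("n_samples must be divisible by n_folds")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    fold_size = n_samples // n_folds
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train_idx, test_idx))
    return splits


splits = five_fold_splits(640)  # 8 labels x 80 samples
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))  # 512 128 in every round
```

Because the shuffle ignores the labels, the number of samples per label in each group is not fixed, which is the imbalance mentioned in the text.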

Fig. 15 Confusion matrices of case study 3 under three motor speeds, given by the proposed CNN model: (a) motor speed 1500 rpm, (b) motor speed 2000 rpm, (c) motor speed 2500 rpm (axes: output health state label of network, actual health state label)

Fig. 16 Model testing process visualization under motor speed 1500 rpm, case study 3 (panels: input layer, 1st to 6th layers, output layer; legend: 0 - red, 1 - yellow, 2 - blue, 3 - green, 4 - verdigris, 5 - magenta, 6 - gray)

Fig. 17 Gearbox used in 2009 PHM data challenge [PHM data challenge 2009, https://www.phmsociety.org/competition/PHM/09] (labels: tested gearbox, drive system, tachometer, accelerometer)

Table 5 Fault modes description of 2009 PHM spur gears

| Fault label | Gear 32T | Gear 96T | Gear 48T | Gear 80T | Bearing IS:IS | Bearing ID:IS | Bearing OS:IS | Bearing IS:OS | Bearing ID:OS | Bearing OS:OS | Shaft Input | Shaft Output |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Good | Good | Good | Good | Good | Good | Good | Good | Good | Good | Good | Good |
| 2 | Chipped | Good | Eccentric | Good | Good | Good | Good | Good | Good | Good | Good | Good |
| 3 | Good | Good | Eccentric | Good | Good | Good | Good | Good | Good | Good | Good | Good |
| 4 | Good | Good | Eccentric | Broken | Ball | Good | Good | Good | Good | Good | Good | Good |
| 5 | Chipped | Good | Eccentric | Broken | Inner | Ball | Outer | Good | Good | Good | Good | Good |
| 6 | Good | Good | Good | Broken | Inner | Ball | Outer | Good | Good | Good | Imbalance | Good |
| 7 | Good | Good | Good | Good | Inner | Good | Good | Good | Good | Good | Good | Keyway sheared |
| 8 | Good | Good | Good | Good | Good | Ball | Outer | Good | Good | Good | Imbalance | Good |

T teeth of the gear; before the colon: IS input shaft, ID idler shaft, OS output shaft; after the colon: IS input side, OS output side

Fig. 18 The location of input and output shaft accelerometers [PHM data challenge 2009, https://www.phmsociety.org/competition/PHM/09]

Comparison with traditional diagnostics methods

The method of signal processing based feature extraction combined with a long short-term memory (LSTM) network as a classifier (referred to as the traditional method from now on) is compared with the proposed CNN model in the above four case studies. The flowchart of the traditional method is shown in Fig. 21. The time-domain signal of each health state is first divided into data segments. Then three manually extracted features, i.e., wavelet packet energy (WPE) based on wavelet packet decomposition (Zhang et al. 2013), instantaneous frequency (IF) (Boashash 1992a, 1992b) and instantaneous spectral entropy (ISE) (Pan et al. 2008) based on the power spectrogram, are constructed from each data segment. The LSTM serves as the classifier. Therefore, three traditional methods, i.e., WPE-LSTM, IF-LSTM and ISE-LSTM, are compared.
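To give a feel for one of these engineered features, the sketch below computes a frame-wise spectral entropy from the power spectrum of short frames. It is a simplified stand-in, not the procedure of Pan et al. (2008): the frame length, the hop size, the rectangular window, the direct DFT and the normalization to [0, 1] are all assumptions made for illustration, and the function names are hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum of a frame via a direct DFT (no window)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum, scaled to [0, 1]."""
    power = power_spectrum(frame)
    total = sum(power)
    if total == 0.0:
        return 0.0
    probs = [p / total for p in power]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return entropy / math.log2(len(power))


def instantaneous_spectral_entropy(signal, frame_len=64, hop=32):
    """Spectral entropy of successive frames (frame_len and hop are hypothetical)."""
    return [spectral_entropy(signal[s:s + frame_len])
            for s in range(0, len(signal) - frame_len + 1, hop)]


# A pure tone concentrates its power in one bin (entropy near 0); a single
# impulse spreads it evenly over all bins (entropy near 1).
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(640)]
impulse = [1.0] + [0.0] * 63

ise_tone = instantaneous_spectral_entropy(tone)
ise_impulse = spectral_entropy(impulse)
print(len(ise_tone), max(ise_tone) < 1e-6, abs(ise_impulse - 1.0) < 1e-9)
```

The resulting sequence of entropy values, one per frame, is the kind of time series that could then be fed to a sequence classifier such as the LSTM.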

The LSTM architecture is composed of a sequence input layer, one LSTM layer, a fully connected layer and a softmax layer. The fully connected layer multiplies its input by a weight matrix and adds a bias vector. The output is finally calculated by a softmax transfer function. Regarding the hyperparameters of the LSTM, initial trials showed that, given an appropriate learning rate, the number of LSTM units and the batch size are the two parameters that most clearly affect the accuracy. Specifically, we varied the number of LSTM units successively from 2^2 to 2^9 for the four case studies and found that a large number of LSTM units is normally required when the number of training samples is large, and vice versa. For instance, in case study 1, where 240 training samples are available, an LSTM network with 128 units (or even fewer) performs better than one with 256 units, while in case study 2, where 720 training samples are available, an LSTM network with 256 units performs better than one with fewer units. In terms of batch size, we find that a smaller batch size tends to result in a higher accuracy, but the training oscillation increases accordingly. In addition, batch sizes that are too small carry the risk of non-convergence.

Fig. 19 Model testing process visualization under 2700 rpm and low loading, case study 4 (panels: input layer, 1st to 6th layers, output layer)

Specifically, the accuracies with all three features (WPE, IF and ISE) in case study 3 are high. Figure 22 illustrates the confusion matrices in case study 3 under motor speed 1500 rpm as an example. The accuracies of IF-LSTM and ISE-LSTM are acceptable in the gearbox application under the low loading condition, but decrease dramatically under high loading as well as in the case of the ball screw. WPE performs the best among the three manually extracted features in many cases, but it suffers the risk of non-convergence in some working conditions of the gearbox application. The confusion matrices given by the traditional methods for the gearbox application under 2700 rpm and low loading are shown in Fig. 23 as an example. The accuracy given by WPE-LSTM is very low due to non-convergence. In contrast, the proposed CNN model identifies the eight health states well under this working condition, as can be clearly seen in Fig. 19.

Through the comparison between the proposed CNN model and the traditional methods in various applications under various working conditions, it can be seen that the proposed CNN model is much more robust, giving consistently high accuracies in all four case studies. Moreover, the end-to-end structure of the CNN model relies less on empirical expertise and advanced signal processing techniques, which enables the proposed model to be easily adapted to different diagnostics tasks.

Conclusions and future work

Manual feature extraction based on signal processing tech-

niques is normally required in traditional diagnostics for

rotating machinery, which has the drawbacks such as strong

dependencies on the expertise and prior knowledge, the

requirement for lots of skilled human labour, the sensitiv-

ity to changes, etc., and thus requires extensive ﬁne-tuning.

Some recent works based on deep learning convert the vibra-

tion signal to images based on some time-frequency methods,

which can circumvent some of the previous drawbacks but

still need application-speciﬁc adaptation. In this paper, we

proposed an end-to-end health state diagnostics model based

on convolutional neural network (CNN), which can directly

learn feature representation from the raw vibration signal

and no manually extracted feature is required. In addition,

to fully validate the effectiveness and the generalizability of

the proposed model for fault diagnostics of the rotating com-

ponent, we carried out tests on four datasets, including two

public ones and two datasets of our own, covering the appli-

cations of ball screw, bearing and gearbox. The results show

high diagnostics accuracies for all the four tasks. To our best

knowledge, our work ﬁrstly validates the CNN model in such

wide applications.

Moreover, the signal processing based feature extraction

combined with long short-term memory (LSTM) network

123

Journal of Intelligent Manufacturing

10.9%

0.0%

9.4%

0.0%

15.6%

9.4%

17.2%

33.3%

0.0%

0.2%

1.2%

0.0%

4.7%

0.0%

33.3%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

14.1%

0.0%

100.0%

00.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

164

14.6%

0.0%

100.0%

00.0%

164

14.6%

0.0%

100.0%

0.0%

18.8%

0.0%

True Label

Predicted label

17.2%

0.0%

12.5%

0.0%

17.2%

14.1%

9.4%

33.3%

0.0%

0.2%

1.2%

0.0%

10.9%

0.0%

33.3%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

12.5%

0.0%

100.0%

00.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

164

14.6%

0.0%

100.0%

00.0%

164

14.6%

0.0%

100.0%

0.0%

6.3%

0.0%

True Label

Predicted label

10.9%

0.0%

12.5%

0.0%

10.9%

15.6%

9.4%

33.3%

0.0%

0.2%

1.2%

0.0%

15.6%

0.0%

33.3%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

10.9%

0.0%

100.0%

00.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

164

14.6%

0.0%

100.0%

00.0%

164

14.6%

0.0%

100.0%

0.0%

14.1%

0.0%

True Label

Predicted label

12.5%

0.0%

14.1%

0.0%

10.9%

9.4%

15.6%

33.3%

0.0%

0.2%

1.2%

0.0%

17.2%

0.0%

33.3%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

14.1%

0.0%

100.0%

00.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

164

14.6%

0.0%

100.0%

00.0%

164

14.6%

0.0%

100.0%

0.0%

6.3%

0.0%

True Label

Predicted label

10.9%

0.0%

14.1%

0.0%

7.8%

14.1%

10.9%

33.3%

0.0%

0.2%

1.2%

0.0%

14.1%

0.0%

33.3%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

10.9%

0.0%

100.0%

00.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

100.0%

0.0%

164

14.6%

0.0%

100.0%

00.0%

164

14.6%

0.0%

100.0%

0.0%

17.2%

0.0%

True Label

Predicted label

Fig. 20 Testing accuracies of ﬁvefold cross validation under the working condition 2700 rpm shaft speed and low loading

(referred to here as the traditional method) is also explored and compared with the proposed CNN model. Specifically, three typical engineered features, i.e., (a) wavelet packet energy (WPE) based on wavelet packet decomposition, (b) instantaneous frequency (IF), and (c) instantaneous spectral entropy (ISE) based on the power spectrogram, are constructed from the raw vibration data and then used as the input of a classifier (an LSTM network). The results indicate that manually extracted features based on signal processing techniques are indeed sensitive to the diagnostics task: a feature that performs well in one task may fail to give satisfactory accuracy, or even lead to non-convergence, in another. For example, according to Table 6, WPE-LSTM reaches 97.3% in case study 1 but fails to converge under two working conditions of case study 4. The comparison shows that the proposed CNN-based model is robust and generalizes well, adapting to different diagnostics tasks without any manual tuning.

Fig. 21 Flowchart of the implementation of the traditional methods
[Flowchart blocks: health states 1 to n; data segmentation; training set and testing set; wavelet packet decomposition and power spectrogram; extracted features (WPE, IF, ISE) of the training and testing sets; build LSTM network; train network; trained model; input test set; diagnostics result]

Fig. 22 Confusion matrix of case study 3 under motor speed 1500 rpm, given by the traditional method: (a) WPE-LSTM, (b) IF-LSTM, (c) ISE-LSTM (horizontal axis: actual health state label)

Fig. 23 Confusion matrix of case study 4 under speed 2700 rpm and low loading, given by the traditional method: (a) WPE-LSTM, (b) IF-LSTM, (c) ISE-LSTM (axes: output health state label of network vs. actual health state label)

Table 6 Accuracies of proposed CNN and traditional methods in all case studies

| Case study | Working conditions | Number of labels | Proposed CNN model (%) | WPE-LSTM (%) | IF-LSTM (%) | ISE-LSTM (%) |
|---|---|---|---|---|---|---|
| 1 | 500 rpm, 0 load | 3 | 100 | 97.3 ± 3.2 | 86.8 ± 4.3 | 69.6 ± 5.6 |
| 2 | 1797 rpm, 0 load | 9 | 100 | 79.4 ± 5.0 | 89.2 ± 2.5 | 90.3 ± 3.5 |
| 2 | 1772 rpm, 1 hp load | 10 | 100 | 92.7 ± 5.2 | 86.4 ± 6.0 | 82.2 ± 4.5 |
| 2 | 1750 rpm, 2 hp load | 10 | 100 | 82.2 ± 7.4 | 86.8 ± 1.8 | 88.7 ± 3.6 |
| 2 | 1730 rpm, 3 hp load | 10 | 100 | 92.4 ± 7.0 | 86.8 ± 3.6 | 79.4 ± 4.6 |
| 3 | 1500 rpm, 0 load | 7 | 99.5 ± 0.5 | 96.3 ± 2.1 | 97.5 ± 2.4 | 92.2 ± 2.5 |
| 3 | 2000 rpm, 0 load | 7 | 99.8 ± 0.5 | 95.4 ± 5.9 | 99.1 ± 2.3 | 92.3 ± 2.1 |
| 3 | 2500 rpm, 0 load | 7 | 99.5 ± 0.5 | 98.5 ± 3.4 | 94.6 ± 2.4 | 99.4 ± 2.1 |
| 4 | 1800 rpm, low load | 8 | 100 | 93.4 ± 3.6 | 90.1 ± 2.3 | 81.9 ± 4.1 |
| 4 | 1800 rpm, high load | 8 | 100 | 91.0 ± 1.7 | 82.1 ± 7.3 | 84.5 ± 2.3 |
| 4 | 2100 rpm, low load | 8 | 100 | 99.5 ± 0.5 | 88.0 ± 8.4 | 89.4 ± 5.4 |
| 4 | 2100 rpm, high load | 8 | 100 | – | 83.5 ± 3.4 | 84.0 ± 2.1 |
| 4 | 2400 rpm, low load | 8 | 100 | 98.6 ± 1.7 | 89.9 ± 2.5 | 90.7 ± 1.8 |
| 4 | 2400 rpm, high load | 8 | 100 | 86.8 ± 2.4 | 82.2 ± 2.9 | 81.1 ± 2.7 |
| 4 | 2700 rpm, low load | 8 | 100 | – | 97.1 ± 1.9 | 89.7 ± 2.4 |
| 4 | 2700 rpm, high load | 8 | 100 | 92.2 ± 4.6 | 84.0 ± 1.6 | 76.9 ± 2.7 |
| 4 | 3000 rpm, low load | 8 | 100 | 95.2 ± 5.9 | 90.9 ± 4.0 | 91.0 ± 1.7 |
| 4 | 3000 rpm, high load | 8 | 100 | 96.1 ± 2.7 | 82.2 ± 1.6 | 76.8 ± 2.1 |

“–” represents non-convergence
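As a rough illustration of the kind of engineered features used by the traditional method, the sketch below computes a wavelet packet energy (WPE) vector and a frame-wise (instantaneous) spectral entropy (ISE) sequence from a raw signal. It is a minimal sketch under stated assumptions, not the authors' implementation: the Haar wavelet, the decomposition level, the frame length and hop, and all function names are hypothetical choices made here, since this section does not specify them. Instantaneous frequency is omitted for brevity.

```python
import math


def haar_step(x):
    """One Haar analysis step: split x (even length) into approximation and detail."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_packet_energy(x, level=3):
    """Relative energy of the 2**level terminal wavelet-packet nodes.

    Assumption: Haar wavelet. Nodes are returned in decomposition order
    (approximation before detail at every split), not sorted by frequency.
    """
    if len(x) % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = [list(x)]
    for _ in range(level):
        # Split every node, so both approximation and detail branches are refined.
        nodes = [half for node in nodes for half in haar_step(node)]
    energies = [sum(v * v for v in node) for node in nodes]
    total = sum(energies)  # equals the signal energy, since the Haar step is orthogonal
    return [e / total for e in energies]


def spectral_entropy(frame):
    """Shannon entropy of the normalised power spectrum of one frame, scaled to [0, 1]."""
    n = len(frame)
    power = []
    for k in range(1, n // 2):  # skip the DC and Nyquist bins
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append(re * re + im * im)
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(len(power))


def instantaneous_spectral_entropy(x, frame_len=64, hop=32):
    """Spectral entropy of successive overlapping frames (a simple spectrogram-based ISE)."""
    starts = range(0, len(x) - frame_len + 1, hop)
    return [spectral_entropy(x[s:s + frame_len]) for s in starts]


if __name__ == "__main__":
    # Synthetic stand-in for a vibration segment: a single sinusoid.
    signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
    print(wavelet_packet_energy(signal, level=3))
    print(instantaneous_spectral_entropy(signal))
```

Feature sequences of this kind, computed per data segment, would then be fed to the classifier. A narrow-band signal gives an entropy close to 0 and a flat spectrum gives a value close to 1, so how well such a feature separates health states depends on the signal, which is one way to read the task sensitivity discussed above.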

The limitations of the current work and the corresponding future work are summarized as follows. First, the current work used data acquired from laboratory test benches; next, we will investigate the performance of the proposed model in a real industrial environment. Second, the high diagnostics accuracy of each application rests on the assumptions that sufficient labeled data are available and that the training and testing data come from the same distribution, which may be a limiting factor in industrial applications. To relax these assumptions, our future work will focus on transfer learning methods, which are able to transfer vibration-based diagnostics capabilities to new working conditions, experimental protocols and instrumented devices while avoiding the requirement for new labeled fault data. In this way, diagnostics models trained with laboratory data have the potential to be used in a real industrial environment. Third, the fault data of each label are balanced in the current work. In future work, we will focus on building the diagnostics model when the fault data are unbalanced, i.e., when only a small amount of fault data, or even none, is available for some specific fault labels, since in practice faults of high-stakes industrial devices are rare. Fourth, in addition to the single fault type considered in the current work, we will study the diagnostics of compound faults. Finally, the low signal-to-noise ratio in the acquired vibration signal, caused by the strong coupling of different components, is also of interest for future work.
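To make the signal-to-noise ratio mentioned above concrete, the following sketch defines SNR in decibels as 10·log10(P_signal / P_noise) and shows one way to corrupt a clean signal with Gaussian noise at a chosen SNR, which is a common way to probe a model's robustness to noise. This is an illustrative assumption, not a procedure described in the paper; the function names and the synthetic signal are hypothetical.

```python
import math
import random


def mean_power(x):
    """Average power (mean of squared samples) of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def add_noise_at_snr(signal, target_snr_db, seed=0):
    """Return (noisy_signal, noise) with Gaussian noise scaled to the target SNR."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale the drawn noise so that its measured power matches the target exactly.
    wanted_noise_power = mean_power(signal) / (10.0 ** (target_snr_db / 10.0))
    scale = math.sqrt(wanted_noise_power / mean_power(raw))
    noise = [scale * v for v in raw]
    noisy = [s + n for s, n in zip(signal, noise)]
    return noisy, noise


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
    noisy, noise = add_noise_at_snr(clean, target_snr_db=-3.0)
    print(round(snr_db(clean, noise), 6))
```

A negative SNR in dB means the noise power exceeds the signal power, which is the regime the low-SNR concern refers to.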

Acknowledgements The present work was funded by the National Natural Science Foundation of China (No. 51805262) and the Graduate Student Innovation Fund of Beihang University (YCSJ-03-2019-06). The authors gratefully acknowledge the Key Laboratory of Performance Test for CNC Machine Tool Components, affiliated with the Ministry of Industry and Information Technology of China, for providing the ball screw test bench and experiment materials.

References

Behley, J., Steinhage, V., & Cremers, A. B. (2013). Laser-based segment classification using a mixture of bag-of-words. In 2013 IEEE/RSJ international conference on intelligent robots and systems (pp. 4195–4200). https://doi.org/10.1109/IROS.2013.6696957.

Boashash, B. (1992a). Estimating and interpreting the instantaneous frequency of a signal. I. Fundamentals. Proceedings of the IEEE, 80(4), 520–538. https://doi.org/10.1109/5.135376.

Boashash, B. (1992b). Estimating and interpreting the instantaneous frequency of a signal. Proceedings of the IEEE, 80(4), 540–568. https://doi.org/10.1109/5.135378.

Case Western Reserve University Bearing Data Center Website. Available: https://csegroups.case.edu/bearingdatacenter/home.

Chen, R., Huang, X., Yang, L., Xu, X., Zhang, X., & Zhang, Y. (2019). Intelligent fault diagnosis method of planetary gearboxes based on convolution neural network and discrete wavelet transform. Computers in Industry, 106, 48–59. https://doi.org/10.1016/j.compind.2018.11.003.

Chen, Z., Mauricio, A., Li, W., & Gryllias, K. (2020). A deep learning method for bearing fault diagnosis based on Cyclic Spectral Coherence and Convolutional Neural Networks. Mechanical Systems and Signal Processing, 140, 106683. https://doi.org/10.1016/j.ymssp.2020.106683.

Dhamande, L. S., & Chaudhari, M. B. (2018). Compound gear-bearing fault feature extraction using statistical features based on time-frequency method. Measurement, 125, 63–77. https://doi.org/10.1016/j.measurement.2018.04.059.

Feng, Z., Lin, X., & Zuo, M. J. (2016). Joint amplitude and frequency demodulation analysis based on intrinsic time-scale decomposition for planetary gearbox fault diagnosis. Mechanical Systems and Signal Processing, 72–73, 223–240. https://doi.org/10.1016/j.ymssp.2015.11.024.

Feng, G., & Pan, Y. (2012). Establishing a cost-effective sensing system and signal processing method to diagnose preload levels of ball screws. Mechanical Systems and Signal Processing, 28, 78–88. https://doi.org/10.1016/j.ymssp.2011.10.004.

Goodfellow, I., Bengio, Y., & Courville, A. (2019). Deep learning. Cambridge, MIT Press.

Goyal, D., Choudhary, A., Pabla, B. S., & Dhami, S. S. (2019). Support vector machines based non-contact fault diagnosis system for bearings. Journal of Intelligent Manufacturing. https://doi.org/10.1007/s10845-019-01511-x.

Hamadache, M., Jung, J. H., Park, J., & Youn, B. D. (2019). A comprehensive review of artificial intelligence-based approaches for rolling element bearing PHM: shallow and deep learning. JMST Advances, 1(1), 125–151. https://doi.org/10.1007/s42791-019-0016-y.

Hoang, D. T., & Kang, H. J. (2019). Rolling element bearing fault diagnosis using convolutional neural network and vibration image. Cognitive Systems Research, 53, 42–50. https://doi.org/10.1016/j.cogsys.2018.03.002.

Islam, M. M. M., & Kim, J. M. (2019a). Reliable multiple combined fault diagnosis of bearings using heterogeneous feature models and multiclass support vector machines. Reliability Engineering & System Safety, 184, 55–66. https://doi.org/10.1016/j.ress.2018.02.012.

Islam, M. M. M., & Kim, J. M. (2019b). Automated bearing fault diagnosis scheme using 2D representation of wavelet packet transform and deep convolutional neural network. Computers in Industry, 106, 142–153. https://doi.org/10.1016/j.compind.2019.01.00

Jia, F., Lei, Y., Guo, L., Lin, J., & Xing, S. (2018). A neural network constructed by deep learning technique and its application to intelligent fault diagnosis of machines. Neurocomputing, 272, 619–628. https://doi.org/10.1016/j.neucom.2017.07.032.

Jing, L., Zhao, M., Li, P., & Xu, X. (2017). A convolutional neural network based feature learning and fault diagnosis method for the condition monitoring of gearbox. Measurement, 111, 1–10. https://doi.org/10.1016/j.measurement.2017.07.017.

Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In The 3rd International Conference for Learning Representations, San Diego, 2015. arXiv preprint arXiv:1412.6980.

Lee, J., Davari, H., Singh, J., & Pandhare, V. (2018). Industrial Artificial Intelligence for industry 4.0-based manufacturing systems. Manufacturing Letters, 18, 20–23. https://doi.org/10.1016/j.mfglet.2018.09.002.

Li, P., Jia, X., Feng, J., Davari, H., Qiao, G., Hwang, Y., et al. (2018a). Prognosability study of ball screw degradation using systematic methodology. Mechanical Systems and Signal Processing, 109, 45–57. https://doi.org/10.1016/j.ymssp.2018.02.046.

Li, X., Li, J., Qu, Y., & He, D. (2019a). Semi-supervised gear fault diagnosis using raw vibration signal based on deep learning. Chinese Journal of Aeronautics. https://doi.org/10.1016/j.cja.2019.04.018.

Li, X., Li, J., Zhao, C., Qu, Y., & He, D. (2020). Gear pitting fault diagnosis with mixed operating conditions based on adaptive 1D separable convolution with residual connection. Mechanical Systems and Signal Processing, 142, 106740. https://doi.org/10.1016/j.ymssp.2020.106740.

Li, X., Zhang, W., & Ding, Q. (2019b). Deep learning-based remaining useful life estimation of bearings using multi-scale feature extraction. Reliability Engineering & System Safety, 182, 208–218. https://doi.org/10.1016/j.ress.2018.11.011.

Li, X., Zhang, W., Ding, Q., & Sun, J. Q. (2018b). Intelligent rotating machinery fault diagnosis based on deep learning using data augmentation. Journal of Intelligent Manufacturing. https://doi.org/10.1007/s10845-018-1456-1.

Liang, P., Deng, C., Wu, J., & Yang, Z. (2020). Intelligent fault diagnosis of rotating machinery via wavelet transform, generative adversarial nets and convolutional neural network. Measurement, 159, 107768. https://doi.org/10.1016/j.measurement.2020.107768.

Liu, L., Liang, X., & Zuo, M. J. (2018). A dependence-based feature vector and its application on planetary gearbox fault classification. Journal of Sound and Vibration, 431, 192–211. https://doi.org/10.1016/j.jsv.2018.06.015.

Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579–2605.

Ng, A. Y. (2004). Feature selection, L1 vs L2 regularization, and rotational invariance. In Proceedings of the 21st international conference on machine learning.

Nguyen, D., Kang, M., Kim, C. H., & Kim, J.-M. (2013). Highly reliable state monitoring system for induction motors using dominant features in a two-dimension vibration signal. New Review of Hypermedia and Multimedia, 19(3–4), 248–258. https://doi.org/10.1080/13614568.2013.832407.

PHM data challenge. (2009). Available from https://www.phmsociety.org/competition/PHM/09.

Pan, Y. N., Chen, J., & Li, X. L. (2008). Spectral entropy: A complementary index for rolling element bearing performance degradation assessment. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science, 223(5), 1223–1231. https://doi.org/10.1243/09544062JMES1224.

Park, S., Kim, S., & Choi, J. H. (2018). Gear fault diagnosis using transmission error and ensemble empirical mode decomposition. Mechanical Systems and Signal Processing, 108, 262–275. https://doi.org/10.1016/j.ymssp.2018.02.028.

Peng, D., Liu, Z., Wang, H., Qin, Y., & Jia, L. (2019). A novel deeper one-dimensional CNN with residual learning for fault diagnosis of wheelset bearings in high-speed trains. IEEE Access, 7, 10278–10293. https://doi.org/10.1109/ACCESS.2018.2888842.

Smith, W. A., & Randall, R. B. (2015). Rolling element bearing diagnostics using the Case Western Reserve University data: A benchmark study. Mechanical Systems and Signal Processing, 64–65, 100–131. https://doi.org/10.1016/j.ymssp.2015.04.02

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). DropOut: A simple way to prevent neural network from overfitting. Journal of Machine Learning Research, 15, 1929–1958.

Vogl, G. W., Weiss, B. A., & Helu, M. (2019). A review of diagnostic and prognostic capabilities and best practices for manufacturing. Journal of Intelligent Manufacturing, 30(1), 79–95. https://doi.org/10.1007/s10845-016-1228-8.

Wang, P., Ananya, Yan, R., & Gao, R. X. (2017). Virtualization and deep recognition for system fault classification. Journal of Manufacturing Systems, 44, 310–316. https://doi.org/10.1016/j.jmsy.2017.04.012.

Wang, C., Gan, M., & Zhu, C. a. (2018a). Fault feature extraction of rolling element bearings based on wavelet packet transform and sparse representation theory. Journal of Intelligent Manufacturing, 29(4), 937–951. https://doi.org/10.1007/s10845-015-1153-2.

Wang, H., Li, S., Song, L., & Cui, L. (2019). A novel convolutional neural network based fault recognition method via image fusion of multi-vibration-signals. Computers in Industry, 105, 182–190. https://doi.org/10.1016/j.compind.2018.12.013.

Wang, L., Liu, Z., Miao, Q., & Zhang, X. (2018b). Complete ensemble local mean decomposition with adaptive noise and its application to fault diagnosis for rolling bearings. Mechanical Systems and Signal Processing, 106, 24–39. https://doi.org/10.1016/j.ymssp.2017.12.031.

Wu, C., Jiang, P., Ding, C., Feng, F., & Chen, T. (2019). Intelligent fault diagnosis of rotating machinery based on one-dimensional convolutional neural network. Computers in Industry, 108, 53–61. https://doi.org/10.1016/j.compind.2018.12.001.

Xia, T., & Xi, L. (2019). Manufacturing paradigm-oriented PHM methodologies for cyber-physical systems. Journal of Intelligent Manufacturing, 30(4), 1659–1672. https://doi.org/10.1007/s10845-017-1342-2.

Yan, X., & Jia, M. (2018). A novel optimized SVM classification algorithm with multi-domain feature and its application to fault diagnosis of rolling bearing. Neurocomputing. https://doi.org/10.1016/j.neucom.2018.05.002.

Zhang, J., Sun, Y., Guo, L., Gao, H., Hong, X., & Song, H. (2020). A new bearing fault diagnosis method based on modified convolutional neural networks. Chinese Journal of Aeronautics, 33(2), 439–447. https://doi.org/10.1016/j.cja.2019.07.011.

Zhang, Z., Wang, Y., & Wang, K. (2013). Fault diagnosis and prognosis using wavelet packet decomposition, Fourier transform and artificial neural network. Journal of Intelligent Manufacturing, 24(6), 1213–1227. https://doi.org/10.1007/s10845-012-0657-2.

Zhao, X., Jia, M., & Lin, M. (2020). Deep Laplacian Auto-encoder and its application into imbalanced fault diagnosis of rotating machinery. Measurement, 152, 107320. https://doi.org/10.1016/j.measurement.2019.107320.

Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P., & Gao, R. X. (2019). Deep learning and its applications to machine health monitoring. Mechanical Systems and Signal Processing, 115, 213–237. https://doi.org/10.1016/j.ymssp.2018.05.050.

Zhu, X., Hou, D., Zhou, P., Han, Z., Yuan, Y., Zhou, W., et al. (2019a). Rotor fault diagnosis using a convolutional neural network with symmetrized dot pattern images. Measurement, 138, 526–535. https://doi.org/10.1016/j.measurement.2019.02.022.

Zhu, Z., Peng, G., Chen, Y., & Gao, H. (2019b). A convolutional neural network based on a capsule network with strong generalization for bearing fault diagnosis. Neurocomputing, 323, 62–75. https://doi.org/10.1016/j.neucom.2018.09.050.

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


A Zero-cost Unsupervised Transfer Method based on Non-vibration Signals Fusion for Ball Screw Fault Diagnosis

Article

Mar 2024
KNOWL-BASED SYST

Vibration-based fault diagnosis methods of ball screw are susceptible to noise and transmission path. Moreover, the accuracy of supervised deep learning models depends on large amounts of labeled samples, which are not only difficult to obtain but also laborious to label. Therefore, to solve these problems, a zero-cost unsupervised transfer method based on non-vibration signals fusion is proposed to achieve ball screw fault diagnosis in this paper. Firstly, non-vibration signals, such as current and speed, are adopted and orderly fused together to constitute multi-source fusion signal samples, which are easier to obtain and contain fewer interferences than vibration signals. Secondly, by virtue of its excellent abnormal detection ability, isolation forest algorithm is circularly utilized to generate pseudo-labels of source domain samples without manual labeling, which further realizes zero-cost sample labeling and unsupervised process. Finally, large amount of generated pseudo-labeled samples of source domain is applied to pre-train the transfer model parameters, and fine-tuning strategy with small number of labeled samples of target domain is used to complete transfer fault diagnosis of ball screw. The effectiveness of the proposed method is verified by ball screw signals across three different operation conditions, ablation and comparison analysis are also studied to illustrate its advantages.

Metric Learning-Based Few-Shot Adversarial Domain Adaptation: A Cross-Machine Diagnosis Method for Ball Screws of Industrial Robots

Article

Full-text available

Jan 2024

Due to the varying working conditions of SCARA (Selective Compliance Assembly Robot Arm) robots, there are significant differences in data distribution among different machines. As a result, it is challenging to apply unsupervised methods for cross-machine fault diagnosis. This paper proposes a method called Metric Learning-based Few-shot Adversarial Domain Adaptation (MLFADA) for cross-machine diagnosis of the SCARA robot’s ball screws. Firstly, MLFADA constructs data pairs by sampling a few shots of samples from the source domain (named SCARA A) and the target domain (named SCARA B). Subsequently, it integrates metric learning and adversarial learning theories to minimize the distance between data pairs that belong to the same class in both domains while maximizing the distance between data pairs from different classes. Secondly, to further enhance the performance of MLFADA, a strategy called Pseudo-label Self-correcting Maximum Mean Discrepancy (PSMMD) is proposed to reduce the conditional distribution differences between the two domains. Finally, a lightweight network is designed for feature extraction and fault classification to facilitate deployment on terminal devices. The experiment demonstrates that the challenge of cross-machine fault diagnosis for the ball screws of SCARA robots has been successfully resolved. This is a relatively understudied problem. Compared to mainstream domain adaptation and few-shot methods, the proposed method achieved the best diagnostic accuracy of 88.14%, even when there was only one labeled sample available in the target domain.

Estimation of Remaining Useful Life for Turbofan Engine Based on Deep Learning Networks

Conference Paper

Full-text available

Oct 2023

Having accurate prediction on the health of machines in manufacturing can lead to a profitable organization if the operations and maintenance decisions are appropriately performed. This hinges on making well-informed operational and maintenance decisions. Incorporating condition monitoring and predictive maintenance strategies can significantly contribute to achieving this goal. By continuously monitoring the real-time condition of machines, organizations can gather valuable data that offers insights into the performance and health of the equipment. However, dealing with a scarce dataset, which is common in real world applications, makes any prognostics on the maintenance system intricate. This is further exacerbated by the unavailability of failure data within the system which makes degradation model is best suited for the said situation. Since there is no extensive study discussing computational time under similar settings of two different networks for the degradation model in estimating RUL, this study investigates a simple Long Short-Term Model (LSTM) method for prognostics, which is compared to a two-dimensional Convolutional Neural Network (CNN) under the same training options. The networks are trained using the popular Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset from the National Aeronautics and Space Administration (NASA). The aim of this study is to estimate the remaining useful life (RUL) of a turbofan engine in the most effective way. With carefully designed and defined network architectures, better performance can be attained, enabling proper foreseen of the RUL of an engine as soon as it is more likely to be close to failure. Based on the comparison, it is noted that the simple LSTM method for RUL prediction outperforms the two-dimensional CNN with better RUL prediction, Root Mean Square Error (RMSE), and computational time. 
For future improvement, this study can be further explored for a more sophisticated hybrid model that might produce better prediction in various sectors such as manufacturing, automotive, and military applications.

Prognostics and Health Management for Induction Machines: A Comprehensive Review

Article

Full-text available

Mar 2023
J INTELL MANUF

Induction machines (IMs) are utilized in different industrial sectors such as manufacturing, transportation, transmission, and energy due to their ruggedness, low cost, and high efficiency. If IMs fail without advanced warning, unscheduled maintenance needs to be performed, leading to downtime and maintenance costs for asset owners. To avoid these, conducting prognostics and health management (PHM) for IMs is indispensable. There are different PHM methods (expert knowledge, physics-based, and machine learning) to analyze the health and estimate the remaining useful life (RUL) of IMs. It is essential to select appropriate methods and algorithms to solve practical engineering problems by comparing their pros and cons. This paper will systematically summarize the application of the PHM framework to IMs and comprehensively present how to select appropriate general methods as well as specific algorithms applied in the PHM for IMs to solve practical engineering problems, aiming to provide some guidance for future researchers and practitioners.

An Improved Convolutional-Neural-Network-Based Fault Diagnosis Method for the Rotor–Journal Bearings System

Article

Full-text available

Jun 2022

More layers in a convolution neural network (CNN) means more computational burden and longer training time, resulting in poor performance of pattern recognition. In this work, a simplified global information fusion convolution neural network (SGIF-CNN) is proposed to improve computational efficiency and diagnostic accuracy. In the improved CNN architecture, the feature maps of all the convolutional and pooling layers are globally convoluted into a corresponding one-dimensional feature sequence, and then all the feature sequences are concatenated into the fully connected layer. On this basis, this paper further proposes a novel fault diagnosis method for a rotor–journal bearing system based on SGIF-CNN. Firstly, the time-frequency distributions of samples are obtained using the Adaptive Optimal-Kernel Time–Frequency Representation algorithm (AOK-TFR). Secondly, the time–frequency diagrams of the training samples are utilized to train the SGIF-CNN model using a shallow information fusion method, and the trained SGIF-CNN model can be tested using the time–frequency diagrams of the testing samples. Finally, the trained SGIF-CNN model is transplanted to the equipment’s online monitoring system to monitor the equipment’s operating conditions in real time. The proposed method is verified using the data from a rotor test rig and an ultra-scale air separator, and the analysis results show that the proposed SGIF-CNN improves the computing efficiency compared to the traditional CNN while ensuring the accuracy of the fault diagnosis.

Experimental Verification of the Impact of Radial Internal Clearance on a Bearing’s Dynamics

Article

Full-text available

Aug 2022
SENSORS-BASEL

This paper focuses on the influence of radial internal clearance on the dynamics of a rolling-element bearing. In the beginning, the 2—Degree of Freedom (DOF) model was studied, in which the clearance was treated as a bifurcation parameter. The derived nonlinear mathematical model is based on Hertzian contact theory and takes into consideration shape errors of rolling surfaces and eccentricity reflecting real operating conditions. The analysis showed characteristic dynamical behavior by specific clearance range, which reflects others in a low or high amplitude and can refer to the optimal clearance. The experimental validation was conducted with the use of a double row self-aligning ball bearing (SABB) NTN 2309SK in which the acceleration response was measured by various rotational velocities. The time series obtained from the mathematical model and the experiment were analyzed with the recurrence quantification analysis.

Detail-semantic guide network based on spatial attention for surface defect detection with fewer samples

Article

Full-text available

Jul 2022
APPL INTELL

Surface defect detection is an important part of the process of product quality control in the industry. Automatic detection of surface defects based on machine learning is an up-and-coming research field, and there have been many successful cases. Deep learning has become the most suitable detection method for this task. Most algorithms require a large number of defect samples to achieve good results. However, defect samples in actual production are very limited. Although some unsupervised or semi-supervised methods can reduce training costs, their accuracy is difficult to guarantee, so they are difficult to be applied in industrial inspection. In this paper, we propose a detail-semantic guide network (DSGNet), which can achieve better result with fewer training samples. It is a two-stage neural network framework. In the first stage, we design a new semantic branch based on the modified residual shrinkage network and the proposed joined atrous spatial pyramid pooling (JASPP) module. This is the first time that residual shrinkage network is applied to defect detection and achieves good results. Also, we design a clear and efficient detail branch based on dense connection network. Specially, we propose a new detail-semantic guide module (DSGM), which can better integrate the feature information of the two branches. In the training phase, we propose a weight mask based on defect area to improve the ability of extracting small defects. We did experiments on four datasets and our method achieved excellent detection results even with only a small number of training samples.

An end-to-end harmful object identification method for sizer crusher based on time series classification and deep learning

Article

Apr 2023
ENG APPL ARTIF INTEL

A Big Data Application in Manufacturing Industry-Computer Vision to Detect Defects on Bearings

Conference Paper

Full-text available

Dec 2022

A Review of Fault Diagnosis Methods for Marine Electric Propulsion System

Chapter

Sep 2022

With the rapid development of power electronics technology and the proposal of intelligent ships, electric propulsion systems on ships are becoming more and more widespread. As the power source for ship navigation, timely and accurate diagnosis and prediction of faults of electric propulsion system play a vital role in the operation safety of ships. This paper summarises the common faults of electric propulsion systems, reviews the latest developments and applications of fault diagnosis techniques based on fault signal analysis in electric propulsion system fault diagnosis, and discusses the advantages and disadvantages of typical methods in the light of the latest literature and current research problems. The paper concludes by proposing future trends in fault diagnosis and prediction for ship electric propulsion systems. Keywords: Marine electric propulsion system, Fault diagnosis, Signal analysis

Support vector machines based non-contact fault diagnosis system for bearings

Article

Full-text available

Jun 2020
J INTELL MANUF

Bearing defects have been accepted as one of the major causes of failure in rotating machinery. It is important to identify and diagnose the failure behavior of bearings for the reliable operation of equipment. In this paper, a low-cost non-contact vibration sensor has been developed for detecting the faults in bearings. The supervised learning method, support vector machine (SVM), has been employed as a tool to validate the effectiveness of the developed sensor. Experimental vibration data collected for different bearing defects under various loading and running conditions have been analyzed to develop a system for diagnosing the faults for machine health monitoring. Fault diagnosis has been accomplished using discrete wavelet transform for denoising the signal. Mahalanobis distance criteria has been employed for selecting the strongest feature on the extracted relevant features. Finally, these selected features have been passed to the SVM classifier for identifying and classifying the various bearing defects. The results reveal that the vibration signatures obtained from developed non-contact sensor compare well with the accelerometer data obtained under the same conditions. A developed sensor is a promising tool for detecting the bearing damage and identifying its class. SVM results have established the effectiveness of the developed non-contact sensor as a vibration measuring instrument which makes the developed sensor a cost-effective tool for the condition monitoring of rotating machines.

A new bearing fault diagnosis method based on modified convolutional neural networks

Article

Full-text available

Aug 2019
CHINESE J AERONAUT

Fault diagnosis is vital in manufacturing system. However, the first step of the traditional fault diagnosis method is to process the signal, extract the features and then put the features into a selected classifier for classification. The process of feature extraction depends on the experimenters’ experience, and the classification rate of the shallow diagnostic model does not achieve satisfactory results. In view of these problems, this paper proposes a method of converting raw signals into two-dimensional images. This method can extract the features of the converted two-dimensional images and eliminate the impact of expert’s experience on the feature extraction process. And it follows by proposing an intelligent diagnosis algorithm based on Convolution Neural Network (CNN), which can automatically accomplish the process of the feature extraction and fault diagnosis. The effect of this method is verified by bearing data. The influence of different sample sizes and different load conditions on the diagnostic capability of this method is analyzed. The results show that the proposed method is effective and can meet the timeliness requirements of fault diagnosis.

Semi-supervised Gear Fault Diagnosis Using Raw Vibration Signal Based on Deep Learning

Article

Full-text available

May 2019
CHINESE J AERONAUT

In aerospace industry, gears are the most common parts of a mechanical transmission system. Gear pitting faults could cause the transmission system to crash and give rise to safety disaster. It is always a challenging problem to diagnose the gear pitting condition directly through the raw signal of vibration. In this paper, a novel method named augmented deep sparse autoencoder (ADSAE) is proposed. The method can be used to diagnose the gear pitting fault with relatively few raw vibration signal data. This method is mainly based on the theory of pitting fault diagnosis and creatively combines with both data augmentation ideology and the deep sparse autoencoder algorithm for the fault diagnosis of gear wear. The effectiveness of the proposed method is validated by experiments of six types of gear pitting conditions. The results show that the ADSAE method can effectively increase the network generalization ability and robustness with very high accuracy. This method can effectively diagnose different gear pitting conditions and show the obvious trend according to the severity of gear wear faults. The results obtained by the ADSAE method proposed in this paper are compared with those obtained by other common deep learning methods. This paper provides an important insight into the field of gear fault diagnosis based on deep learning and has a potential practical application value. Keywords: Deep learning, Gear pitting diagnosis, Gear teeth, Raw vibration signal, Semi-supervised learning, Sparse autoencoder

Gear fault diagnosis using transmission error and ensemble empirical mode decomposition

Article

Aug 2018

Classification of spall and crack faults of gear teeth is studied by applying the ensemble empirical mode decomposition (EEMD) to the transmission error (TE) measured by the encoders of the input and output shafts. Finite element models of the gears with the two faults are built, and TE’s are obtained by simulation of the faulty gears under loaded contact to identify the different characteristics. A simple test bed for a pair of spur gears is prepared to illustrate the approach, in which the TE’s are measured for the gears with seeded spall and crack, respectively. EEMD is applied to extract fault features under the noise from the measured TE. The differences of the spall and crack are clearly identified by the selected features of the intrinsic mode functions based on the class separability criterion. The k-nearest neighbor method is applied for the classification of the faults and normal gears using the features. The proposed method is advantageous over the existing practices in the sense that the TE signal measures the gear faults more directly with less noise, enabling successful diagnosis.
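The final classification step named in the abstract above, a k-nearest neighbor vote over selected features, can be sketched as follows. The two-dimensional feature vectors, the labels and the helper name `knn_predict` are made up for illustration; they stand in for the EEMD-derived features, which the abstract does not list.

```python
import math
from collections import Counter


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    # Count the labels of the k nearest points and take the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors (not measured data).
train_features = [(0.1, 0.2), (0.2, 0.1), (1.0, 1.1), (1.1, 0.9), (2.0, 0.1), (2.1, 0.2)]
train_labels = ["normal", "normal", "spall", "spall", "crack", "crack"]

prediction = knn_predict(train_features, train_labels, (1.05, 1.0))
```

With well-separated feature clusters, as the class separability criterion in the abstract aims for, a small `k` is usually sufficient.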

Intelligent Fault Diagnosis of Rotating Machinery via Wavelet Transform, Generative Adversarial Nets and Convolutional Neural Network

Article

Mar 2020
MEASUREMENT

The fault detection of rotating machinery systems, especially of typical components such as bearings and gears, is of special importance for keeping machine systems working normally and safely. However, due to changing working conditions, environmental noise, the weakness of early fault features and various unseen compound failure modes, it is hard to achieve high-accuracy intelligent failure monitoring of rotating machinery with existing intelligent fault diagnosis approaches in real industrial applications. In this paper, a novel, high-accuracy fault detection approach for rotating machinery named WT-GAN-CNN is presented, based on Wavelet Transform (WT), Generative Adversarial Nets (GANs) and a convolutional neural network (CNN). The proposed WT-GAN-CNN approach includes three parts. First, WT is employed to extract time-frequency image features from one-dimensional raw time-domain signals. Second, GANs are used to generate additional training image samples. Finally, the CNN model performs fault detection of rotating machinery using both the original and the generated time-frequency training images. Two experimental studies are carried out to assess the effectiveness of the proposed approach, and the results demonstrate that its testing accuracy is higher than that of other intelligent failure detection approaches in the literature, even under strong environmental noise or changing working conditions. Furthermore, its testing accuracy is also highly stable.

Gear pitting fault diagnosis with mixed operating conditions based on adaptive 1D separable convolution with residual connection

Article

Aug 2020

Gear pitting fault diagnosis has always been an important subject for industry and the research community. In the past, the diagnosis of early gear pitting faults has usually been carried out under a single gear health state. In order to diagnose early gear pitting faults under mixed operating conditions and reduce the number of training parameters, a new method is proposed in this paper. The proposed method uses an adaptive 1D separable convolution network with residual connections to classify gear pitting faults under mixed operating conditions. Compared to the traditional convolutional neural network, the separable convolution network with residual connections combines channel-wise convolution with point-by-point convolution to effectively reduce the number of network parameters. The residual connection can solve the representational bottleneck problem of the features in the model. Moreover, the proposed method applies a search algorithm to select better hyperparameters for the model. Raw vibration signals of gear pitting faults at different speeds, collected on a gear test rig, are used to validate the effectiveness of the proposed method. The results show that the proposed method can accurately diagnose early gear pitting faults at mixed speeds. In comparison with other machine learning models, the proposed method provides better diagnostic accuracy with fewer model parameters.
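The parameter saving claimed for separable convolution in the abstract above follows from simple counting. The sketch below compares the weight count of a standard 1D convolution with that of a channel-wise (depthwise) convolution followed by a point-by-point (pointwise) convolution. Biases are ignored, and the layer sizes are arbitrary illustrative values, not the ones used in the paper.

```python
def standard_conv1d_params(in_channels, out_channels, kernel_size):
    """Weights of a standard 1-D convolution: one kernel per (input, output) channel pair."""
    return kernel_size * in_channels * out_channels


def separable_conv1d_params(in_channels, out_channels, kernel_size):
    """Weights of a depthwise convolution followed by a pointwise (1x1) convolution."""
    depthwise = kernel_size * in_channels   # one kernel per input channel
    pointwise = in_channels * out_channels  # kernel size 1 mixes the channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not from the paper).
standard = standard_conv1d_params(in_channels=32, out_channels=64, kernel_size=9)
separable = separable_conv1d_params(in_channels=32, out_channels=64, kernel_size=9)
ratio = standard / separable
```

For these sizes the standard layer needs 18432 weights and the separable one 2336, roughly an eightfold reduction; the saving grows with kernel size and output channel count.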

A Deep Learning method for bearing fault diagnosis based on Cyclic Spectral Coherence and Convolutional Neural Networks

Article

Jan 2020

Accurate fault diagnosis is critical to ensure the safe and reliable operation of rotating machinery. Data-driven fault diagnosis techniques based on Deep Learning (DL) have recently gained increasing attention due to their powerful feature learning capacity. However, one of the critical challenges lies in how to embed domain diagnosis knowledge into DL to obtain suitable features that correlate well with the health conditions and to generate better predictors. In this paper, a novel DL-based fault diagnosis method, based on 2D map representations of Cyclic Spectral Coherence (CSCoh) and Convolutional Neural Networks (CNN), is proposed to improve the recognition performance of rolling element bearing faults. Firstly, the 2D CSCoh maps of vibration signals are estimated by cyclic spectral analysis to provide discriminative bearing patterns for specific types of faults. The motivation for using the CSCoh-based preprocessing scheme is that valuable health condition information can be revealed by exploiting the second-order cyclostationary behavior of bearing vibration signals. Thus, the difficulty of feature learning in the deep diagnosis model is reduced by leveraging domain-related diagnosis knowledge. Secondly, a CNN model is constructed to learn high-level feature representations and conduct fault classification. More specifically, Group Normalization (GN) is employed in the CNN to normalize the feature maps of the network, which can reduce the internal covariate shift induced by data distribution discrepancy. The proposed method is tested and evaluated on two experimental datasets, including data category imbalances and data collected under different operating conditions. Experimental results demonstrate that the proposed method can achieve high diagnosis accuracy on different datasets and presents better generalization ability compared to state-of-the-art fault diagnosis techniques.
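Group Normalization, mentioned in the abstract above, normalizes the feature maps of a single sample over groups of channels rather than over a batch. The following is a minimal sketch of the normalization step only; the learnable scale and shift parameters used in practice are omitted, and the feature values and the helper name `group_norm` are made up for illustration.

```python
import math


def group_norm(feature_maps, num_groups, eps=1e-5):
    """Normalize one sample's channels group by group to zero mean and unit variance.

    `feature_maps` is a list of channels, each a list of floats.
    Learnable per-channel scale and shift are left out of this sketch.
    """
    channels = len(feature_maps)
    if channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = channels // num_groups
    normalized = []
    for g in range(num_groups):
        group = feature_maps[g * per_group:(g + 1) * per_group]
        # Statistics are shared by all channels and positions inside the group.
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.extend([[(v - mean) * scale for v in channel] for channel in group])
    return normalized


# Four channels of two values each, split into two groups (illustrative values).
out = group_norm([[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]], num_groups=2)
```

Because the statistics are computed per sample, the result does not depend on batch size, which is one reason the technique is attractive when batches are small or unevenly distributed.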

Deep Laplacian Auto-encoder and its Application into Imbalanced Fault Diagnosis of Rotating Machinery

Article

Nov 2019
MEASUREMENT

The measured health condition data from mechanical systems often exhibit an imbalanced distribution in real-world cases. To enhance fault diagnostic accuracy on imbalanced data sets, a novel fault diagnosis approach for rotating machinery based on the Deep Laplacian Auto-encoder (DLapAE) is developed in this paper. First, the collected vibration signals are fed directly into the constructed DLapAE algorithm for layer-by-layer feature extraction; the extracted deep discriminative features are then passed to a Back Propagation (BP) classifier for health condition diagnosis. More specifically, a Laplacian regularization term is added to the original objective function of the Deep Auto-encoder (DAE) to smooth the manifold structure of the data in DLapAE. The proposed DLapAE algorithm with Laplacian regularization can thus improve the generalization performance of the fault diagnosis framework and make it more suitable for feature learning and classification of imbalanced data. Finally, two case studies on experimental bearing systems demonstrate the effectiveness of the proposed methodology. Compared with other existing fault diagnosis methods based on deep learning, the proposed method can accurately diagnose faults of rotating machinery on both balanced and imbalanced datasets.

Intelligent fault diagnosis of rotating machinery based on one-dimensional convolutional neural network

Article

Jun 2019
COMPUT IND

Fault diagnosis of rotating machinery plays a significant role in the reliability and safety of modern industrial systems. Traditional fault diagnosis methods usually require manually extracting features from raw sensor data before classifying them with pattern recognition models. This requires considerable professional knowledge and complex feature extraction, and it results in a model with poor flexibility that applies only to the diagnosis of faults in particular equipment. In recent years, deep learning has developed rapidly, and great achievements have been made in image analysis, speech recognition and natural language processing. However, its application in fault diagnosis of rotating machinery is still at an initial stage. In order to solve the problem of end-to-end fault diagnosis, this paper focuses on developing a convolutional neural network to learn features directly from the original vibration signals and then diagnose faults. The effectiveness of the proposed method is validated on the PHM (Prognostics and Health Management) 2009 gearbox challenge data and a planetary gearbox test rig. Compared with three traditional methods, the one-dimensional convolutional neural network (1-DCNN) model achieves higher accuracy for fixed-shaft gearbox and planetary gearbox fault diagnosis.
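The core operation behind learning features directly from a raw vibration signal is a one-dimensional convolution followed by a nonlinearity and pooling. The sketch below runs one such forward pass in plain Python. The kernel is fixed by hand here purely for illustration, whereas in a trained network it would be learned; the signal values and function names are likewise assumptions, not taken from the paper.

```python
def conv1d(signal, kernel):
    """Valid-mode 1-D cross-correlation, the operation CNN layers call convolution."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive activations and zero out the rest."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing incomplete window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Made-up periodic signal and a hand-picked difference kernel (illustrative only).
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
kernel = [1.0, 0.0, -1.0]

features = max_pool1d(relu(conv1d(signal, kernel)), size=2)
```

Stacking several such layers and ending with a classifier gives the general shape of a 1D CNN; the kernel weights are then fitted by backpropagation instead of being chosen manually.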

A comprehensive review of artificial intelligence-based approaches for rolling element bearing PHM: shallow and deep learning

Article

May 2019

The objective of this paper is to present a comprehensive review of the contemporary techniques for fault detection, diagnosis, and prognosis of rolling element bearings (REBs). Data-driven approaches, as opposed to model-based approaches, are gaining in popularity due to the availability of low-cost sensors and big data. This paper first reviews the fundamentals of prognostics and health management (PHM) techniques for REBs. A brief description of the different bearing-failure modes is given, then, the paper presents a comprehensive representation of the different health features (indexes, criteria) used for REB fault diagnostics and prognostics. Thus, the paper provides an overall platform for researchers, system engineers, and experts to select and adopt the best fit for their applications. Second, the paper provides overviews of contemporary REB PHM techniques with a specific focus on modern artificial intelligence (AI) techniques (i.e., shallow learning algorithms). Finally, deep-learning approaches for fault detection, diagnosis, and prognosis for REB are comprehensively reviewed.
